Using Solarium with SOLR for Search – Advanced

This entry is part 4 of 4 in the series Using Solarium for SOLR Search


This is the fourth and final part of a series on using Apache’s SOLR search implementation along with Solarium, a PHP library to integrate it into your application as if it were native.

In the first three parts we installed and configured SOLR and Solarium and started building an example application for searching movies. We’ve also looked at faceted search.

We’re going to wrap up the series by looking at some more advanced features of SOLR, and how to use them with Solarium.

Highlighting Results with SOLR

The Highlighting component allows you to highlight the parts of a document that matched your search. What gets shown depends on the field: a short field such as a title will usually be shown in its entirety with the matched words highlighted, whereas a longer field, such as a synopsis or the body of an article, is returned as snippets containing the highlighted words, much like Google’s search results.

To set up highlighting, you first need to specify the fields to include. Then, you can set a prefix and corresponding postfix for the highlighted words or phrases. So for example, to make highlighted words and phrases bold:

$hl = $query->getHighlighting();
$hl->setFields(array('title', 'synopsis'));
$hl->setSimplePrefix('<strong>');
$hl->setSimplePostfix('</strong>');

Alternatively, to add a background color:

$hl = $query->getHighlighting();
$hl->setFields(array('title', 'synopsis'));
$hl->setSimplePrefix('<span style="background:yellow;">');
$hl->setSimplePostfix('</span>');

Or you can even use per-field settings:

$hl = $query->getHighlighting();
$hl->getField('title')->setSimplePrefix('<strong>')->setSimplePostfix('</strong>');
$hl->getField('synopsis')->setSimplePrefix('<span style="background:yellow;">')->setSimplePostfix('</span>');

Once you’ve configured the highlighting component in your search implementation, there’s a little more work involved in displaying the highlights in your search results view.

First, you need to get the highlighting results from the result set, and then extract the highlighted document by ID:

$highlighting = $resultset->getHighlighting();
$highlightedDoc = $highlighting->getResult($document->id);

Now, you can access all the highlighted fields by iterating through them, as properties of the highlighted document:

if($highlightedDoc){
    foreach($highlightedDoc as $field => $highlight) {
        echo implode(' (...) ', $highlight) . '<br/>';
    }
}

Or, you can use getField():

if($highlightedDoc){
    $highlightedTitle = $highlightedDoc->getField('title');
}

Highlighted fields don’t simply return text, however. Instead, they return an array of “snippets” of text. If there are no matches for that particular field (for example, if your search matched on title but not synopsis) then that array will be empty.

The code above will return a maximum of one snippet. To change this behavior, you can use the setSnippets() method:

$hl = $query->getHighlighting();
$hl->setSnippets(5);
// . . . as before . . .

For example, suppose you search for the word “star”. One of the results has a synopsis that reads as follows:

This not to be missed movie theater event will feature one of the most memorable moments in TV history and exclusive clips about the making of The Best of Both Worlds and Star Trek: The Next Generation Season 3. Set in the 24th century, The Next Generation was created by Gene Roddenberry over 20 years after the original Star Trek series. The Next Generation became the longest running series of the Star Trek franchise, consisting of 178 episodes over 7 seasons. Star Trek: The Next Generation – The Best of Both Worlds is the first opportunity to see The Best of Both Worlds, one of the greatest TV episodes of all time, as a gloriously remastered full-length feature in select movie theaters nationwide.

The highlighted document’s synopsis array will contain three items:

  • history and exclusive clips about the making of The Best of Both Worlds and Star Trek: The Next Generation
  • after the original Star Trek series. The Next Generation became the longest running series of the Star
  • Trek franchise, consisting of 178 episodes over 7 seasons. Star Trek: The Next Generation – The Best of

One way to display multiple snippets is to implode them, for example:

implode(' ... ', $highlightedDoc->getField('synopsis'))

This results in the following:

history and exclusive clips about the making of The Best of Both Worlds and Star Trek: The Next Generation … after the original Star Trek series. The Next Generation became the longest running series of the Star … Trek franchise, consisting of 178 episodes over 7 seasons. Star Trek: The Next Generation – The Best of
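Because a field’s snippet array is empty when the match was in a different field, it can be handy to wrap the implode in a small helper that falls back to the stored field value. The following is only a minimal sketch: renderHighlight() is a hypothetical helper name, not part of Solarium, and it works on plain arrays so it can be tried without a SOLR instance.

```php
<?php
/**
 * Join highlighted snippets for display, falling back to the plain
 * field value when there are no snippets for that field.
 *
 * Hypothetical helper for illustration; not part of Solarium.
 */
function renderHighlight(array $snippets, $fallback, $glue = ' ... ')
{
    if (count($snippets) === 0) {
        return $fallback;
    }

    return implode($glue, $snippets);
}

// Two snippets are joined with the separator.
echo renderHighlight(
    array('the original <strong>Star</strong> Trek series', 'the <strong>Star</strong> Trek franchise'),
    'Full synopsis text'
), "\n";

// No snippets for this field, so the stored value is used instead.
echo renderHighlight(array(), 'Full synopsis text'), "\n";
```

In a results view, this could be called along the lines of renderHighlight($highlightedDoc->getField('synopsis'), $document->synopsis).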

There are a number of other parameters you can use to modify the behavior of the highlighting component; these are explained in the Solarium documentation.

Now that we’ve covered how to use highlighting, integrating it into our movie search application should be straightforward.

The first thing to do is modify app/controllers/HomeController.php by adding the following, just before we run the search:

// Get highlighting component, and apply settings
$hl = $query->getHighlighting();
$hl->setSnippets(5);
$hl->setFields(array('title', 'synopsis'));

$hl->setSimplePrefix('<span style="background:yellow;">');
$hl->setSimplePostfix('</span>');

// Execute the query and return the result
$resultset = $this->client->select($query);

Then the search results – which you’ll remember are in app/views/home/index.blade.php – become:

@if (isset($resultset))    
<header>
    <p>Your search yielded <strong>{{ $resultset->getNumFound() }}</strong> results:</p>
    <hr />
</header>

@foreach ($resultset as $document)

    <?php $highlightedDoc = $resultset->getHighlighting()->getResult($document->id); ?>

    <h3>{{ (count($highlightedDoc->getField('title'))) ? implode(' ... ', $highlightedDoc->getField('title')) : $document->title }}</h3>

    <dl>
        <dt>Year</dt>
        <dd>{{ $document->year }}</dd>

        @if (is_array($document->cast))
        <dt>Cast</dt>
        <dd>{{ implode(', ', $document->cast) }}</dd>              
        @endif

    </dl>

    {{ (count($highlightedDoc->getField('synopsis'))) ? implode(' ... ', $highlightedDoc->getField('synopsis')) : $document->synopsis }}

@endforeach
@endif

Notice how each search result essentially mixes and matches fields between the search result document, and the highlighted document – the latter is effectively a subset of the former. Depending on your schema, you may have all your fields available in the highlighted version.

Suggester – Adding Autocomplete

The Suggester component is used to suggest query terms based on incomplete query input. Essentially it examines the index on a given field and extracts search terms which match a certain pattern. You can then order those suggestions by frequency to increase the relevance of the search.

To set up the suggester, you need to configure it in your solrconfig.xml file. Open it up and place the following snippet of XML somewhere near the other <searchComponent> declarations:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
        <str name="name">suggest</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
        <str name="field">title</str>  <!-- the indexed field to derive suggestions from -->
        <float name="threshold">0.005</float>
        <str name="buildOnCommit">true</str>
    </lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
    <lst name="defaults">
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">suggest</str>
        <str name="spellcheck.onlyMorePopular">true</str>
        <str name="spellcheck.count">5</str>
        <str name="spellcheck.collate">true</str>
    </lst>
    <arr name="components">
        <str>suggest</str>
    </arr>
</requestHandler>

You’ll notice a number of references to “spellcheck”, but this is simply because the Suggester component reuses much of that functionality internally.

The important bit to notice is the <str name="field"> item, which tells the component that we want to use the title field on which to base our suggestions.

Restart SOLR, and you can now try running a suggest query through your web browser:

`http://localhost:8983/solr/suggest?q=ho`

(You may need to alter the port number, depending on how you set up SOLR)

The output should look a little like this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">0</int>
    </lst>
    <lst name="spellcheck">
        <lst name="suggestions">
            <lst name="ho">
                <int name="numFound">4</int>
                <int name="startOffset">0</int>
                <int name="endOffset">2</int>
                <arr name="suggestion">
                    <str>house</str>
                    <str>houses</str>
                    <str>horror</str>
                    <str>home</str>
                </arr>
            </lst>
            <str name="collation">house</str>
        </lst>
    </lst>
</response>

As you can see, SOLR has returned four possible matches for “ho”: house, houses, horror and home. Despite home and horror coming before house in the alphabet, house appears first by virtue of being one of the most common terms in our index.

Let’s use this component to create an autocomplete for our search box, which will suggest common search terms as the user types their query.

First, define the route as a new method on the controller (as elsewhere in the application, it uses the Solarium client held in $this->client):

public function getAutocomplete()
{
    // get a suggester query instance
    $query = $this->client->createSuggester();
    $query->setQuery(Input::get('term'));
    $query->setDictionary('suggest');
    $query->setOnlyMorePopular(true);
    $query->setCount(10);
    $query->setCollate(true);

    // this executes the query and returns the result
    $resultset = $this->client->suggester($query);

    $suggestions = array();

    foreach ($resultset as $term => $termResult) {
        foreach ($termResult as $result) {
            $suggestions[] = $result;
        }
    }

    return Response::json($suggestions);
}
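When a query contains more than one term, the suggestions come back grouped per term, so the flattened list may contain repeats. Below is a minimal sketch of flattening with de-duplication, shown on plain arrays rather than on a Solarium result object; flattenSuggestions() and its $limit parameter are hypothetical names used purely for illustration.

```php
<?php
/**
 * Flatten per-term suggestion lists into a single list, dropping
 * duplicates while keeping the original order.
 *
 * Hypothetical helper for illustration; not part of Solarium.
 */
function flattenSuggestions(array $termResults, $limit = 10)
{
    $suggestions = array();

    foreach ($termResults as $term => $results) {
        foreach ($results as $result) {
            // Strict comparison so that numeric-looking terms stay strings.
            if (!in_array($result, $suggestions, true)) {
                $suggestions[] = $result;
            }
        }
    }

    return array_slice($suggestions, 0, $limit);
}

// Example input: suggestions keyed by the term they were generated for.
$flat = flattenSuggestions(array(
    'ho'  => array('house', 'houses', 'horror', 'home'),
    'hom' => array('home', 'homer'),
));

echo json_encode($flat), "\n";
```

The resulting flat array of strings is the shape that the jQuery UI autocomplete widget accepts from a remote source.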

Include jQuery UI (and jQuery itself) in your layout:

<script src="//code.jquery.com/jquery-1.11.0.min.js"></script>
<script src="//code.jquery.com/ui/1.10.4/jquery-ui.min.js"></script>

Include a jQuery UI theme:

<link rel="stylesheet" type="text/css" href="//code.jquery.com/ui/1.10.4/themes/redmond/jquery-ui.css"> 

And finally, add some JS to initialize the autocomplete:

$(function () {
    $('input[name="q"]').autocomplete({
        source: '/autocomplete',
        minLength: 2
    });
});

That’s all there is to it – try it out by running a few searches.

Array-based Configuration

If you prefer, you can use an array to set up your query – for example:

$select = array(
  'query'         => Input::get('q'),
  'query_fields'  => array('title', 'cast', 'synopsis'),
  'start'         => 0,
  'rows'          => 100,
  'fields'        => array('*', 'id', 'title', 'synopsis', 'cast', 'score'),      
  'sort'          => array('year' => 'asc'),      
  'filterquery' => array(
      'year' => array(
          'query' => 'year:[1990 TO 1990]'
      ),
  ),    
  'component' => array(
    'facetset' => array(
      'facet' => array(        
        array('type' => 'field', 'key' => 'rating', 'field' => 'rating'),
      )
    ),
  ),
);

$query = $this->client->createSelect($select);

Adding Additional Cores

At startup, SOLR traverses the specified home directory looking for cores, which it identifies by the presence of a file called core.properties. So far we’ve used a core called collection1, and you’ll see that it contains three key items:

The core.properties file. At its most basic, it simply contains the name of the instance.

The conf directory contains the configuration files for the instance. As a minimum, this directory must contain a schema.xml and a solrconfig.xml file.

The data directory holds the indexes. The location of this directory can be overridden, and if it doesn’t exist it’ll be created for you.

So, to create a new instance follow these steps:

  1. Create a new directory in your SOLR home directory (movies in the example application)
  2. Create a conf directory inside it
  3. Create or copy a schema.xml file and a solrconfig.xml file in the conf directory, and customize them accordingly
  4. Create a text file called core.properties in the new directory (alongside conf, not inside it), with the following contents:

name=instancename

…where instancename is the name of your new directory.

Note that the schema.xml configuration that ships in the examples directory contains references to a number of text files – for example stopwords.txt, protwords.txt etc – which you may need to copy over as well.

Then restart SOLR.
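The directory steps above can also be scripted. This is a rough sketch only, assuming that core.properties belongs at the top level of the new core’s directory and that the SOLR home path is writable. createCoreSkeleton() is a hypothetical helper, and you still need to supply schema.xml and solrconfig.xml yourself.

```php
<?php
/**
 * Create the minimal directory layout for a new core:
 *   <solrHome>/<coreName>/core.properties
 *   <solrHome>/<coreName>/conf/
 *
 * Hypothetical helper for illustration; it does not create
 * schema.xml or solrconfig.xml.
 */
function createCoreSkeleton($solrHome, $coreName)
{
    $coreDir = rtrim($solrHome, '/') . '/' . $coreName;
    $confDir = $coreDir . '/conf';

    // A recursive mkdir creates the core directory and conf in one call.
    if (!is_dir($confDir) && !mkdir($confDir, 0755, true)) {
        throw new RuntimeException('Could not create ' . $confDir);
    }

    // core.properties is what marks the directory as a core.
    if (file_put_contents($coreDir . '/core.properties', "name=$coreName\n") === false) {
        throw new RuntimeException('Could not write core.properties');
    }

    return $coreDir;
}

// Example: build the skeleton under a throwaway directory.
$home = sys_get_temp_dir() . '/solr-home-' . uniqid();
echo createCoreSkeleton($home, 'movies'), "\n";
```

In practice you would pass your real SOLR home path instead of a temporary directory, then copy the configuration files into conf before restarting.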

You can also add a new core via the administrative web interface in your web browser – click Core Admin on the left hand side, then Add Core.

Additional Configuration

There are a few additional configuration files worth a mention.

The stopwords.txt file (or, more specifically, each of the language-specific files such as lang/stopwords_en.txt) contains words which should be ignored by the search indexer, such as “a”, “the” and “at”. In most cases, you probably won’t need to modify this file.

Depending on your application, you may find that you need to add words to protwords.txt. This file contains a list of protected words that aren’t “stemmed”, that is, reduced to their basic form; for example “asked” becomes “ask” and “working” becomes “work”. Sometimes stemming attempts to “correct” words, perhaps removing what it thinks are erroneous letters or numbers at the end. You might be dealing with geographical areas and find that “Maine” is stemmed to “main”.

You can specify synonyms – words with the same meaning – in synonyms.txt. Separate synonyms with commas on a per-line basis. For example:

GB,gib,gigabyte,gigabytes
MB,mib,megabyte,megabytes
Television, Televisions, TV, TVs

You may also use synonyms.txt to help correct common spelling mistakes using synonym mappings, where the term on the left is replaced by the term on the right. For example:

assasination => assassination
enviroment => environment

If you’re using currency fields, you may wish to update and keep an eye on currency.xml, which specifies some example exchange rates – which of course are highly volatile.

Summary

In this series we’ve looked at Apache’s SOLR implementation for search, and used the PHP Solarium library to interact with it. We’ve installed and configured SOLR along with an example schema, and built an application designed to search a set of movies, which demonstrates a number of features of SOLR. We’ve looked at faceted search, highlighting results and the DisMax component. Hopefully this will give you enough of a grounding to adapt it to use SOLR for search in your applications.

For further reading, you may wish to download the SOLR reference guide as a PDF, or consult the Solarium documentation.
