Using Solarium with SOLR for Search - Advanced

This is the fourth and final part of a series on using Apache’s SOLR search implementation along with Solarium, a PHP library to integrate it into your application as if it were native.

In the first three parts we installed and configured SOLR and Solarium and started building an example application for searching movies. We’ve also looked at faceted search.

We’re going to wrap up the series by looking at some more advanced features of SOLR, and how to use them with Solarium.

Highlighting Results with SOLR

The Highlighting component allows you to highlight the parts of a document which have matched your search. What gets shown depends on the field: a short field such as a title will most likely be returned in its entirety with the matched words marked, whereas a longer field, such as a synopsis or the body of an article, is returned as snippets containing the highlighted words, much like Google’s search results.

To set up highlighting, you first need to specify the fields to include. Then, you can set a prefix and corresponding postfix for the highlighted words or phrases. So for example, to make highlighted words and phrases bold:

$hl = $query->getHighlighting();
$hl->setFields(array('title', 'synopsis'));
$hl->setSimplePrefix('<strong>');
$hl->setSimplePostfix('</strong>');

Alternatively, to add a background color:

$hl = $query->getHighlighting();
$hl->setFields(array('title', 'synopsis'));
$hl->setSimplePrefix('<span style="background:yellow;">');
$hl->setSimplePostfix('</span>');

Or you can even use per-field settings:

$hl = $query->getHighlighting();
$hl->getField('title')->setSimplePrefix('<strong>')->setSimplePostfix('</strong>');
$hl->getField('synopsis')->setSimplePrefix('<span style="background:yellow;">')->setSimplePostfix('</span>');

Once you’ve configured the highlighting component in your search implementation, there’s a little more work involved in displaying it in your search results view.

First, you need to fetch the highlighting results from the result set, then extract the highlighted document by ID:

$highlighting = $resultset->getHighlighting();
$highlightedDoc = $highlighting->getResult($document->id);

Now, you can access all the highlighted fields by iterating through them, as properties of the highlighted document:

if($highlightedDoc){
    foreach($highlightedDoc as $field => $highlight) {
        echo implode(' (...) ', $highlight) . '<br/>';
    }
}

Or, you can use getField():

if($highlightedDoc){
    $highlightedTitle = $highlightedDoc->getField('title');
}

Highlighted fields don’t simply return text, however. Instead, they return an array of “snippets” of text. If there are no matches for that particular field, for example if your search matched on title but not synopsis, then that array will be empty.

The code above will return a maximum of one snippet. To change this behavior, you can use the setSnippets() method:

$hl = $query->getHighlighting();
$hl->setSnippets(5);
// . . . as before . . .

For example, suppose you search for the word “star”. One of the results has a synopsis that reads as follows:

This not to be missed movie theater event will feature one of the most memorable moments in TV history and exclusive clips about the making of The Best of Both Worlds and Star Trek: The Next Generation Season 3. Set in the 24th century, The Next Generation was created by Gene Roddenberry over 20 years after the original Star Trek series. The Next Generation became the longest running series of the Star Trek franchise, consisting of 178 episodes over 7 seasons. Star Trek: The Next Generation – The Best of Both Worlds is the first opportunity to see The Best of Both Worlds, one of the greatest TV episodes of all time, as a gloriously remastered full-length feature in select movie theaters nationwide.

The highlighted document’s synopsis array will contain three items:

history and exclusive clips about the making of The Best of Both Worlds and Star Trek: The Next Generation
after the original Star Trek series. The Next Generation became the longest running series of the Star
Trek franchise, consisting of 178 episodes over 7 seasons. Star Trek: The Next Generation – The Best of

One way to display multiple snippets is to implode them, for example:

implode(' ... ', $highlightedDoc->getField('synopsis'))

This results in the following:

history and exclusive clips about the making of The Best of Both Worlds and Star Trek: The Next Generation … after the original Star Trek series. The Next Generation became the longest running series of the Star … Trek franchise, consisting of 178 episodes over 7 seasons. Star Trek: The Next Generation – The Best of
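Because a field with no matches comes back as an empty array, it can help to wrap the join in a small helper that falls back to the stored field value. The following is a minimal sketch in plain PHP. It assumes nothing about Solarium beyond the snippet arrays described above, and the function name renderSnippets() and its parameters are hypothetical:

```php
<?php

/**
 * Join highlighted snippets for display, or fall back to the original
 * field value when the field produced no highlights.
 *
 * Hypothetical helper: the name and signature are illustrative only.
 */
function renderSnippets(array $snippets, $fallback, $glue = ' ... ')
{
    // An empty array means the search did not match on this field.
    if (count($snippets) === 0) {
        return (string) $fallback;
    }

    return implode($glue, $snippets);
}

// A field with two snippets, followed by a field with no matches at all.
echo renderSnippets(
    array('after the original <strong>Star</strong> Trek series', 'series of the <strong>Star</strong>'),
    'Full synopsis text'
), "\n";
echo renderSnippets(array(), 'Full synopsis text'), "\n";
```

In a view, the first argument would be the result of getField() on the highlighted document, and the second the corresponding field on the search result document.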

There are a number of other parameters you can use to modify the behavior of the highlighting component, which are explained in the Solarium documentation.

Integrating Highlighting into Our Movie Search

Now that we’ve covered how to use highlighting, integrating it into our movie search application should be straightforward.

The first thing to do is modify app/controllers/HomeController.php by adding the following, just before we run the search:

// Get highlighting component, and apply settings
$hl = $query->getHighlighting();
$hl->setSnippets(5);
$hl->setFields(array('title', 'synopsis'));

$hl->setSimplePrefix('<span style="background:yellow;">');
$hl->setSimplePostfix('</span>');

// Execute the query and return the result
$resultset = $this->client->select($query);

Then the search results – which you’ll remember are in app/views/home/index.blade.php – become:

@if (isset($resultset))    
<header>
    <p>Your search yielded <strong>{{ $resultset->getNumFound() }}</strong> results:</p>
    <hr />
</header>

@foreach ($resultset as $document)

    <?php $highlightedDoc = $resultset->getHighlighting()->getResult($document->id); ?>

    <h3>{{ (count($highlightedDoc->getField('title'))) ? implode(' ... ', $highlightedDoc->getField('title')) : $document->title }}</h3>

    <dl>
        <dt>Year</dt>
        <dd>{{ $document->year }}</dd>

        @if (is_array($document->cast))
        <dt>Cast</dt>
        <dd>{{ implode(', ', $document->cast) }}</dd>              
        @endif

    </dl>

    {{ (count($highlightedDoc->getField('synopsis'))) ? implode(' ... ', $highlightedDoc->getField('synopsis')) : $document->synopsis }}

@endforeach
@endif

Notice how each search result essentially mixes and matches fields between the search result document, and the highlighted document – the latter is effectively a subset of the former. Depending on your schema, you may have all your fields available in the highlighted version.

Suggester – Adding Autocomplete

The Suggester component is used to suggest query terms based on incomplete query input. Essentially it examines the index on a given field and extracts search terms which match a certain pattern. You can then order those suggestions by frequency to increase the relevance of the search.

To set up the suggester, you need to configure it in your solrconfig.xml file. Open it up and place the following snippet of XML somewhere near the other <searchComponent> declarations:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
        <str name="name">suggest</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
        <str name="field">title</str>  <!-- the indexed field to derive suggestions from -->
        <float name="threshold">0.005</float>
        <str name="buildOnCommit">true</str>
    </lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
    <lst name="defaults">
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">suggest</str>
        <str name="spellcheck.onlyMorePopular">true</str>
        <str name="spellcheck.count">5</str>
        <str name="spellcheck.collate">true</str>
    </lst>
    <arr name="components">
        <str>suggest</str>
    </arr>
</requestHandler>

You’ll notice a number of references to “spellcheck”, but this is simply because the Suggester component reuses much of that functionality internally.

The important bit to notice is the <str name="field"> item, which tells the component that we want to use the title field on which to base our suggestions.

Restart SOLR, and you can now try running a suggest query through your web browser:

`http://localhost:8983/solr/suggest?q=ho`

(You may need to alter the port number, depending on how you set up SOLR)

The output should look a little like this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">0</int>
    </lst>
    <lst name="spellcheck">
        <lst name="suggestions">
            <lst name="ho">
                <int name="numFound">4</int>
                <int name="startOffset">0</int>
                <int name="endOffset">2</int>
                <arr name="suggestion">
                    <str>house</str>
                    <str>houses</str>
                    <str>horror</str>
                    <str>home</str>
                </arr>
            </lst>
            <str name="collation">house</str>
        </lst>
    </lst>
</response>

As you can see, SOLR has returned four possible matches for “ho”: house, houses, horror and home. Despite home and horror coming before house in the alphabet, house appears first by virtue of being one of the most common search terms in our index.

Let’s use this component to create an autocomplete for our search box, which will suggest common search terms as the user types their query.

First, add a controller method to handle the autocomplete route:

public function getAutocomplete()
{
    // get a suggester query instance
    $query = $this->client->createSuggester();
    $query->setQuery(Input::get('term'));
    $query->setDictionary('suggest');
    $query->setOnlyMorePopular(true);
    $query->setCount(10);
    $query->setCollate(true);

    // this executes the query and returns the result
    $resultset = $this->client->suggester($query);

    $suggestions = array();

    // each term in the input has its own list of suggestions
    foreach ($resultset as $term => $termResult) {
        foreach ($termResult as $result) {
            $suggestions[] = $result;
        }
    }

    return Response::json($suggestions);
}
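When the user types more than one word, each term gets its own list of suggestions, so the flattened list may contain duplicates or grow longer than you want to show. The sketch below is one way to tidy that up. It is plain PHP and uses nested arrays as a stand-in for the result set; the function name flattenSuggestions() and the $limit parameter are hypothetical rather than part of Solarium:

```php
<?php

/**
 * Flatten per-term suggestions into one de-duplicated, capped list.
 *
 * $resultset is anything that can be iterated as
 * term => list of suggestion strings; plain arrays stand in for the
 * Solarium result set here. Hypothetical helper for illustration.
 */
function flattenSuggestions($resultset, $limit = 10)
{
    $seen = array();

    foreach ($resultset as $term => $termResult) {
        foreach ($termResult as $suggestion) {
            // Use the suggestion as the key so duplicates collapse.
            $seen[$suggestion] = true;
        }
    }

    // PHP turns numeric string keys into integers, so cast back to strings.
    $suggestions = array_map('strval', array_keys($seen));

    return array_slice($suggestions, 0, $limit);
}

// Stand-in data: two terms whose suggestions overlap on "house".
$sample = array(
    'ho' => array('house', 'houses', 'horror', 'home'),
    'st' => array('star', 'house'),
);

echo json_encode(flattenSuggestions($sample)), "\n";
```

The order of first appearance is preserved, so the most popular suggestions for the first term still come first.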

Include jQuery UI (and jQuery itself) in your layout:

<script src="//code.jquery.com/jquery-1.11.0.min.js"></script>
<script src="//code.jquery.com/ui/1.10.4/jquery-ui.min.js"></script>

Include a jQuery UI theme:

<link rel="stylesheet" type="text/css" href="//code.jquery.com/ui/1.10.4/themes/redmond/jquery-ui.css">

And finally, add some JS to initialize the autocomplete:

$(function () {
    $('input[name="q"]').autocomplete({
        source: '/autocomplete',
        minLength: 2
    });
});

That’s all there is to it – try it out by running a few searches.

Array-based Configuration

If you prefer, you can use an array to set up your query – for example:

$select = array(
  'query'         => Input::get('q'),
  'query_fields'  => array('title', 'cast', 'synopsis'),
  'start'         => 0,
  'rows'          => 100,
  'fields'        => array('*', 'id', 'title', 'synopsis', 'cast', 'score'),      
  'sort'          => array('year' => 'asc'),      
  'filterquery' => array(
      'year' => array(
          'query' => 'year:[1990 TO 1990]'
      ),
  ),
  'component' => array(
    'facetset' => array(
      'facet' => array(        
        array('type' => 'field', 'key' => 'rating', 'field' => 'rating'),
      )
    ),
  ),
);

$query = $this->client->createSelect($select);
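One advantage of the array form is that the configuration is just data, so it can be assembled conditionally before it is handed to createSelect(). Here is a minimal sketch of that idea, assuming each filter query is given under a query option. The function name buildSelectConfig() and its parameters are hypothetical:

```php
<?php

/**
 * Build a select configuration array from user input.
 *
 * Hypothetical helper: only the array keys are meant to match the
 * options used in the array-based example; the rest is illustrative.
 */
function buildSelectConfig($terms, $year = null, $rows = 100)
{
    $select = array(
        'query'  => $terms,
        'start'  => 0,
        'rows'   => (int) $rows,
        'fields' => array('id', 'title', 'synopsis', 'cast', 'score'),
        'sort'   => array('year' => 'asc'),
    );

    // Only add the filter when a year was actually supplied.
    if ($year !== null) {
        $year = (int) $year;
        $select['filterquery'] = array(
            'year' => array('query' => "year:[$year TO $year]"),
        );
    }

    return $select;
}

print_r(buildSelectConfig('star', 1990));
```

The returned array could then be passed to $this->client->createSelect() in place of the hand-written one.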

Adding Additional Cores

At startup, SOLR traverses the specified home directory looking for cores, which it identifies when it locates a file called core.properties. So far we’ve used a core called collection1, and you’ll see that it contains three key items:

The core.properties file. At its most basic, it simply contains the name of the instance.

The conf directory contains the configuration files for the instance. As a minimum, this directory must contain a schema.xml and a solrconfig.xml file.

The data directory holds the indexes. The location of this directory can be overridden, and if it doesn’t exist it’ll be created for you.

So, to create a new instance, follow these steps:

Create a new directory in your SOLR home directory – movies in the example application
Create a conf directory in that
Create or copy a schema.xml file and solrconfig.xml file in the conf directory, and customize accordingly
Create a text file called core.properties in the new directory (alongside conf), with the following contents:

name=instancename

…where instancename is the name of your new directory.

Note that the schema.xml configuration that ships in the examples directory contains references to a number of text files – for example stopwords.txt, protwords.txt etc – which you may need to copy over as well.

Then restart SOLR.
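If you find yourself creating cores regularly, the manual steps can be scripted. The following is a rough sketch in plain PHP rather than anything SOLR provides. The function name scaffoldCore() and all of the paths are assumptions, and it only copies the top-level files of an existing conf directory, so subdirectories such as lang would still need copying by hand:

```php
<?php

/**
 * Create the directory layout for a new core under the SOLR home directory.
 *
 * Hypothetical helper; the name, parameters and paths are illustrative.
 * Returns the path of the new core directory.
 */
function scaffoldCore($solrHome, $coreName, $templateConfDir)
{
    $coreDir = rtrim($solrHome, '/') . '/' . $coreName;
    $confDir = $coreDir . '/conf';

    // Creates both the core directory and its conf directory.
    if (!is_dir($confDir) && !mkdir($confDir, 0755, true)) {
        throw new RuntimeException("Unable to create $confDir");
    }

    // Copy top-level files only (schema.xml, solrconfig.xml, stopwords.txt, ...).
    foreach (scandir($templateConfDir) as $file) {
        $source = $templateConfDir . '/' . $file;
        if (is_file($source)) {
            copy($source, $confDir . '/' . $file);
        }
    }

    // core.properties marks the directory as a core and names it.
    file_put_contents($coreDir . '/core.properties', "name=$coreName\n");

    return $coreDir;
}
```

You might call it as scaffoldCore('/path/to/solr/home', 'movies', '/path/to/solr/home/collection1/conf'), where both paths are placeholders for your own installation. You would still need to customize the copied schema and restart SOLR afterwards.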

You can also add a new core via the administrative web interface in your web browser – click Core Admin on the left hand side, then Add Core.

Additional Configuration

There are a few additional configuration files worth a mention.

The stopwords.txt file – or more specifically, the language-specific files such as lang/stopwords_en.txt – contains words which should be ignored by the search indexer, such as “a”, “the” and “at”. In most cases, you probably won’t need to modify this file.

Depending on your application, you may find that you need to add words to protwords.txt. This file contains a list of protected words that aren’t “stemmed” – that is, reduced to their basic form; for example “asked” becomes “ask”, “working” becomes “work”. Sometimes stemming attempts to “correct” words, perhaps removing what it thinks are erroneous letters or numbers at the end. You might be dealing with geographical areas and find that “Maine” is stemmed to “main”.

You can specify synonyms – words with the same meaning – in synonyms.txt. Separate synonyms with commas on a per-line basis. For example:

GB,gib,gigabyte,gigabytes
MB,mib,megabyte,megabytes
Television, Televisions, TV, TVs

You may also use synonyms.txt to help correct common spelling mistakes using synonym mappings, where the term on the left is replaced by the term on the right. For example:

assasination => assassination
enviroment => environment

If you’re using currency fields, you may wish to update and keep an eye on currency.xml, which specifies some example exchange rates – which of course are highly volatile.

Summary

In this series we’ve looked at Apache’s SOLR implementation for search, and used the PHP Solarium library to interact with it. We’ve installed and configured SOLR along with an example schema, and built an application designed to search a set of movies, which demonstrates a number of features of SOLR. We’ve looked at faceted search, highlighting results and the DisMax component. Hopefully this will give you enough of a grounding to adapt it to use SOLR for search in your applications.

For further reading, you may wish to download the SOLR reference guide as a PDF, or consult the Solarium documentation.

Frequently Asked Questions (FAQs) about Using Solarium with Solr for Advanced Search

How can I implement autocomplete with Solr and Solarium?

Implementing autocomplete with Solr and Solarium involves creating a suggester in your Solr configuration file. This suggester will be used to provide suggestions for user queries. Once the suggester is set up, you can use Solarium’s Suggester query to get suggestions. The Suggester query will return a list of suggestions based on the user’s input, which you can then display to the user.

What is the difference between Solarium and Stellarium?

Solarium and Stellarium are two unrelated pieces of software. Solarium is a PHP library that provides an API for interacting with Solr, a powerful search platform. Stellarium, on the other hand, is a free open source planetarium for your computer. It shows a realistic sky in 3D, just like what you see with the naked eye, binoculars or a telescope.

How can I use Solarium to query Solr?

To use Solarium to query Solr, you first need to create a client instance with your Solr server’s configuration. Then, you can create a select query using the client’s createSelect function. You can set various parameters on the query, such as the fields to return, the query string, and any filters. Once the query is set up, you can execute it using the client’s execute function, which will return a result set that you can iterate over to access the individual documents.

How can I add documents to Solr using Solarium?

To add documents to Solr using Solarium, you first need to create a client instance with your Solr server’s configuration. Then, you can create an update query using the client’s createUpdate function. You can add documents to this query using the addDocument function, which takes a document instance as its argument. The document instance should have all the fields and values that you want to add to the document. Once all documents are added to the query, add a commit command using the addCommit function so that the changes become visible to searches, then execute the query using the client’s update function.

How can I delete documents from Solr using Solarium?

To delete documents from Solr using Solarium, you first need to create a client instance with your Solr server’s configuration. Then, you can create an update query using the client’s createUpdate function. You can add delete commands to this query using the addDeleteById or addDeleteByQuery functions. Once all delete commands are added to the query, you can execute it using the client’s execute function.

How can I optimize my Solr index using Solarium?

To optimize your Solr index using Solarium, you first need to create a client instance with your Solr server’s configuration. Then, you can create an update query using the client’s createUpdate function. You can add an optimize command to this query using the addOptimize function. Once the optimize command is added to the query, you can execute it using the client’s execute function.

How can I handle errors when using Solarium with Solr?

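When using Solarium with Solr, errors can be handled by catching the exceptions thrown when a query is executed. In the namespaced versions of Solarium used in this series, a failed request throws a Solarium\Exception\HttpException (older 2.x releases used Solarium_Exception instead). This exception contains information about the error, such as the error message and the HTTP status.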

How can I use facets with Solarium and Solr?

To use facets with Solarium and Solr, you first need to create a select query using the client’s createSelect function. Then, you can retrieve the facet set from the query using the getFacetSet function. You can add various types of facets to the facet set, such as field facets, query facets, and range facets. Once the facets are set up, you can execute the query using the client’s select function, which will return a result set that includes the facet results.

How can I use highlighting with Solarium and Solr?

To use highlighting with Solarium and Solr, you first need to create a select query using the client’s createSelect function. Then, you can retrieve the highlighting component from the query using the getHighlighting function. You can set various parameters on it, such as the fields to highlight and the number of snippets to return. Once highlighting is set up, you can execute the query using the client’s select function, which will return a result set whose getHighlighting function gives you the highlighting results.

How can I use pagination with Solarium and Solr?

To use pagination with Solarium and Solr, you first need to create a select query using the client’s createSelect function. Then, you can set the start and rows parameters on the query to specify the range of results to return. The start parameter specifies the index of the first result to return, and the rows parameter specifies the number of results to return. Once the pagination is set up, you can execute the query using the client’s execute function, which will return a result set that includes the specified range of results.
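As a rough illustration of the arithmetic involved, the sketch below converts a 1-based page number into start and rows values, and works out how many pages a result count needs. It is plain PHP; the function names pageToOffsets() and pageCount() are hypothetical helpers, not part of Solarium:

```php
<?php

/**
 * Translate a 1-based page number into start/rows values.
 * Hypothetical helper; the name and return shape are illustrative.
 */
function pageToOffsets($page, $perPage = 10)
{
    // Guard against zero or negative input from the query string.
    $page    = max(1, (int) $page);
    $perPage = max(1, (int) $perPage);

    return array(
        'start' => ($page - 1) * $perPage,
        'rows'  => $perPage,
    );
}

/**
 * Number of pages needed to show $numFound results.
 */
function pageCount($numFound, $perPage = 10)
{
    return (int) ceil($numFound / max(1, (int) $perPage));
}

print_r(pageToOffsets(3, 20));
echo pageCount(101, 10), "\n";
```

The two values can be passed to the select query’s setStart and setRows functions, and the total for pageCount() can be taken from the result set’s getNumFound function.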
