Using Solarium with SOLR for Search – Implementation
This is the third article in a four-part series on using Solarium, in conjunction with Apache’s SOLR search implementation.
In the first part I introduced the key concepts and we installed and set up SOLR. In part two we installed and configured Solarium, a library which enables us to use PHP to “talk” to SOLR as if it were a native component.
Now we’re finally ready to start building the search mechanism, which is the subject of this installment.
Basic Search
Let’s look at how to implement a really simple search:
$query = $client->createSelect();
$query->setQuery(Input::get('q'));
Input::get('q') is simply Laravel’s way of grabbing a GET or POST variable named q, which, you’ll remember, is the name of our search form element.
Or better still, use a placeholder to escape the search phrase:
$query->setQuery('%P1%', array(Input::get('q')));
A placeholder is indicated by the % symbols. The letter “P” means “escape this as a Phrase”. The bound variables are passed as an array, and the number indicates the position in the array of the argument you wish to bind; bear in mind that (perhaps unusually) 1 indicates the first item.
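To make the 1-based numbering and the P modifier concrete, here is a simplified, hypothetical model of how such placeholders could be expanded. It is not Solarium’s actual code: the function names (escapePhraseSketch(), escapeTermSketch(), bindSketch()) are invented for illustration, and the escaping rules only approximate what SOLR’s query syntax requires. It also models the T (term) modifier, which we’ll use later for filtering.

```php
<?php
// Hypothetical, simplified model of Solarium-style placeholders such as
// %P1% and %T1%. The escaping rules are approximations, not Solarium's code.

// "P": wrap the value in double quotes, escaping backslashes and quotes.
function escapePhraseSketch(string $input): string
{
    return '"' . addcslashes($input, '"\\') . '"';
}

// "T": backslash-escape characters that are special in Lucene query syntax.
function escapeTermSketch(string $input): string
{
    return addcslashes($input, '+-&|!(){}[]^"~*?:\\/');
}

// Replace each %P<n>% or %T<n>% with the n-th bound value, escaped.
// Note that n is 1-based: %P1% refers to $bind[0].
function bindSketch(string $pattern, array $bind): string
{
    return preg_replace_callback(
        '/%([PT])(\d+)%/',
        function (array $m) use ($bind): string {
            $value = (string) $bind[(int) $m[2] - 1];
            return $m[1] === 'P' ? escapePhraseSketch($value) : escapeTermSketch($value);
        },
        $pattern
    );
}
```

With this model, bindSketch('%P1%', array('star wars')) yields star wars wrapped in double quotes, and bindSketch('rating:%T1%', array('PG-13')) yields rating:PG\-13.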
To run the search:
$resultset = $client->select($query);
You can now retrieve the number of results using the getNumFound() method, for example:
printf('Your search yielded %d results:', $resultset->getNumFound());
$resultset is an instance of Solarium\QueryType\Select\Result\Result, which is traversable (it implements PHP’s IteratorAggregate interface), so you can iterate through the results as follows:
foreach ($resultset as $document) {
    // ... work with each $document here
}
Each result is an instance of Solarium\QueryType\Select\Result\Document, which provides two ways to access the individual fields. You can either use public properties, e.g.:
<h3><?php print $document->title ?></h3>
Or, you can iterate through the available fields:
foreach ($document as $field => $value) {
    // Convert multi-value fields to a comma-separated string
    if (is_array($value)) {
        $value = implode(', ', $value);
    }
    print '<strong>' . $field . '</strong>: ' . $value . '<br />';
}
Note that multi-value fields, such as cast, return an array; the example above simply collapses these into a comma-separated list.
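If you’d rather build the markup in a function than inline in a template, the loop above can be wrapped up as follows. This is a sketch under a couple of assumptions: renderFields() is a hypothetical name, a plain associative array stands in for a Solarium Document (both can be iterated with foreach), and the values are passed through htmlspecialchars(), which the original loop doesn’t do.

```php
<?php
// Hypothetical helper: render a document's fields as HTML, collapsing
// multi-value fields (arrays) into a comma-separated string.
// A plain associative array stands in for a Solarium Document here.
function renderFields(iterable $document): string
{
    $html = '';
    foreach ($document as $field => $value) {
        if (is_array($value)) {
            $value = implode(', ', $value); // e.g. cast => "A, B"
        }
        // Escape output, since field values come from the index.
        $html .= '<strong>' . htmlspecialchars((string) $field) . '</strong>: '
               . htmlspecialchars((string) $value) . '<br />';
    }
    return $html;
}
```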
Okay, so that’s an overview of how to do it – now let’s plug it into our example application.
We’ll make the search respond to a GET request rather than POST, because it’ll make things easier when we come to look at faceted search; in any case, it’s very common for site searches to use GET.
So the index route on the home controller (our application only has one page, after all) becomes the following:
/**
* Display the search form / run the search.
*/
public function getIndex()
{
if (Input::has('q')) {
// Create a search query
$query = $this->client->createSelect();
// Set the query string
$query->setQuery('%P1%', array(Input::get('q')));
// Execute the query and return the result
$resultset = $this->client->select($query);
// Pass the resultset to the view and return.
return View::make('home.index', array(
'q' => Input::get('q'),
'resultset' => $resultset,
));
}
// No query to execute, just return the search form.
return View::make('home.index');
}
Now let’s modify the view – app/views/home/index.blade.php – so that it displays the search results, as well as a result count, by adding this below the search form:
@if (isset($resultset))
<header>
<p>Your search yielded <strong>{{ $resultset->getNumFound() }}</strong> results:</p>
<hr />
</header>
@foreach ($resultset as $document)
<h3>{{ $document->title }}</h3>
<dl>
<dt>Year</dt>
<dd>{{ $document->year }}</dd>
@if (is_array($document->cast))
<dt>Cast</dt>
<dd>{{ implode(', ', $document->cast) }}</dd>
@endif
</dl>
{{ $document->synopsis }}
@endforeach
@endif
Try running a few searches. Quite quickly, you might notice a major limitation. For example, search for “Star Wars” and note the first few results, then search for “Mark Hamill”. There are no results: it looks like the search only takes the title field into account, not the cast.
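To alter this behavior we need to use the DisMax component. DisMax is an abbreviation of Disjunction Max. Disjunction means it searches across multiple fields. Max means that if a query term matches several fields, only the highest-scoring field counts towards the score (optionally plus a small share of the others), rather than every matching field’s score being added together.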
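To make the scoring idea concrete, here’s a simplified sketch of the arithmetic, using made-up scores. For each query term, the best-scoring field wins, optionally plus a fraction (SOLR’s tie parameter) of the other fields’ scores; the per-term results are then summed. Real SOLR scoring involves more than this (term frequencies, field norms and so on), so treat disMaxScore() and queryScore() as hypothetical illustrations only.

```php
<?php
// Illustrative arithmetic only: how a disjunction-max combines the
// per-field scores for ONE query term. Scores and tie values are made up.
function disMaxScore(array $fieldScores, float $tie = 0.0): float
{
    if ($fieldScores === []) {
        return 0.0;
    }
    $max = max($fieldScores);
    $sum = array_sum($fieldScores);
    // Highest field score, plus a tie-breaker share of the other fields.
    return $max + $tie * ($sum - $max);
}

// The per-term results are then summed across the terms of the query.
function queryScore(array $perTermFieldScores, float $tie = 0.0): float
{
    $total = 0.0;
    foreach ($perTermFieldScores as $fieldScores) {
        $total += disMaxScore($fieldScores, $tie);
    }
    return $total;
}
```

With tie left at 0.0, a term scoring 2.0 in title and 0.5 in cast contributes 2.0, not 2.5.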
To indicate that we wish to perform a DisMax query:
$dismax = $query->getDisMax();
Then we can tell the search to look in multiple fields – separate them with a space:
$dismax->setQueryFields('title cast synopsis');
Now, if you try searching for “Mark Hamill” again, you’ll see that the search picks up the cast, as well as the title.
We can take our DisMax query one step further by attaching weights to fields. This allows you to prioritize certain fields over others – for example, you probably want title matches to give you a higher score than matching words in the synopsis. Take a look at the following line:
$dismax->setQueryFields('title^3 cast^2 synopsis^1');
This gives matches on the cast field twice the weight of matches in the synopsis, and matches on the title three times that weight. For your own projects, you’ll probably want to experiment with various queries to work out the optimum weightings, which are likely to be very specific to the application in question.
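Since you’ll likely be tweaking these weights, it can help to keep them in one array and generate the string from it. buildQueryFields() below is a hypothetical helper, not part of Solarium; it simply produces the space-separated field^weight format shown above.

```php
<?php
// Hypothetical helper: build a DisMax query-fields string such as
// "title^3 cast^2 synopsis^1" from an array of field => weight pairs.
function buildQueryFields(array $weights): string
{
    $parts = [];
    foreach ($weights as $field => $weight) {
        $parts[] = $field . '^' . $weight;
    }
    return implode(' ', $parts);
}
```

You could then call $dismax->setQueryFields(buildQueryFields(array('title' => 3, 'cast' => 2, 'synopsis' => 1))); and adjust the weights in one place while experimenting.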
So just to sum up, we can implement searching over multiple fields by modifying app/controllers/HomeController.php as follows:
// Set the query string
$query->setQuery('%P1%', array(Input::get('q')));
// Create a DisMax query
$dismax = $query->getDisMax();
// Set the fields to query, and their relative weights
$dismax->setQueryFields('title^3 cast^2 synopsis^1');
// Execute the query and return the result
$resultset = $this->client->select($query);
Specifying Which Fields to Return
If you run a search and then iterate through the fields of each document in the resultset, you’ll see that by default every field we’ve added to the index is returned. In addition, SOLR adds the _version_ field and the score associated with the search result, along with the unique identifier.
The score is a numeric value which expresses the relevance of the result.
If you wish to change this behavior, there are three methods you can use:
$query->clearFields(); // return no fields
$query->addField('title'); // add 'title' to the list of fields returned
$query->addFields(array('title', 'cast')); // add several fields to the list of those returned
Note that you’ll probably need to use clearFields() in conjunction with addField() or addFields(), since those two methods add to the current list of fields rather than replacing it:
$query->clearFields()->addFields(array('title', 'cast'));
Just as in SQL, you can use an asterisk as a wildcard – meaning select all fields:
$query->clearFields()->addFields('*');
Sorting Search Results
By default, search results will be returned in descending order of score. In most cases this is probably what you want; “best matches” appear first.
However, you can change this behavior if you wish as follows:
$query->addSort('title', 'asc');
The syntax will probably look familiar; it’s very similar to SQL.
Pagination
You can specify the start position, i.e. the zero-based offset at which to start listing results, and the number of rows to return. Think of it as being like SQL’s LIMIT clause. So, for example, to take the first hundred results you’d do this:
$query->setStart(0);
$query->setRows(100);
Armed with the result of getNumFound() and these functions, it should be straightforward to implement pagination, but for brevity I’m not going to go over that here.
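As a starting point, here’s a minimal sketch of the arithmetic, assuming a 1-based page number (for example from a hypothetical page GET parameter) and a fixed page size. pageToStart() and pageCount() are illustrative names, not Solarium methods.

```php
<?php
// Hypothetical pagination helpers (names are illustrative, not Solarium API).

// Before running the query: convert a 1-based page number into SOLR's
// zero-based "start" offset. Pages below 1 are treated as page 1.
function pageToStart(int $page, int $perPage): int
{
    return (max(1, $page) - 1) * $perPage;
}

// After running the query: work out how many pages the result count needs.
function pageCount(int $numFound, int $perPage): int
{
    return max(1, (int) ceil($numFound / $perPage));
}
```

In the controller you might call $query->setStart(pageToStart($page, 20)) and $query->setRows(20) before running select(), and then use pageCount($resultset->getNumFound(), 20) to decide how many page links to render.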
Getting Started with SOLR Faceted Search
Faceted search essentially allows you to “drill down” through search results based on one or more criteria. It’s probably best illustrated by online stores, where you can refine a product search by things like category, format (e.g. paperback vs hardback vs digital books), whether it’s currently in stock, or price range.
Let’s expand our movie search with a basic facet; we’ll allow people to narrow down their movie search by MPAA rating (a certificate specifying the appropriate age range for a movie, e.g. “R” or “PG-13”).
To create a facet based on a field, you do this:
$facetSet = $query->getFacetSet();
$facetSet->createFacetField('rating')
->setField('rating');
Upon running the search, the resultset can now be broken down by the value of the field, along with a count of the results for each value:
$facet = $resultset->getFacetSet()->getFacet('rating');
foreach($facet as $value => $count) {
echo $value . ' [' . $count . ']<br/>';
}
This will give you something along these lines:
Unrated [193]
PG [26]
R [23]
PG-13 [16]
G [9]
NC-17 [0]
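Conceptually, a field facet is just a count of the matching documents per distinct value. SOLR does this server-side; the plain-PHP sketch below (fieldFacet() is a hypothetical name, and the documents are made-up arrays) mimics the idea. Unlike SOLR, it only lists values that actually occur, so you won’t see zero counts such as NC-17 above.

```php
<?php
// Conceptual sketch only: SOLR computes facet counts server-side. Here we
// mimic a field facet over an in-memory list of matching documents.
function fieldFacet(array $documents, string $field): array
{
    $counts = [];
    foreach ($documents as $doc) {
        if (!isset($doc[$field])) {
            continue; // documents without the field don't contribute
        }
        // Multi-value fields contribute once per value.
        foreach ((array) $doc[$field] as $value) {
            $counts[$value] = ($counts[$value] ?? 0) + 1;
        }
    }
    arsort($counts); // highest counts first
    return $counts;
}
```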
A facet doesn’t have to use single, distinct values. You can use ranges – for example, you might have price ranges in an e-commerce site. To illustrate facet ranges in our movie search, we’re going to allow people to narrow their search to movies from a particular decade.
Here’s the code to create the facet:
$facet = $facetSet->createFacetRange('years')
->setField('year')
->setStart(1900)
->setGap(10)
->setEnd(2020);
This indicates that we want to create a range-based facet on the year field. We need to specify the start value (the year 1900) and the end value, i.e. the end of the current decade. We also need to set the gap; in other words, we want increments of ten, or one decade. To display the counts in our search results, we could do something like this:
$facet = $resultset->getFacetSet()->getFacet('years');
foreach($facet as $range => $count) {
if ($count) {
printf("%d's (%d)<br />", $range, $count);
}
}
This will result in something along these lines:
1970's (12)
1980's (6)
2000's (8)
Note that the facet will contain every possible value, so it’s important to check that the count is non-zero before displaying it.
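It may help to see how the buckets line up. With a start of 1900, a gap of 10 and an end of 2020, there are twelve buckets, each keyed by its lower bound; by default SOLR includes the lower bound and excludes the upper, so 1990 lands in the 1990 bucket rather than the 1980 one. The helpers below are hypothetical and assume the end minus the start is a whole number of gaps.

```php
<?php
// Hypothetical sketch of range-facet buckets, keyed by their lower bound.
// Assumes ($end - $start) is a whole multiple of $gap.
function rangeBuckets(int $start, int $gap, int $end): array
{
    return range($start, $end - $gap, $gap);
}

// Which bucket a value falls into: lower bound inclusive, upper exclusive.
// Assumes $value >= $start.
function bucketFor(int $value, int $start, int $gap): int
{
    return $start + intdiv($value - $start, $gap) * $gap;
}
```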
Faceted Search: Filtering
So far we’ve used facets on the search results page to show the counts, but that’s of limited use unless we can allow users to filter their searches on them.
In the controller action, let’s first check whether the MPAA rating filter has been applied:
if (Input::has('rating')) {
$query->createFilterQuery('rating')->setQuery(sprintf('rating:%s', Input::get('rating')));
}
Actually, just as with the main search query, we can let Solarium escape the search term rather than use sprintf:
if (Input::has('rating')) {
$query->createFilterQuery('rating')->setQuery('rating:%T1%', array(Input::get('rating')));
}
Remember, the 1 indicates that we wish to use the first element of the array of arguments; the placeholder numbering isn’t zero-based. The T indicates that we wish to escape the value as a term (as opposed to P for phrase).
Filtering on decade is slightly more complicated, because we’re filtering on a range rather than a discrete value. We only have one value specified, in Input::get('decade'), but we know that the upper bound is simply the start of the decade plus nine. So, for example, “the Eighties” is represented by the value 1980, and the range 1980 through (1980 + 9) = 1989.
A range query takes the following form:
field:[x TO y]
So it would be:
year:[1980 TO 1989]
We can implement this as follows:
if (Input::has('decade')) {
$query->createFilterQuery('years')->setQuery(sprintf('year:[%d TO %d]', Input::get('decade'), (Input::get('decade') + 9)));
}
Alternatively, we can use Solarium’s query helper. To get an instance of the helper class:
$helper = $query->getHelper();
To use it:
if (Input::has('decade')) {
$query->createFilterQuery('years')->setQuery($helper->rangeQuery('year', Input::get('decade'), (Input::get('decade') + 9)));
}
Whilst this may seem fairly academic, it’s worth knowing how to create an instance of the Solarium helper because it’s very useful for other things, such as geospatial support.
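Whichever approach you choose, Input::get('decade') comes straight from the query string, so it may be worth validating it before building the filter rather than relying on sprintf()’s %d conversion or passing the raw value to the helper. Here’s a hedged sketch of a hypothetical decadeFilter() helper that only accepts four-digit years ending in zero:

```php
<?php
// Hypothetical helper: turn a user-supplied decade such as "1980" into a
// SOLR range filter, rejecting anything that isn't a whole decade.
function decadeFilter(string $field, string $decade): ?string
{
    // Exactly four digits (the D modifier stops $ matching a trailing newline).
    if (!preg_match('/^\d{4}$/D', $decade) || (int) $decade % 10 !== 0) {
        return null; // ignore malformed input rather than querying with it
    }
    $from = (int) $decade;
    return sprintf('%s:[%d TO %d]', $field, $from, $from + 9);
}
```

In the controller, you might then write $fq = decadeFilter('year', (string) Input::get('decade')); and only call $query->createFilterQuery('years')->setQuery($fq) when $fq isn’t null.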
Faceted Search: The View
Now that we’ve covered how to set up faceted search, how to list the facets and how to run filters based on them, we can set up the corresponding view.
Open up app/views/home/index.blade.php and modify the search results section to include an additional column, which will contain our facets:
@if (isset($resultset))
<div class="results row" style="margin-top:1em;">
<div class="col-sm-4 col-md-4 col-lg-3">
<?php $facet = $resultset->getFacetSet()->getFacet('rating'); ?>
<div class="panel panel-primary">
<div class="panel-heading">
<h3 class="panel-title">By MPAA Rating</h3>
</div>
<ul class="list-group">
@foreach ($facet as $value => $count)
@if ($count)
<li class="list-group-item">
<a href="?{{ http_build_query(array_merge(Input::all(), array('rating' => $value))) }}">{{ $value }}</a>
<span class="badge">{{ $count }}</span>
</li>
@endif
@endforeach
</ul>
</div>
<?php $facet = $resultset->getFacetSet()->getFacet('years'); ?>
<div class="panel panel-primary">
<div class="panel-heading">
<h3 class="panel-title">By Decade</h3>
</div>
<ul class="list-group">
@foreach ($facet as $value => $count)
@if ($count)
<li class="list-group-item">
<a href="?{{ http_build_query(array_merge(Input::all(), array('decade' => $value))) }}">{{ $value }}'s</a>
<span class="badge">{{ $count }}</span>
</li>
@endif
@endforeach
</ul>
</div>
</div>
<div class="col-sm-8 col-md-8 col-lg-9">
<!-- SEARCH RESULTS GO HERE, EXACTLY AS BEFORE -->
</div>
</div>
@endif
We’re doing as we discussed in the section on faceted search: grabbing the facet set, iterating through each item and displaying it along with a count of the number of results for that particular value.
Each facet item is a link, which when clicked will refresh the page but with that filter applied. It does this by merging the appropriate value into the currently “active” set of GET parameters; so if you’ve already filtered on one facet, clicking an item in a different facet set will maintain that filter by including the appropriate query parameters. It will also maintain your original query, which is set as “q” in the input array.
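The link-building in the view boils down to a single expression; pulled out into a hypothetical facetLink() function it looks like this. Because array_merge() overwrites an existing string key, clicking a second value in the same facet replaces that filter rather than adding another, while q and any other filters are carried over.

```php
<?php
// Hypothetical helper mirroring the view's link-building: merge one facet
// value into the current GET parameters, keeping "q" and other filters.
function facetLink(array $current, string $facet, string $value): string
{
    return '?' . http_build_query(array_merge($current, [$facet => $value]));
}
```

In the view, the equivalent call would be facetLink(Input::all(), 'rating', $value).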
This approach has some limitations – for one thing, there’s no way to “reset” the filters, except to manually alter the query parameters in the address bar – but its aim is to demonstrate using multiple facets. I’ll leave improving it to you as an additional exercise!