In a recent series of articles I looked in detail at Apache’s SOLR and Solarium.
To recap: SOLR is a search service with a raft of features – such as faceted search and result highlighting – which runs as a web service. Solarium is a PHP library which allows you to integrate with SOLR – whether local or remote – interacting with it as if it were a native component of your application. If you’re unfamiliar with either, then my series is over here, and I’d urge you to take a look.
In this article, I’m going to look at another part of SOLR which warrants its own discussion: geospatial search.
Key Takeaways
- Geospatial search in SOLR and Solarium allows users to perform location-based searches, crucial for industries like real estate and logistics, by querying documents within a certain distance from a specified point.
- Setting up geospatial search requires modifications to SOLR schema.xml, including adding a location field type and configuring latitude and longitude as sub-fields to enable precise location queries.
- Solarium, a PHP library, simplifies the construction of geospatial queries in SOLR by providing helper functions that abstract complex query syntax, making it easier to add filters like distance limitations and sorting by proximity.
- The example application demonstrates practical implementation of geospatial search, using SOLR with Solarium to allow users to find the nearest airports, showcasing how to set up the schema, assign location data, and build search functionality.
- Enhancements to geospatial search performance in SOLR can be achieved through strategies such as using the “bbox” filter for faster but less accurate results, or employing the “RPT” field type for high-performance searches on large datasets.
An Example
I’ve put together a simple example application to accompany this article. You can get it from GitHub, or see it in action here.
Before we delve into that, let’s look at some of the background.
What is Geospatial Search?
Sometimes, the things you want to search for have geographical locations. Often, that provides vital context. It’s all very well me being able to search for “Italian restaurants”, but I’m hungry – a restaurant on another continent, as good as it might be, is of no help. Rather, it would be far more useful to be able to run a search which asks “show me Italian restaurants, but within 5 miles”. Or alternatively, “show me the ten closest Italian restaurants”. That’s where Geospatial search comes in.
Geospatial Search and Points
In geospatial applications we often talk about “points”; i.e., a specific geographical location. Specifically, we’re really talking about a latitude and longitude pair. A latitude and longitude defines a point on the globe, potentially to within a few metres.
One of the challenges when you’re developing anything involving geographic points is that you need some way of making sense of them for people who don’t think in latitude and longitude – which I’m pretty sure is most of us. Geolocation comes in handy here, because it can be used to determine the latitude and longitude of “where you are”, without the ambiguities of place names. (If you want to take the latter approach, I’ve written about it before.)
So, the first challenge when you’re doing any sort of geo-related work is to work out how to determine the start point – i.e., where to search from. In our example application we’ll hedge our bets and take three approaches. We’ll use the HTML5 geolocation functionality to allow the user’s browser to locate them. For convenience and simplicity we’ll include an arbitrary list of some major cities, which when clicked will populate the latitude and longitude from some hard-coded values. Finally, just so we have all bases covered, and for the geo-geeks among us, we’ll include text fields in which users can manually enter their latitude and longitude.
Setting up the Schema
In order to get our SOLR core set up to support geographical locations, we need to make some tweaks to the schema.
The first thing we need to do is to add the location field type to schema.xml:
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
Note that this field is made up of sub-fields; i.e., a latitude and a longitude. We need to ensure we have a suitable type for those:
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
As you can see, it’s basically a field of type double (specifically tdouble, represented internally by the Java class solr.TrieDoubleField).
Both of these <fieldType> declarations need to be placed within the <types> element of your schema.xml.
Now that the types are set up, you can define a new field to hold the latitude and longitude. In the following example, I’m calling it latlon:
<field name="latlon" type="location" indexed="true" stored="true" multiValued="false" />
It’s important that multiValued is set to false – multiple lat/lon pairs aren’t supported.
You’ll also need to set up a dynamic field for the components; i.e. the latitude and longitude. The _coordinate suffix is the one we specified when we defined our location field type above.
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
Both the <field> and <dynamicField> declarations go in the <fields> section.
Your schema is now set up to support latitude / longitude pairs, and we’ve added a field called latlon. Next, let’s look at how to populate that field.
You’ll find an example schema.xml file in the sample application’s repository.
Assigning Location Data
When it comes to assigning a value to a location field, you need to supply the latitude and longitude as a single comma-separated string:
{lat},{long}
So, using Solarium:
$doc->latlon = doubleval($latitude) . "," . doubleval($longitude);
Refer to the section “Populating the Data” for a concrete example.
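If you want to guard against bad coordinates before they reach the index, a small wrapper around that string-building step can help. The following is only a sketch: formatLatLon is a hypothetical helper of my own naming, not part of Solarium, and it assumes you would rather throw an exception than index an out-of-range point.

```php
<?php
// Hypothetical helper (not part of Solarium): build the "lat,long" string
// that a location field expects, rejecting out-of-range values.
function formatLatLon($latitude, $longitude)
{
    $lat = doubleval($latitude);
    $lng = doubleval($longitude);

    // Latitude runs from -90 to 90, longitude from -180 to 180.
    if ($lat < -90 || $lat > 90 || $lng < -180 || $lng > 180) {
        throw new InvalidArgumentException("Invalid point: $lat,$lng");
    }

    return $lat . ',' . $lng;
}

echo formatLatLon('53.4808', '-2.2426'), "\n"; // prints 53.4808,-2.2426
```

With this in place, the assignment would become something like $doc->latlon = formatLatLon($latitude, $longitude);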
Geospatial Queries in SOLR with Solarium
You might recall that in part three of the SOLR series, we looked at Solarium’s helpers. Basically, these act as syntactic sugar, enabling you to create more complex queries without having to worry too much about the underlying SOLR query syntax.
Here’s an example of how to add an additional filter to a search query, which – given a $latitude and a $longitude – limits the results to within $distance kilometres:
$query->createFilterQuery('distance')->setQuery(
    $helper->geofilt(
        'latlon',
        doubleval($latitude),
        doubleval($longitude),
        doubleval($distance)
    )
);
If you prefer to work in miles, you simply need to multiply $distance by 1.609344:
$query->createFilterQuery('distance')->setQuery(
    $helper->geofilt(
        'latlon',
        doubleval($latitude),
        doubleval($longitude),
        doubleval($distance) * 1.609344
    )
);
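To get a feel for what the helper is sparing you, here is a rough sketch that assembles a geofilt filter by hand using SOLR’s local-parameter syntax. buildGeofilt and its $unit argument are my own hypothetical names, and the exact string Solarium generates may order its parameters differently; treat this as an illustration rather than a replacement for the helper.

```php
<?php
// Kilometres in one statute mile.
const KM_PER_MILE = 1.609344;

// Hypothetical helper: assemble a geofilt filter by hand, to show the kind
// of local-params string the Solarium helper saves you from writing.
function buildGeofilt($field, $latitude, $longitude, $distance, $unit = 'km')
{
    $km = doubleval($distance);
    if ($unit === 'mi') {
        $km *= KM_PER_MILE; // the filter distance is expressed in kilometres
    }

    return sprintf(
        '{!geofilt sfield=%s pt=%s,%s d=%s}',
        $field,
        doubleval($latitude),
        doubleval($longitude),
        $km
    );
}

// Ten kilometres, then ten miles, around the same (hypothetical) point.
echo buildGeofilt('latlon', 53.48, -2.24, 10), "\n";
echo buildGeofilt('latlon', 53.48, -2.24, 10, 'mi'), "\n";
```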
If you want to return the distance with the search results, you’ll need to add the geodist function to the list of fields, using the same values as the geofilt filter. Again, you can use a helper:
$query->addField($helper->geodist(
    'latlon',
    doubleval($latitude),
    doubleval($longitude)
));
It’s far more useful to add a field alias, much like you would in SQL, which you can use to retrieve the value later. The convention with aliases is to prefix and suffix with an underscore, like so:
$query->addField('_distance_:' . $helper->geodist(
    'latlon',
    doubleval($latitude),
    doubleval($longitude)
));
Now, you can display the distance in your search results:
<ul>
<?php foreach ($resultset as $document): ?>
    <li><?php print $document->title ?> (<?php print round($document->_distance_, 2) ?> kilometres away)</li>
<?php endforeach; ?>
</ul>
In order to sort the results by distance, you need to apply a little trickery. Rather than use setSort, you actually need to use a query; this is then used to “score” results based on distance. The underlying SOLR query will look like this:
{!func}geodist(fieldname,lat,lng)
To do this with Solarium, again using a helper:
$query->setQuery('{!func}' . $helper->geodist(
    'latlon',
    doubleval($latitude),
    doubleval($longitude)
));
The net result of this is that the score will reflect the proximity; the lower the score, the closer it is geographically.
So, to sort the results by distance, closest first:
$query->addSort('score', 'asc');
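To make that score a little more concrete, it helps to see a distance calculation in plain PHP. The sketch below uses the haversine formula to compute a great-circle distance in kilometres and then sorts a few points, nearest first. It is an approximation for illustration only: the Earth radius constant, the function name and the sample points are my own choices, and SOLR’s internal implementation may differ in detail.

```php
<?php
// Mean Earth radius in kilometres (an approximation chosen for this sketch).
const EARTH_RADIUS_KM = 6371.0;

// Great-circle distance between two points, using the haversine formula.
function haversineKm($lat1, $lng1, $lat2, $lng2)
{
    $dLat = deg2rad($lat2 - $lat1);
    $dLng = deg2rad($lng2 - $lng1);

    $a = sin($dLat / 2) ** 2
        + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLng / 2) ** 2;

    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt($a)));
}

// Illustrative points: name => [latitude, longitude].
$places = [
    'Paris'     => [48.8566, 2.3522],
    'Edinburgh' => [55.9533, -3.1883],
    'Brussels'  => [50.8503, 4.3517],
];
$origin = [51.5074, -0.1278]; // London

// Lowest distance first, mirroring addSort('score', 'asc').
uasort($places, function ($a, $b) use ($origin) {
    return haversineKm($origin[0], $origin[1], $a[0], $a[1])
        <=> haversineKm($origin[0], $origin[1], $b[0], $b[1]);
});

print_r(array_keys($places)); // Brussels, Paris, Edinburgh
```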
Enough of the theory; let’s build something.
Building our Example Application
I’ve created a simple example application where people can search for their nearest airports, which you can find on GitHub, in the solr folder. There’s an online demo here.
It uses Silex as a framework, along with Twig for templating. You shouldn’t need an in-depth knowledge of either in order to follow along, since most of the application’s complexity comes from the SOLR integration, which is covered here.
Populating the Data
The data we’re using is taken from the excellent OpenFlights.org service. You’ll find the data file in the repository, along with a simple script to populate the search index – run it as follows:
php scripts/populate.php
Here’s the relevant section:
// Now let's start importing
while (($row = fgetcsv($fp, 1000, ",")) !== FALSE) {
    // get an update query instance
    $update = $client->createUpdate();
    // Create a document
    $doc = $update->createDocument();
    $doc->id = $row[0];
    $doc->name = $row[1];
    $doc->city = $row[2];
    $doc->country = $row[3];
    $doc->faa_faa_code = $row[4];
    $doc->icao_code = $row[5];
    $doc->altitude = $row[8];
    // Latitude is column 6, longitude is column 7
    $doc->latlon = doubleval($row[6]) . "," . doubleval($row[7]);
    // Let's simply add and commit straight away.
    $update->addDocument($doc);
    $update->addCommit();
    // this executes the query and returns the result
    $result = $client->update($update);
    $num_imported++;
    // Sleep for a couple of seconds, lest we go too fast for SOLR
    sleep(2);
}
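Committing after every row keeps the script simple, but it is slow for a file of any size. One alternative, offered here as a sketch and not as what the sample repository does, is to send rows in batches and commit once per batch. The importInBatches function and the callback are hypothetical; in a real script the callback would build a single update query for the whole batch and execute it.

```php
<?php
// Hypothetical helper: hand rows to $sendBatch in groups of $batchSize and
// return how many rows were passed on.
function importInBatches(array $rows, $batchSize, callable $sendBatch)
{
    $imported = 0;

    foreach (array_chunk($rows, $batchSize) as $batch) {
        // In a real import this is where you would build one update query,
        // add a document per row, add a single commit and execute it.
        $sendBatch($batch);
        $imported += count($batch);
    }

    return $imported;
}

// Stand-in rows and a stand-in for the SOLR call, so the sketch runs alone.
$rows = [['1', 'A'], ['2', 'B'], ['3', 'C'], ['4', 'D'], ['5', 'E']];
$count = importInBatches($rows, 2, function (array $batch) {
    echo 'Sending ', count($batch), " document(s)\n";
});
echo "Imported $count rows\n";
```

The batch size is a trade-off: larger batches mean fewer round trips, at the cost of more memory per request.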
Building the Search Form
We’ll start with a simple form with longitude and latitude fields, as well as a drop-down with which the user can specify the distance to limit to:
<form method="get" action="/">
    <div class="form-group">
        <a href="#/" id="findme" class="btn btn-default"><i class="icon icon-target"></i> Find my location</a>
    </div>
    <div class="form-group">
        <label for="form-lat">Latitude</label>
        <input type="text" name="lat" id="form-lat" class="form-control" />
    </div>
    <div class="form-group">
        <label for="form-lng">Longitude</label>
        <input type="text" name="lng" id="form-lng" class="form-control" />
    </div>
    <div class="form-group">
        <label for="form-dist">Within <em>x</em> kilometers</label>
        <select name="dist" id="form-dist" class="form-control">
            <option value="50">50</option>
            <option value="100">100</option>
            <option value="250">250</option>
            <option value="500">500</option>
        </select>
    </div>
    <div class="form-group">
        <button type="submit" class="btn btn-primary"><i class="icon icon-search"></i> Search</button>
    </div>
</form>
Next, let’s implement the “find me” button, which uses HTML5 geolocation – if the user’s browser supports it – to populate the search form.
function success(position) {
    $('input[name="lat"]').val(position.coords.latitude);
    $('input[name="lng"]').val(position.coords.longitude);
}
// Receives a string from our own code, or an error object from the browser
function error(err) {
    alert(typeof err === 'string' ? err : err.message);
}
$('#findme').click(function(){
    if (navigator.geolocation) {
        navigator.geolocation.getCurrentPosition(success, error);
    } else {
        error('not supported');
    }
});
Users will need to grant our application permission to locate them, so really it’s best to run this upon some sort of user interaction, such as at the click of a button, rather than on page-load.
Finally, we’ll provide a list of “default” cities; a user can click one to populate the latitude and longitude fields automatically.
Here’s the HTML, showing a limited number of cities for brevity:
<ul id="cities">
    <li><a href="#/" data-lat="52.51670" data-lng="13.33330">Berlin, Germany</a></li>
    <li><a href="#/" data-lat="-34.33320" data-lng="-58.49990">Buenos Aires, Argentina</a></li>
</ul>
The corresponding JavaScript is extremely simple:
$('#cities a').click(function(e){
    $('input[name="lat"]').val($(this).data('lat'));
    $('input[name="lng"]').val($(this).data('lng'));
});
Next up, we’re going to implement the search.
The Search Page
Let’s start by defining a single route for the one and only page in our example application. It will display the search form, and also the results when the latitude and longitude are provided via GET parameters by submitting the form.
// Display the search form / run the search
$app->get('/', function (Request $request) use ($app) {
    $resultset = null;
    $query = $app['solr']->createSelect();
    $helper = $query->getHelper();
    $query->setRows(100);
    $query->addSort('score', 'asc');
    if (($request->get('lat')) && ($request->get('lng'))) {
        $latitude = $request->get('lat');
        $longitude = $request->get('lng');
        $distance = $request->get('dist');
        $query->createFilterQuery('distance')->setQuery(
            $helper->geofilt(
                'latlon',
                doubleval($latitude),
                doubleval($longitude),
                doubleval($distance)
            )
        );
        $query->setQuery('{!func}' . $helper->geodist(
            'latlon',
            doubleval($latitude),
            doubleval($longitude)
        ));
        $query->addField('_distance_:' . $helper->geodist(
            'latlon',
            doubleval($latitude),
            doubleval($longitude)
        ));
        $resultset = $app['solr']->select($query);
    }
    // Render the form / search results
    return $app['twig']->render('index.twig', array(
        'resultset' => $resultset,
    ));
});
The boilerplate code is pretty simple stuff – defining the route, grabbing the relevant parameters and rendering the view.
The code which runs the search utilizes the code we looked at earlier. Essentially it does the following:
- Creates a filter query, restricting the search to within $distance km of the point specified by $latitude and $longitude; all three are provided as GET parameters
- Uses the geodist helper to tell Solarium which field we’re interested in (the latlon field we defined earlier) in order to sort the results
- Adds a pseudo-field _distance_ so that we can incorporate it into our search results
- Runs the query and assigns its result to the view.
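One detail worth thinking about is how the GET parameters are checked. A plain truthiness test treats a latitude of "0" (the equator) as missing, and nothing stops someone passing an arbitrary distance. Here is a sketch of a stricter approach; parseSearchParams is a hypothetical function of my own, and the list of allowed distances simply mirrors the options in the form above.

```php
<?php
// Hypothetical helper: turn raw GET parameters into search values, or
// return null when there is nothing valid to search on.
function parseSearchParams(array $params)
{
    $allowedDistances = [50, 100, 250, 500]; // the options offered by the form

    // is_numeric() accepts "0", which a plain truthiness check would reject.
    if (!isset($params['lat'], $params['lng'])
        || !is_numeric($params['lat'])
        || !is_numeric($params['lng'])) {
        return null;
    }

    $distance = (isset($params['dist']) && is_numeric($params['dist']))
        ? (int) $params['dist']
        : 0;
    if (!in_array($distance, $allowedDistances, true)) {
        $distance = $allowedDistances[0]; // fall back to the smallest radius
    }

    return [
        'latitude'  => doubleval($params['lat']),
        'longitude' => doubleval($params['lng']),
        'distance'  => $distance,
    ];
}

var_dump(parseSearchParams(['lat' => '0', 'lng' => '32.5', 'dist' => '250']));
var_dump(parseSearchParams(['lat' => 'abc', 'lng' => '32.5']));
```

In the route, you could pass the request’s query parameters through a function like this and only build the SOLR query when it returns an array.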
Displaying the Results
Here’s the portion of the template which is responsible for displaying the search results:
{% if resultset %}
    {% for doc in resultset %}
    <article>
        <h4><i class="icon icon-airplane"></i> {{ doc.name }}</h4>
        <p><strong>{{ doc.city }}</strong>, {{ doc.country }} ({{ doc._distance_|number_format }} km away)</p>
    </article>
    <hr />
    {% endfor %}
{% endif %}
It’s pretty straightforward; note how the _distance_ field is available in our search result document, along with the name and country fields. We’re using Twig’s number_format filter to format the distance.
That’s all there is to it – you’ll find the complete example in the repository.
This example only searches based on distance. You can, of course, combine text-based search with geospatial search – I’ll leave that as an exercise.
Summary
In this article I’ve shown how you can use SOLR – in conjunction with the PHP library Solarium – in order to perform geospatial searches. We’ve looked at some of the theory, then dived into setting up our schema, constructing our query and putting it into practice.
Feedback? Comments? Leave them below!
Frequently Asked Questions on Geospatial Search with Solr and Solarium
What is the significance of geospatial search in Solr and Solarium?
Geospatial search is a powerful feature in Solr and Solarium that allows users to perform searches based on geographic locations. It is particularly useful in applications where location data is crucial, such as real estate, travel, and logistics. With geospatial search, you can query for documents within a certain distance from a point, sort documents by distance, and even aggregate documents by geospatial facets.
How does Solr handle geospatial data?
Solr handles geospatial data with a dedicated field type, which this article names “location” (backed by the solr.LatLonType class). This field type is used to index latitude and longitude coordinates. When a geospatial search query is made, Solr calculates the distance between the indexed location and the location specified in the search query. This allows Solr to return documents that are within a certain distance from the specified location.
How can I perform a geospatial search in Solarium?
In Solarium, you can perform a geospatial search by using the “geofilt” and “bbox” filters. The “geofilt” filter returns documents that fall within a specified radius of a point, while the “bbox” filter returns documents that fall within a bounding box around a point. To use these filters, you need to specify the field name, the point of origin (in latitude and longitude), and the distance.
What is the difference between “geofilt” and “bbox” filters in Solarium?
The “geofilt” and “bbox” filters in Solarium are both used for geospatial search, but they work in slightly different ways. The “geofilt” filter calculates the exact distance from the point of origin to each document, and returns documents that are within a specified radius. On the other hand, the “bbox” filter calculates a bounding box around the point of origin, and returns documents that fall within this box. The “bbox” filter is faster but less accurate than the “geofilt” filter.
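The difference is easiest to see near the corners of the box. The sketch below builds a rough bounding box around a point and then tests a location that sits inside the box but further away than the radius. It is a simplified illustration under my own assumptions: the function names are hypothetical, the distance uses a flat-plane approximation that only holds over short ranges, and Solr’s own bounding-box calculation will differ in detail.

```php
<?php
const KM_PER_DEGREE = 111.195; // approximate length of one degree of latitude

// Rough bounding box around a point: [minLat, maxLat, minLng, maxLng].
function boundingBox($lat, $lng, $radiusKm)
{
    $dLat = $radiusKm / KM_PER_DEGREE;
    // A degree of longitude covers less ground away from the equator.
    $dLng = $dLat / cos(deg2rad($lat));

    return [$lat - $dLat, $lat + $dLat, $lng - $dLng, $lng + $dLng];
}

// Flat-plane approximation of distance, adequate for short ranges only.
function approxDistanceKm($lat1, $lng1, $lat2, $lng2)
{
    $dy = ($lat2 - $lat1) * KM_PER_DEGREE;
    $dx = ($lng2 - $lng1) * KM_PER_DEGREE * cos(deg2rad($lat1));

    return sqrt($dx * $dx + $dy * $dy);
}

// Hypothetical search: 100 km around 50N, 10E.
$lat = 50.0;
$lng = 10.0;
$radius = 100;

list($minLat, $maxLat, $minLng, $maxLng) = boundingBox($lat, $lng, $radius);

// A point 80% of the way towards the box's north-east corner.
$pLat = $lat + 0.8 * ($maxLat - $lat);
$pLng = $lng + 0.8 * ($maxLng - $lng);

$inBox    = $pLat >= $minLat && $pLat <= $maxLat && $pLng >= $minLng && $pLng <= $maxLng;
$inCircle = approxDistanceKm($lat, $lng, $pLat, $pLng) <= $radius;

var_dump($inBox);    // bool(true): a box-style filter would keep this point
var_dump($inCircle); // bool(false): a radius filter would drop it
```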
How can I sort documents by distance in Solr?
In Solr, you can sort documents by distance using the “geodist” function. This function calculates the distance from a point to each document, and can be used in the “sort” parameter of a search query. For example, to sort documents by distance from a specific location, you would use a query like: sort=geodist() asc.
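When geodist() is called without arguments like this, it takes its field and origin from the sfield and pt request parameters. As a sketch, here is how those parameters might be assembled into a query string in PHP; the coordinates are hypothetical and the field name latlon is the one used in this article.

```php
<?php
// Hypothetical point to measure from.
$latitude  = 53.48;
$longitude = -2.24;

$params = [
    'q'      => '*:*',                        // match every document
    'sfield' => 'latlon',                     // the location field
    'pt'     => $latitude . ',' . $longitude, // the origin point
    'sort'   => 'geodist() asc',              // nearest first
];

// Append the result to your core's select URL.
$queryString = http_build_query($params);
echo $queryString, "\n";
```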
Can I perform geospatial search on multiple fields in Solr?
Yes, although not by listing several fields in one “sfield” parameter, which names a single field. Instead, give each spatial filter its own sfield using local parameters, then combine the filters in your query. This allows you to search for documents that match a location in any of the chosen fields.
How can I improve the performance of geospatial search in Solr?
There are several ways to improve the performance of geospatial search in Solr. One way is to use the “bbox” filter instead of the “geofilt” filter, as it is faster but less accurate. Another way is to use the “RPT” (Recursive Prefix Tree) field type, which is designed for high performance geospatial search.
What is the role of the “SpatialRecursivePrefixTreeFieldType” in Solr?
The “SpatialRecursivePrefixTreeFieldType” in Solr is a field type that is optimized for geospatial search. It uses a spatial index to quickly find documents that are near a specified location. This field type is particularly useful for large datasets, as it can significantly improve the performance of geospatial search queries.
How does Solr handle multi-valued location fields?
It depends on the field type. The LatLonType field used in this article must be single-valued, which is why multiValued is set to false in the schema. The SpatialRecursivePrefixTreeFieldType can index several locations per document, so a spatial filter matches a document if any one of its locations falls within the search area. This makes it the better fit for documents that are associated with multiple locations.
Can I use geospatial search with other types of search in Solr?
Yes, you can combine geospatial search with other types of search in Solr. For example, you can use the “fq” parameter to filter the results of a geospatial search query based on other criteria. This allows you to perform complex searches that take into account both location and other factors.
Lukas is a freelance web and mobile developer based in Manchester in the North of England. He's been developing in PHP since moving away from those early days in web development of using all manner of tools such as Java Server Pages, classic ASP and XML data islands, along with JavaScript - back when it really was JavaScript and Netscape ruled the roost. When he's not developing websites and mobile applications and complaining that this was all fields, Lukas likes to cook all manner of World foods.