Geospatial Search with SOLR and Solarium

By Lukas White

In a recent series of articles I looked in detail at Apache’s SOLR and Solarium.

To recap: SOLR is a search service with a raft of features – such as faceted search and result highlighting – which runs as a web service. Solarium is a PHP library which allows you to integrate with SOLR – whether local or remote – interacting with it as if it were a native component of your application. If you’re unfamiliar with either, then my series is over here, and I’d urge you to take a look.

In this article, I’m going to look at another part of SOLR which warrants its own discussion: geospatial search.

An Example

I’ve put together a simple example application to accompany this article. You can get it from Github, or see it in action here.

Before we delve into that, let’s look at some of the background.

Sometimes, the things you want to search for have geographical locations. Often, that provides vital context. It’s all very well me being able to search for “Italian restaurants”, but I’m hungry – a restaurant on another continent, as good as it might be, is of no help. Rather, it would be far more useful to be able to run a search which asks “show me Italian restaurants, but within 5 miles”. Or alternatively, “show me the ten closest Italian restaurants”. That’s where Geospatial search comes in.

Geospatial Search and Points

In geospatial applications we often talk about “points”; i.e., a specific geographical location. Specifically, we’re really talking about a latitude and longitude pair. A latitude and longitude defines a point on the globe, potentially to within a few metres.

One of the challenges when you’re developing anything involving geographic points is that you need some way of making sense of them for people who don’t think in latitude and longitude – which I’m pretty sure is most of us. Geolocation comes in handy here, because it can be used to determine the latitude and longitude of “where you are”, without the ambiguities of place names. (If you want to take the latter approach, I’ve written about it before.)

So, the first challenge when you’re doing any sort of geo-related work is to work out how to determine the start point – i.e., where to search from. In our example application we’ll hedge our bets and take three approaches. We’ll use the HTML5 geolocation functionality to allow the user’s browser to locate them. For convenience and simplicity we’ll include an arbitrary list of some major cities, which when clicked will populate the latitude and longitude from some hard-coded values. Finally, just so we have all bases covered, and for the geo-geeks among us, we’ll include text fields in which users can manually enter their latitude and longitude.

Setting up the Schema

In order to get our SOLR core set up to support geographical locations, we need to make some tweaks to the schema.

The first thing we need to do is to add the location field type to schema.xml:

<fieldType name="location"  class="solr.LatLonType" subFieldSuffix="_coordinate"/>

Note that this field is made up of sub-fields; i.e., a latitude and a longitude. We need to ensure we have a suitable type for those:

<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>

As you can see, it’s basically a field of type double (specifically tdouble, represented internally by the Java class solr.TrieDoubleField).

Both of these <fieldType> declarations need to be placed within the <types> element of your schema.xml, not the <fields> element.

Now that the types are set up, you can define a new field to hold the latitude and longitude. In the following example, I’m calling it latlon:

<field name="latlon"        type="location" indexed="true"  stored="true"  multiValued="false" />

It’s important that multiValued is set to false – multiple lat/lon pairs aren’t supported.

You’ll also need to set up a dynamic field for the components; i.e. the latitude and longitude. _coordinate refers to the suffix we specified when we defined our location field type above.

<dynamicField name="*_coordinate"  type="tdouble" indexed="true"  stored="false"/>

Both the <field> and <dynamicField> declarations go in the <fields> section.

Your schema is now set up to support latitude / longitude pairs, and we’ve added a field called latlon. Next, let’s look at how to populate that field.

You’ll find an example schema.xml file in the sample application’s repository.

Assigning Location Data

A location field’s value is a single string containing the latitude and the longitude, separated by a comma:

{lat},{long}

So, using Solarium:

$doc->latlon = doubleval($latitude) . "," . doubleval($longitude);
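
Since the value is just a string, it can be worth validating and formatting the coordinates in one place before they reach the index. What follows is a minimal sketch under that assumption; formatLatLon() is a hypothetical helper of my own, not part of Solarium, and the choice of six decimal places is arbitrary.

```php
<?php

/**
 * Build the "lat,lon" string that a location field expects.
 * Hypothetical helper for illustration; not part of Solarium.
 */
function formatLatLon(float $latitude, float $longitude): string
{
    // Written as negated range checks so that NaN is rejected too.
    if (!($latitude >= -90.0 && $latitude <= 90.0)) {
        throw new InvalidArgumentException('Latitude must be between -90 and 90.');
    }
    if (!($longitude >= -180.0 && $longitude <= 180.0)) {
        throw new InvalidArgumentException('Longitude must be between -180 and 180.');
    }

    // An explicit "." separator avoids locale-dependent float-to-string conversion.
    return number_format($latitude, 6, '.', '') . ',' . number_format($longitude, 6, '.', '');
}

echo formatLatLon(52.5167, 13.3333), PHP_EOL; // 52.516700,13.333300
```

You could then assign the result directly, for example $doc->latlon = formatLatLon($latitude, $longitude).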

Refer to the section “Populating the Data” for a concrete example.

Geospatial Queries in SOLR with Solarium

You might recall that in part three of the SOLR series, we looked at Solarium’s helpers. Basically, these act as syntactic sugar, enabling you to create more complex queries without having to worry too much about the underlying SOLR query syntax.

Here’s an example of how to add an additional filter to a search query, which – given a $latitude and a $longitude – limits the results to within $distance kilometres:

$query->createFilterQuery('distance')->setQuery(
	$helper->geofilt(
		'latlon', 
		doubleval($latitude),
		doubleval($longitude),
		doubleval($distance)
	)
);

If you prefer to work in miles, you simply need to multiply $distance by 1.609344:

$query->createFilterQuery('distance')->setQuery(
	$helper->geofilt(
		'latlon', 
		doubleval($latitude),
		doubleval($longitude),
		doubleval($distance) * 1.609344
	)
);
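
To make it clearer what the helper is saving you from, here is a rough sketch that builds a geofilt filter string by hand, using the local-params form described in the SOLR documentation ({!geofilt sfield=... pt=... d=...}). The function names milesToKm() and geofiltQuery() are hypothetical, and the exact parameter order and number formatting that Solarium emits may well differ.

```php
<?php

// Kilometres per statute mile, the same factor used in the example above.
const KM_PER_MILE = 1.609344;

/** Convert a distance in miles to kilometres. */
function milesToKm(float $miles): float
{
    return $miles * KM_PER_MILE;
}

/**
 * Build a geofilt filter string by hand (illustration only).
 * %F formats floats without regard to the current locale.
 */
function geofiltQuery(string $field, float $lat, float $lng, float $distanceKm): string
{
    return sprintf('{!geofilt sfield=%s pt=%F,%F d=%F}', $field, $lat, $lng, $distanceKm);
}

// Everything within five miles of a point in Berlin.
echo geofiltQuery('latlon', 52.5167, 13.3333, milesToKm(5.0)), PHP_EOL;
```

In practice the helper is the better choice; the sketch is only meant to show what ends up in the filter query.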

If you want to return the distance with the search results, you’ll need to add the geodist function to the list of fields, using the same values as the geofilt filter. Again, you can use a helper:

$query->addField($helper->geodist(
	'latlon', 
	doubleval($latitude), 
	doubleval($longitude)
	)
);

It’s far more useful to add a field alias, much like you would in SQL, which you can use to retrieve the value later. The convention with aliases is to prefix and suffix with an underscore, like so:

$query->addField('_distance_:' . $helper->geodist(
	'latlon', 
	doubleval($latitude), 
	doubleval($longitude)
	)
);

Now, you can display the distance in your search results:

<ul>
	<?php foreach ($resultset as $document): ?>
	<li><?php print $document->title ?> (<?php print round($document->_distance_, 2) ?> kilometres away)</li>
	<?php endforeach; ?>
</ul>
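
If you want to sanity-check the distances that come back, you can compute a great-circle distance yourself. The following is a minimal sketch of the haversine formula, which is, as far as I’m aware, what geodist uses for this field type. haversineKm() and the Earth radius constant are my own assumptions, so expect small differences from SOLR’s figures.

```php
<?php

// Mean Earth radius in kilometres; an assumed value that may differ slightly from SOLR's.
const EARTH_RADIUS_KM = 6371.0;

/**
 * Great-circle distance in kilometres between two latitude/longitude points,
 * using the haversine formula. Hypothetical helper for checking results.
 */
function haversineKm(float $lat1, float $lng1, float $lat2, float $lng2): float
{
    $phi1    = deg2rad($lat1);
    $phi2    = deg2rad($lat2);
    $dPhi    = deg2rad($lat2 - $lat1);
    $dLambda = deg2rad($lng2 - $lng1);

    $a = sin($dPhi / 2) ** 2 + cos($phi1) * cos($phi2) * sin($dLambda / 2) ** 2;

    // min() guards against rounding pushing the square root just above 1.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt($a)));
}

// One degree of latitude along a meridian is roughly 111 km.
printf("%.2f km\n", haversineKm(0.0, 0.0, 1.0, 0.0)); // 111.19 km
```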

In order to sort the results by distance, you need to apply a little trickery. Rather than use setSort, you actually need to use a query; this is then used to “score” results based on distance. The underlying SOLR query will look like this:

{!func}geodist(fieldname,lat,lng)

To do this with Solarium, again using a helper:

$query->setQuery('{!func}' . $helper->geodist(
	'latlon', 
	doubleval($latitude), 
	doubleval($longitude)
));

The net result of this is that the score will reflect the proximity; the lower the score, the closer it is geographically.

So, to sort the results by distance, closest first:

$query->addSort('score', 'asc');

Enough of the theory; let’s build something.

Building our Example Application

I’ve created a simple example application where people can search for their nearest airports, which you can find on Github, in the solr folder. There’s an online demo here.

It uses Silex as a framework, along with Twig for templating. You shouldn’t need an in-depth knowledge of either in order to follow along, since most of the application’s complexity comes from the SOLR integration, which is covered here.

Populating the Data

The data we’re using is taken from the excellent OpenFlights.org service. You’ll find the data file in the repository, along with a simple script to populate the search index – run it as follows:

php scripts/populate.php

Here’s the relevant section:

// Now let's start importing
while (($row = fgetcsv($fp, 1000, ",")) !== FALSE) {

	// get an update query instance
	$update = $client->createUpdate();

	// Create a document
	$doc = $update->createDocument();    

	$doc->id = $row[0];
	$doc->name = $row[1];
	$doc->city = $row[2];
	$doc->country = $row[3];
	$doc->faa_faa_code = $row[4];
	$doc->icao_code = $row[5];
	$doc->altitude = $row[8];

	$doc->latlon = doubleval($row[6]) . "," . doubleval($row[7]);

	// Let's simply add and commit straight away.
	$update->addDocument($doc);
	$update->addCommit();

	// this executes the query and returns the result
	$result = $client->update($update);

	$num_imported++;

	// Sleep for a couple of seconds, lest we go too fast for SOLR
	sleep(2);

}
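
The import loop trusts every row in the file. As a hedged sketch, you might map each row to an array of fields first and skip any row whose coordinates are unusable. The column positions below follow the loop above. airportRowToFields() is a hypothetical helper, and the sample row is made up rather than taken from the real data file.

```php
<?php

/**
 * Map one CSV row to document fields, or return null if the coordinates are unusable.
 * Hypothetical helper; column positions follow the import loop.
 */
function airportRowToFields(array $row): ?array
{
    if (count($row) < 9 || !is_numeric($row[6]) || !is_numeric($row[7])) {
        return null;
    }

    $lat = (float) $row[6];
    $lng = (float) $row[7];
    if (abs($lat) > 90.0 || abs($lng) > 180.0) {
        return null;
    }

    return [
        'id'      => $row[0],
        'name'    => $row[1],
        'city'    => $row[2],
        'country' => $row[3],
        // Explicit "." separator keeps the value independent of the locale.
        'latlon'  => number_format($lat, 6, '.', '') . ',' . number_format($lng, 6, '.', ''),
    ];
}

// A made-up row in the same column order as the import loop expects.
$line = '1,"Example Airport","Example City","Exampleland","EXA","EXAA",52.5,13.4,120';
$row  = str_getcsv($line, ',', '"', '\\');
var_dump(airportRowToFields($row));
```

Separating the mapping from the update also makes it easier to collect several documents before sending them, rather than committing after every row.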

Building the Search Form

We’ll start with a simple form with longitude and latitude fields, as well as a drop-down with which the user can specify the distance to limit to:

<form method="get" action="/">

	<div class="form-group">
		<a href="#/" id="findme" class="btn btn-default"><i class="icon icon-target"></i> Find my location</a>
	</div>

	<div class="form-group">
		<label for="form-lat">Latitude</label>
		<input type="text" name="lat" id="form-lat" class="form-control" />
	</div>

	<div class="form-group">
		<label for="form-lng">Longitude</label>
		<input type="text" name="lng" id="form-lng" class="form-control" />
	</div>

	<div class="form-group">
		<label for="form-dist">Within <em>x</em> kilometers</label>
		<select name="dist" id="form-dist" class="form-control">		            			            	
			<option value="50">50</option>
			<option value="100">100</option>
			<option value="250">250</option>
			<option value="500">500</option>		            			            	
		</select>
	</div>

	<div class="form-group">
		<button type="submit" class="btn btn-primary"><i class="icon icon-search"></i>  Search</button>		          	
	</div>
</form>

Next, let’s implement the “find me” button, which uses HTML5 geolocation – if the user’s browser supports it – to populate the search form.

function success(position) {
	$('input[name="lat"]').val(position.coords.latitude);
	$('input[name="lng"]').val(position.coords.longitude);		  
}

function error(msg) {
	alert(msg);
}

$('#findme').click(function(){
	if (navigator.geolocation) {
		navigator.geolocation.getCurrentPosition(success, error);
	} else {
		error('not supported');
	}
});

Users will need to grant our application permission to locate them, so really it’s best to run this upon some sort of user interaction, such as at the click of a button, rather than on page-load.

Finally, we’ll provide a list of “default” cities; a user can click one to populate the latitude and longitude fields automatically.

Here’s the HTML, showing a limited number of cities for brevity:

<ul id="cities">
	<li><a href="#/" data-lat="52.51670" data-lng="13.33330">Berlin, Germany</a></li>
	<li><a href="#/" data-lat="-34.33320" data-lng="-58.49990">Buenos Aires, Argentina</a></li>
	<!-- remaining cities omitted for brevity -->
</ul>

The corresponding JavaScript is extremely simple:

$('#cities a').click(function(e){
	$('input[name="lat"]').val($(this).data('lat'));
	$('input[name="lng"]').val($(this).data('lng'));
});

Next up, we’re going to implement the search.

The Search Page

Let’s start by defining a single route for the one and only page in our example application. It will display the search form, and also the results once the latitude and longitude are provided as GET parameters by submitting the form.

// Display the search form / run the search
$app->get('/', function (Request $request) use ($app) {

	$resultset = null;

	$query = $app['solr']->createSelect();
	$helper = $query->getHelper();

	$query->setRows(100);

	$query->addSort('score', 'asc');
	
	if (is_numeric($request->get('lat')) && is_numeric($request->get('lng'))) {
		
		$latitude = $request->get('lat');
		$longitude = $request->get('lng');
		$distance = $request->get('dist');

		$query->createFilterQuery('distance')->setQuery(
				$helper->geofilt(
					'latlon', 
					doubleval($latitude),
					doubleval($longitude),
					doubleval($distance)
				)
			);

		$query->setQuery('{!func}' . $helper->geodist(
			'latlon', 
			doubleval($latitude), 
			doubleval($longitude)
		));

		$query->addField('_distance_:' . $helper->geodist(
			'latlon', 
			doubleval($latitude), 
			doubleval($longitude)
			)
		);

		$resultset = $app['solr']->select($query);

	}
		
	// Render the form / search results
	return $app['twig']->render('index.twig', array(
		'resultset' => $resultset,
	));

});

The boilerplate code is pretty simple stuff – defining the route, grabbing the relevant parameters and rendering the view.

The code which runs the search utilizes the code we looked at earlier. Essentially it does the following:

  1. Creates a filter query, restricting the search to within $distance km of the point specified by $latitude and $longitude; all three are provided as GET parameters
  2. Uses the geodist helper to tell Solarium which field we’re interested in (the latlon field we defined earlier) in order to sort the results
  3. Adds a pseudo-field _distance_ so that we can incorporate it into our search results
  4. Runs the query and assigns its result to the view.
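
Because all three values arrive as raw GET parameters, it is worth validating them before they go anywhere near the query. Below is a minimal sketch of one way to do that in plain PHP. parseSearchParams() is a hypothetical function, and the list of allowed distances is an assumption taken from the drop-down in the search form.

```php
<?php

/**
 * Validate raw query-string values; returns null when no usable location was given.
 * Hypothetical helper; the allowed distances mirror the search form's drop-down.
 */
function parseSearchParams(array $params): ?array
{
    $lat  = $params['lat'] ?? null;
    $lng  = $params['lng'] ?? null;
    $dist = $params['dist'] ?? null;

    // is_numeric() accepts "0", which a plain truthiness check would reject.
    if (!is_numeric($lat) || !is_numeric($lng)) {
        return null;
    }

    $lat = (float) $lat;
    $lng = (float) $lng;
    if (abs($lat) > 90.0 || abs($lng) > 180.0) {
        return null;
    }

    // Fall back to the smallest option if the distance is missing or not in the list.
    $allowed = [50, 100, 250, 500];
    $dist    = in_array((int) $dist, $allowed, true) ? (int) $dist : 50;

    return ['lat' => $lat, 'lng' => $lng, 'dist' => $dist];
}

// A point on the equator is a perfectly valid place to search from.
var_dump(parseSearchParams(['lat' => '0', 'lng' => '-78.5', 'dist' => '100']));
```

Inside the route you would pass in the request’s query parameters and only build the filter when the function returns an array.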

Displaying the Results

Here’s the portion of the template which is responsible for displaying the search results:

{% if resultset %}
	{% for doc in resultset %}
	<article>
		<h4><i class="icon icon-airplane"></i> {{ doc.name }}</h4>
		<p><strong>{{ doc.city }}</strong>, {{ doc.country}} ({{ doc._distance_|number_format }} km away)</p>
	</article>
	<hr />
	{% endfor %}
{% endif %}

It’s pretty straightforward; note how the _distance_ field is available in our search result document, along with the name and country fields. We’re using Twig’s number_format filter to format the distance.

That’s all there is to it – you’ll find the complete example in the repository.

This example only searches by distance, but you can also combine text-based search with geospatial search – I’ll leave that as an exercise.

Summary

In this article I’ve shown how you can use SOLR – in conjunction with the PHP library Solarium – in order to perform geospatial searches. We’ve looked at some of the theory, then dived into setting up our schema, constructing our query and putting it into practice.

Feedback? Comments? Leave them below!

  • steven c shepard

    Great tutorial, although I only gleaned out the html5 geoencoding portions, but those were very nice handled.. My only suggestion: have both jQuery and “plain” JavaScript html coded examples to compare with. Love to do the whole article, but don’t SOLR yet installed. Super!

  • xelber

    Excellent work. Was able to get the example working in few minutes. The hosted example did not work btw. Thanks a lot for your effort.

  • Jason Ladd

    Awesome tutorial, thanks so much! However I had an issue trying to put the fieldType declarations inside the fields tag. Solr would only allow me to put them in the types tag… but when I did that, Everything worked except returning the distance from my original location in a pseudo field. Any thoughts as to what may cause that?

    • Jason Ladd

      Solved! Just needed to upgrade solr. I was using solr version 3.6.2, upgraded to 5.2.1 and now it works! beware though if you get the ‘can not use FieldCache on multivalued field’ error, you’ll need to make sure you rename solr 5’s managed-schema to schema.xml and add the appropriate fields as stated above. Out of the box, solr 5 uses this managed-schema file so you’ll need to look for that instead of schema.xml, (which I believe solr 4 still does use).

  • elkhawajah

    Is it possible to do geospatial search without specifying distance limit ? i.e. I want nearest items and limit the number of results to 10, then via pagination I can get next 10 .. etc ?
