App Search with Thinking Sphinx 3.0
Thinking Sphinx is now a standard library for interfacing with Sphinx and has come a long way in implementing Sphinx's features. Version 3.0 is a major rewrite and a departure from earlier versions, especially in terms of setup. It also has fairly advanced facet searching built in. Lastly, this version is compatible only with Rails 3.1 or above.
Installation
Sphinx installation on Linux is pretty straightforward. Downloading and compiling from source is the preferred method, even though the apt and yum repositories are almost up to date.
Newer versions of Sphinx have multi-query support with MySQL, so it is better to compile Sphinx with the MySQL development headers included.
$ wget http://sphinxsearch.com/files/sphinx-2.0.7-release.tar.gz
$ tar xvzf sphinx-2.0.7-release.tar.gz
$ cd sphinx-2.0.7-release/
$ ./configure --with-mysql
$ make
$ sudo make install
Once Sphinx is installed successfully, you can install Thinking Sphinx as a gem or bundle it in your application.
$ gem install thinking-sphinx -v "~> 3.0.2"
If MySQL is your database of choice, make sure the mysql2 gem is locked at either 0.3.12b4 or 0.3.12b5. Earlier versions of the mysql2 gem throw the following exception:
undefined method `next_result' for Mysql2
The error occurs because the previous versions of the mysql2 gem did not have multiquery support, and Sphinx now has that out-of-the-box.
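If you bundle the gems rather than installing them by hand, the version constraints above could be expressed in a Gemfile roughly like this. It is a minimal sketch: the `source` line is an assumption, and your application will list its other gems alongside these two.

```ruby
# Gemfile (sketch; the source URL is an assumption, not taken from the article)
source 'https://rubygems.org'

# Lock mysql2 to one of the pre-release versions mentioned above.
gem 'mysql2', '0.3.12b5'
gem 'thinking-sphinx', '~> 3.0.2'
```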
Configuration
Thinking Sphinx has two main configuration files. In the earlier versions, the first file was named sphinx.yml, but has been renamed to thinking_sphinx.yml.
The second file is named after the environment, <environment>.sphinx.conf (e.g. development.sphinx.conf), and is used to interface Sphinx with the database. Whenever a fresh index is created, a fresh conf file is generated that lists all the Sphinx queries, based on the defined indices and the connection details in database.yml.
thinking_sphinx.yml is an extension of this file where you can pass extra options to improve your search results.
Use Case
Let’s say we have an application with the following models:
- Vendor – name, rating
- Shop – name, description, vendor_id, location
- Product – name, sku, description, price, shop_id
with the following model associations:
- Vendor – has_many shops
- Shop – belongs_to vendor, has_many products
- Product – belongs_to shop
We have the following use cases which we will cover in our article:
- Search Vendor, Shop, Product individually
- Search across all models
- Search via association
- Define facets to create filters
Basics – Definition of Indices.
Thinking Sphinx 3.0 takes a cleaner approach to the definition of indices compared to earlier versions.
To start defining an index, you first need to create a directory named ‘indices’ under the app folder.
$ mkdir app/indices
Let’s follow our use case and create three index files in our indices folder:
- vendor_index.rb
- shop_index.rb
- product_index.rb
Each index definition takes the name of the model to be searched, written as a symbol that matches the model class name (for example, :vendor for Vendor).
A column is indexed simply by writing:
indexes column_name
However, there are a few reserved keywords for Sphinx (e.g. status). To index columns with such names, pass the column name as a symbol.
indexes :status
According to our use case, let’s write our first index for the vendor model. This will index the vendor name and rating. We will use rating as a parameter to sort our search results for the vendor.
ThinkingSphinx::Index.define :vendor, :with => :active_record do
  indexes name
  indexes rating, :sortable => true
end
Now, while searching for products and shops, we need the vendor name to be a common criterion for each search. So, if we need to search for a product called “Batman Action Figure” from the vendor “Marvel Toys”, we could search by the vendor name and find all the products related to it. To make that possible, the vendor's name is indexed under the alias vendor_name:
ThinkingSphinx::Index.define :vendor, :with => :active_record do
  indexes name, :as => :vendor_name
  indexes rating, :sortable => true
end
Inside our other indexes, the associations are defined as:
shop_index.rb
ThinkingSphinx::Index.define :shop, :with => :active_record do
  indexes name, description, location
  # Indexed as a field (not an attribute) so it is full-text searchable.
  indexes vendor.name, :as => :vendor_name
end
product_index.rb
ThinkingSphinx::Index.define :product, :with => :active_record do
  indexes name, description
  indexes price, :sortable => true
  # Indexed as a field (not an attribute) so it is full-text searchable.
  indexes shop.vendor.name, :as => :vendor_name
end
In both files above, vendor_name is reached through associations. A shop belongs to a vendor, so the shop index can reference the vendor directly and expose its name as vendor_name, whereas the product index reaches it via the shop association.
Running and Generating the Index
Once we’re done writing the indexes in our files, we need to generate the index and start the Sphinx server.
$ rake ts:index
$ rake ts:start
Searching
Once our Thinking Sphinx is up and running, we can write controller methods to search and display the results.
Running search on individual models looks like this:
@search_products = Product.search(params[:search], :ranker => :proximity, :match_mode => :any)
If you want to create an application-wide search, you can call the ThinkingSphinx.search method and specify the models to be searched. You can tie this to any route and pass the search term as a parameter.
def search
  @search = ThinkingSphinx.search(params[:search],
    :classes    => [Vendor, Shop, Product],
    :ranker     => :bm25,
    :match_mode => :any,
    :order      => '@weight DESC',
    :page       => params[:page],
    :per_page   => 10)
end
Ranking uses different algorithms like bm25 and proximity for generating best matches within the search results.
Sorting uses the :order option and can be defined in different ways: by rating, alphabetically, or by created_at, to name a few.
@weight is a built-in keyword that holds the Sphinx ranking value.
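The search options used above can be assembled from request parameters in one place. The following is a minimal sketch, assuming a hypothetical `build_search_options` helper and a hypothetical `:sort` request parameter; neither is part of Thinking Sphinx, and only the option names (:ranker, :order, :page, :per_page) come from the examples above. Restricting :order to a fixed list keeps arbitrary user input out of the sort clause.

```ruby
# Hypothetical mapping from a user-facing sort key to a Sphinx order clause.
# 'price' assumes the sortable price field from the product index.
ALLOWED_ORDERS = {
  'relevance' => '@weight DESC',
  'price'     => 'price ASC'
}.freeze

# Hypothetical helper: builds the options hash for Product.search.
def build_search_options(params)
  # Fall back to relevance when the sort key is missing or unknown.
  order = ALLOWED_ORDERS.fetch(params[:sort].to_s, ALLOWED_ORDERS['relevance'])
  # nil or non-numeric page values become page 1.
  page = [params[:page].to_i, 1].max

  {
    :ranker   => :bm25,
    :order    => order,
    :page     => page,
    :per_page => 10
  }
end
```

In a controller, the result would be passed as the second argument, for example `Product.search(params[:search], build_search_options(params))`.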
Other Stuff
Field Weights:
We can rank our search results according to different fields inside a particular model:
@search_products = Product.search(params[:search],
  :ranker        => :proximity,
  :match_mode    => :any,
  :field_weights => { :description => 15,
                      :name        => 10 })
You can also write field weights inside your index files using the set_property rule, like this:
set_property :field_weights => { :description => 15,
                                 :name        => 10 }
Getting excerpts from the search results:
@excerpter = ThinkingSphinx::Excerpter.new 'product_core', params[:search], {
  :before_match    => '<span class="match">',
  :after_match     => '</span>',
  :chunk_separator => ' … ' }
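To make the effect of the before_match and after_match options concrete, here is a plain-Ruby approximation of keyword highlighting. It is only an illustration: the `highlight` method is hypothetical, it does not call Sphinx, and Sphinx's own excerpt builder additionally trims the text into chunks joined by the chunk separator, which this sketch does not attempt.

```ruby
# Hypothetical helper: wraps every whole-word occurrence of a query term
# in the given markers. This mimics the visible result of an excerpt,
# not the way Sphinx computes it.
def highlight(text, query, before_match, after_match)
  words = query.split.map { |word| Regexp.escape(word) }
  return text if words.empty?

  pattern = /\b(?:#{words.join('|')})\b/i
  text.gsub(pattern) { |match| "#{before_match}#{match}#{after_match}" }
end

puts highlight('Batman action figure, boxed', 'batman figure',
               '<span class="match">', '</span>')
# <span class="match">Batman</span> action <span class="match">figure</span>, boxed
```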
Adding Facets:
Facet-based search is used when you need to filter results by class or by attribute value. To define a facet, add :facet => true to the field or attribute in the index file.
indexes name, :facet => true
Then rebuild the index.
$ rake ts:rebuild
In your controller, facets accepts the same parameters as search, so your facet call would look like this:
@facets = ThinkingSphinx.facets(params[:search],
  :class_facet => false,
  :page        => params[:page],
  :per_page    => 10)
If no facets are defined on any model and a facet search is run, it falls back to class facets, grouping the results by model class. You can turn this off by passing :class_facet => false to the facets call, as shown above.
You can display the facet results in your view in the following way (this example iterates over the class facet, so leave the class facet enabled when using it):
<% @facets[:class].each do |option, count| %>
  <%= link_to "#{option} (#{count})", :params => { :facet => option, :page => 1 } %>
<% end %><br/>
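The view above relies on each facet being a hash of values and match counts. The sketch below shows that shape with made-up counts and a hypothetical `facet_labels` helper that produces the same link labels, sorted by count. It runs without Sphinx and is meant only to illustrate the data the view iterates over.

```ruby
# Made-up sample in the shape the view expects:
# facet name => { facet value => number of matching documents }.
SAMPLE_FACETS = {
  :class => { 'Vendor' => 2, 'Shop' => 5, 'Product' => 12 }
}.freeze

# Hypothetical helper: builds "Value (count)" labels, most frequent first.
def facet_labels(facets, name)
  facets.fetch(name, {})
        .sort_by { |_value, count| -count }
        .map { |value, count| "#{value} (#{count})" }
end

puts facet_labels(SAMPLE_FACETS, :class)
# Product (12)
# Shop (5)
# Vendor (2)
```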
Partial-word matching can be enabled with min_infix_len, set either in thinking_sphinx.yml or in your index file using set_property. With a value of 3, as in the configuration below, a search term must match at least 3 consecutive characters of a word to count as a match. Setting min_infix_len to 1 is not advisable, as it can be a serious memory hog.
development:
  mem_limit: 128M
  min_infix_len: 3
test:
  mem_limit: 128M
  min_infix_len: 3
production:
  mem_limit: 128M
  min_infix_len: 3
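To get a feel for why a small min_infix_len is expensive, the sketch below counts the distinct substrings of a single word for two settings. The `infixes` method is hypothetical and only a rough illustration of how the number of indexable word parts grows. It does not reflect how Sphinx stores infixes internally.

```ruby
# Hypothetical helper: lists every distinct substring of `word`
# that is at least `min_len` characters long.
def infixes(word, min_len)
  (min_len..word.length).flat_map do |len|
    (0..word.length - len).map { |start| word[start, len] }
  end.uniq
end

puts infixes('sphinx', 3).length # 10
puts infixes('sphinx', 1).length # 21
```

For one six-letter word, dropping the minimum from 3 to 1 roughly doubles the number of parts to index, and the gap widens with longer words.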
Delta Indexes
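By default, Sphinx rebuilds its indexes from scratch every time you run an index command. To avoid that, we can define delta indexes, which only index documents that were created or changed since the last full index. In Thinking Sphinx 3.0, the delta flag is passed as an option to the index definition, and the model's table needs a boolean delta column:
ThinkingSphinx::Index.define :product, :with => :active_record, :delta => true do
  # fields and attributes as before
end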
Conclusion
Thinking Sphinx 3.0 brings along a lot of advancements and bug fixes from previous versions. It takes a very clean approach, placing the search code outside your app models. Therefore, as queries start getting complex, the code remains readable.
Hopefully, this article has inspired you to give Thinking Sphinx a try in your application.