App Search with Thinking Sphinx 3.0

Saurabh Bhatia

Thinking Sphinx is now a standard library for interfacing with Sphinx, and it has come a long way in implementing Sphinx's features. The new version, 3.0, is a major rewrite and quite a departure from earlier releases, especially in terms of setup. It also has fairly advanced facet searching built in. Note that this version is compatible only with Rails 3.1 or above.

Installation

Sphinx installation on Linux is pretty straightforward. Compiling from source is the preferred method, even though the apt and yum repositories are almost up to date.

Newer versions of Sphinx have multi-query support with MySQL, so it’s better to compile Sphinx with the MySQL development headers included.

$ wget http://sphinxsearch.com/files/sphinx-2.0.7-release.tar.gz
$ tar xvzf sphinx-2.0.7-release.tar.gz
$ cd sphinx-2.0.7-release/
$ ./configure --with-mysql
$ make
$ sudo make install

Once Sphinx is installed successfully, you can install Thinking Sphinx as a gem or bundle it in your application.

$ gem install thinking-sphinx -v "~> 3.0.2"

If MySQL is your database of choice, make sure the mysql2 gem is locked at version 0.3.12b4 or 0.3.12b5. Earlier versions of the mysql2 gem will throw the following exception:

undefined method `next_result' for Mysql2

The error occurs because the previous versions of the mysql2 gem did not have multiquery support, and Sphinx now has that out-of-the-box.
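If you bundle the gems in your application, both constraints can be pinned in the Gemfile. This is a minimal sketch, assuming MySQL and the versions mentioned in this article; the source line is the usual RubyGems default rather than anything specific to Thinking Sphinx.

```ruby
# Gemfile
source 'https://rubygems.org'

# Pre-release mysql2 build with multi-query support (avoids the error above).
gem 'mysql2', '0.3.12b5'

# Thinking Sphinx 3.0.x
gem 'thinking-sphinx', '~> 3.0.2'
```

Run bundle install afterwards to install both gems.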

Configuration

Thinking Sphinx has two main configuration files. The first was named sphinx.yml in earlier versions and has been renamed to thinking_sphinx.yml.

The second file is named after the environment with a .sphinx.conf suffix (e.g. development.sphinx.conf) and is used to interface Sphinx with the database. Whenever the index is built, this conf file is regenerated. It lists all the Sphinx queries, based on the defined indices and the connection details in database.yml.

thinking_sphinx.yml is an extension of this file where you can pass extra options to improve your search results.

Use Case

Let’s say we have an application with the following models:

  • Vendor – name, rating
  • Shop – name, description, vendor_id, location
  • Product – name, sku, description, price, shop_id

with the following model associations:

  • Vendor – has_many shops
  • Shop – belongs_to vendor, has_many products
  • Product – belongs_to shop

We have the following use cases which we will cover in our article:

  • Search Vendor, Shop, Product individually
  • Search across all models
  • Search via association
  • Define facets to create filters

Basics – Definition of Indices

Thinking Sphinx 3.0 takes a cleaner approach to the definition of indices compared to earlier versions.

To start defining an index, you first need to create a directory named ‘indices’ under the app folder.

$ mkdir app/indices

Let’s follow our use case and create three index files in our indices folder:

  • vendor_index.rb
  • shop_index.rb
  • product_index.rb

Each index definition names the class to be searched. The name should match the model class name.

A column can be indexed simply by writing:

indexes column_name

However, there are a few reserved keywords for Sphinx (e.g. status). To use a column with such a name, you have to reference it as a symbol.

indexes :status

According to our use case, let’s write our first index for the vendor model. This will index the vendor name and rating. We will use rating as a parameter to sort our search results for the vendor.

ThinkingSphinx::Index.define :vendor, :with => :active_record do
  indexes name
  indexes rating, :sortable => true
end

Now, while searching for products and shops, we want the vendor name to be a common criterion for each search. For example, to find a product called “Batman Action Figure” sold by the vendor “Marvel Toys”, we could search by the vendor name and find all the products related to it. To make that possible, we first alias the vendor's name as vendor_name:

ThinkingSphinx::Index.define :vendor, :with => :active_record do
  indexes name, :as => :vendor_name
  indexes rating, :sortable => true
end

Inside our other indexes, the associations are defined as:

shop_index.rb
ThinkingSphinx::Index.define :shop, :with => :active_record do
  indexes name, description, location

  has vendor(:name), :as => :vendor_name
end

 

product_index.rb
ThinkingSphinx::Index.define :product, :with => :active_record do
  indexes name, description
  indexes price, :sortable => true

  has shop.vendor.name, :as => :vendor_name
end

In both files above, we reach vendor_name through associations. A shop belongs to a vendor, so the shop index can reference the vendor directly, whereas the product index reaches it via the shop association.

Running and Generating the Index

Once we’re done writing the index definitions, we need to generate the index and start the Sphinx server.

$ rake ts:index
$ rake ts:start

Searching

Once our Thinking Sphinx is up and running, we can write controller methods to search and display the results.

Running search on individual models looks like this:

@search_products = Product.search(params[:search], :ranker => :proximity, :match_mode => :any)

If you want an application-wide search, you can call the ThinkingSphinx.search method and list the models to be searched. You can tie this to any route and pass the search term as a parameter.

def search
  @search = ThinkingSphinx.search(params[:search],
                                  :classes => [Vendor, Shop, Product],
                                  :ranker => :bm25,
                                  :match_mode => :any,
                                  :order => '@weight DESC',
                                  :page => params[:page],
                                  :per_page => 10)
end

The ranker option selects the algorithm, such as bm25 or proximity, used to score the best matches within the search results.

Sorting is controlled by the order option and can be defined in different ways: by rating, alphabetically, or by created_at, to name a few.

@weight is a built-in keyword that holds the Sphinx ranking value.
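The options passed to ThinkingSphinx.search form an ordinary Ruby hash, so they can be assembled and sanity-checked before the search runs. The sketch below uses a hypothetical helper named search_options (not part of Thinking Sphinx) that falls back to the first page when the page parameter is missing or invalid. It uses plain Ruby only, so it can be tried in irb without Rails or a running Sphinx daemon.

```ruby
# Hypothetical helper (not part of Thinking Sphinx): builds the options
# hash passed to ThinkingSphinx.search from raw request parameters.
PER_PAGE = 10

def search_options(params, classes)
  page = params[:page].to_i      # nil, "" and "abc" all become 0
  page = 1 if page < 1           # so fall back to the first page

  {
    :classes    => classes,
    :ranker     => :bm25,
    :match_mode => :any,
    :order      => '@weight DESC', # highest Sphinx ranking first
    :page       => page,
    :per_page   => PER_PAGE
  }
end

# Strings stand in for the Vendor, Shop and Product model classes here so
# that the snippet runs outside Rails.
options = search_options({ :search => 'batman', :page => 'abc' },
                         %w[Vendor Shop Product])
puts options[:page]
puts options[:order]
```

In a controller, the call would then read ThinkingSphinx.search(params[:search], search_options(params, [Vendor, Shop, Product])).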

Other Stuff

Field Weights:
We can weight our search results by different fields inside a particular model:

@search_products = Product.search(params[:search],
                                 :ranker => :proximity,
                                 :match_mode => :any,
                                 :field_weights => {:description => 15,
                                                    :name => 10})

You can also write field weights inside your index files using the set_property rule like this:

set_property :field_weights => {:description => 15,
                                :name => 10}
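To see what these weights do to a score, here is a small worked example in plain Ruby. It assumes the proximity ranker scores a document as the sum, over all fields, of the longest matched phrase length multiplied by the field weight, which is how the Sphinx documentation describes that ranker. The phrase lengths are made-up numbers, and proximity_weight is a hypothetical helper, not a Sphinx or Thinking Sphinx call.

```ruby
# Field weights from the example above.
FIELD_WEIGHTS = { :description => 15, :name => 10 }

# lcs_by_field maps each field to the length of the longest phrase matched
# in that field (hypothetical values). Fields without an explicit weight
# count with a weight of 1.
def proximity_weight(lcs_by_field, field_weights)
  lcs_by_field.inject(0) do |total, (field, lcs)|
    total + lcs * field_weights.fetch(field, 1)
  end
end

# The same two-word phrase matched in different fields.
name_hit        = proximity_weight({ :name => 2, :description => 0 }, FIELD_WEIGHTS)
description_hit = proximity_weight({ :name => 0, :description => 2 }, FIELD_WEIGHTS)

puts name_hit        # 2 * 10 = 20
puts description_hit # 2 * 15 = 30
```

With these weights, a match in the description outranks the same match in the name.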

Getting excerpts from the search results:

@excerpter = ThinkingSphinx::Excerpter.new('product_core', params[:search],
                                           :before_match => '<span class="match">',
                                           :after_match => '</span>',
                                           :chunk_separator => ' … ')
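To make the before_match and after_match options concrete, here is a plain-Ruby imitation of the highlighting step. It is a toy stand-in, not Sphinx's excerpt algorithm: it does not call Sphinx, and it does no chunking, so chunk_separator is not modelled. The helper name toy_highlight is hypothetical.

```ruby
# Toy stand-in for the excerpt builder: wraps every case-insensitive
# occurrence of term in the before/after markers.
def toy_highlight(text, term, before_match, after_match)
  text.gsub(/#{Regexp.escape(term)}/i) do |hit|
    "#{before_match}#{hit}#{after_match}"
  end
end

puts toy_highlight('Batman action figure', 'batman',
                   '<span class="match">', '</span>')
# prints: <span class="match">Batman</span> action figure
```

The matched text keeps its original capitalisation because the block reuses the hit itself rather than the search term.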

Adding Facets:

Facet-based search is used when you need to filter results by class or by attribute values. To define a facet, add :facet => true to the relevant field or attribute in the index file.

indexes name, :facet => true

Then rebuild the index:

$ rake ts:rebuild

In your controller, facets accepts the same parameters as search, so your facet call would look like this:

@facets = ThinkingSphinx.facets(params[:search],
                                :class_facet => false,
                                :page => params[:page],
                                :per_page => 10)

If a facet search is run and no facets are defined on any model, it falls back to class facets, grouping the results by model class. You can turn this off by passing :class_facet => false, as shown above.

You can display the facet results in your view in the following way:

<% @facets[:class].each do |option, count| %>
  <%= link_to "#{option} (#{count})", :params => { :facet => option, :page => 1 } %>
<% end %><br/>
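The view loop above relies on each facet being a collection of value and count pairs. The following plain-Ruby sketch builds the same link labels from sample data, assuming the facet results behave like a hash of hashes. The counts are made up, and sorting by count is an optional extra.

```ruby
# Sample data shaped like the facet results used in the view:
# facet name => { value => number of matching documents }.
facets = { :class => { 'Vendor' => 2, 'Shop' => 5, 'Product' => 31 } }

# Build the link labels, most populated option first.
labels = facets[:class].sort_by { |_option, count| -count }.map do |option, count|
  "#{option} (#{count})"
end

puts labels
```

Each label can then be handed to link_to exactly as in the view.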

Searching for partial words can be enabled with min_infix_len, set either in thinking_sphinx.yml or in your index file using set_property. With a value of 3, as in the configuration below, a substring must be at least 3 characters long to count as a match. Setting min_infix_len to 1 is not advisable, as it can be a serious memory hog.

development:
  mem_limit: 128M
  min_infix_len: 3
test:
  mem_limit: 128M
  min_infix_len: 3
production:
  mem_limit: 128M
  min_infix_len: 3
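Since YAML is sensitive to indentation, it can be worth loading the file once with Ruby's standard YAML library to confirm that every environment parses as expected. The sketch below inlines the configuration so that it runs on its own. In an application you would read config/thinking_sphinx.yml instead. The threshold of 2 is an arbitrary choice for illustration.

```ruby
require 'yaml'

# Inlined copy of the configuration above.
config = YAML.load(<<-YML)
development:
  mem_limit: 128M
  min_infix_len: 3
test:
  mem_limit: 128M
  min_infix_len: 3
production:
  mem_limit: 128M
  min_infix_len: 3
YML

# Collect environments whose infix length is missing or dangerously short.
too_short = config.select do |_env, settings|
  settings['min_infix_len'].to_i < 2
end.keys

puts config.keys.inspect
puts too_short.empty? ? 'min_infix_len looks fine' : "check: #{too_short.join(', ')}"
```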

Delta Indexes

By default, Sphinx rebuilds its indexes from scratch every time you run an index command. To avoid this, we can define delta indexes, which only index the documents that were created or changed since the last full index.

set_property :delta => true
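The idea behind delta indexing can be shown with a toy model in plain Ruby. This is a conceptual sketch only, not how Sphinx or Thinking Sphinx implements it: each record carries a boolean delta flag (Thinking Sphinx should expect a boolean delta column on the table for this), a full index clears the flags, and a delta index picks up only the flagged records. All names here are hypothetical.

```ruby
# A record with a delta flag marking it as changed since the last full index.
Doc = Struct.new(:id, :name, :delta)

# Full index: every record is processed and all delta flags are cleared.
def full_index(docs)
  docs.each { |doc| doc.delta = false }
  docs.map(&:id)
end

# Delta index: only records flagged since the last full index are processed.
def delta_index(docs)
  docs.select(&:delta).map(&:id)
end

docs = [Doc.new(1, 'Batman', false), Doc.new(2, 'Robin', false)]
full_ids = full_index(docs)

docs << Doc.new(3, 'Joker', true)   # created after the full index
docs[0].name  = 'Batman Figure'     # edited after the full index
docs[0].delta = true

delta_ids = delta_index(docs)
puts full_ids.inspect
puts delta_ids.inspect
```

Only the edited and the newly created record end up in the delta run. The untouched record waits for the next full index.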

Conclusion

Thinking Sphinx 3.0 brings along a lot of advancements and bug fixes from previous versions. It takes a very clean approach, placing the search code outside your app models. Therefore, as queries start getting complex, the code remains readable.

Hopefully, this article has inspired you to give Thinking Sphinx a try in your application.
