App Search with Thinking Sphinx 3.0

Thinking Sphinx is now a very standard library for interfacing with Sphinx and has come a long way in its implementation of various features of Sphinx. The new version, 3.0, is a major rewrite and quite a departure especially in terms of setup. Also, it includes fairly advanced facet searching built into it. Lastly, this version is compatible only with Rails 3.1 or above.

Installation

Installing Sphinx on Linux is straightforward. Compiling from source is the preferred method, even though the apt and yum repositories are almost up to date. Newer versions of Sphinx have multi-query support with MySQL, so it is better to compile Sphinx with the MySQL development headers included.

$ wget http://sphinxsearch.com/files/sphinx-2.0.7-release.tar.gz
$ tar xvzf sphinx-2.0.7-release.tar.gz
$ cd sphinx-2.0.7-release/
$ ./configure --with-mysql
$ make
$ sudo make install

Once Sphinx is installed successfully, you can install Thinking Sphinx as a gem or bundle it in your application.

$ gem install thinking-sphinx -v "~> 3.0.2"

If MySQL is your database of choice, make sure the mysql2 gem is locked at version 0.3.12b4 or 0.3.12b5. Earlier versions of the mysql2 gem throw the following exception:

undefined method `next_result' for Mysql2

The error occurs because the previous versions of the mysql2 gem did not have multiquery support, and Sphinx now has that out-of-the-box.

Configuration

Thinking Sphinx has two main configuration files. The first was named sphinx.yml in earlier versions and has been renamed to thinking_sphinx.yml. The second is the per-environment .sphinx.conf file (e.g. development.sphinx.conf), which is used to interface Sphinx with the database. Whenever a fresh index is created, a fresh conf file is generated; it lists all the Sphinx queries based on the defined indices and the connection details in database.yml. thinking_sphinx.yml complements this generated file: it is where you pass extra options to improve your search results.
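To make the per-environment layout of thinking_sphinx.yml more concrete, here is a minimal sketch in plain Ruby that parses a YAML document of that shape and picks out the settings for one environment. The sample values are illustrative, and the settings_for helper is hypothetical: it is not part of Thinking Sphinx, which loads this file for you.

```ruby
require 'yaml'

# Illustrative contents of a thinking_sphinx.yml-style file.
SAMPLE_CONFIG = <<~CONFIG
  development:
    mem_limit: 128M
    min_infix_len: 3
  production:
    mem_limit: 256M
    min_infix_len: 3
CONFIG

# Hypothetical helper: returns the settings hash for one environment,
# or an empty hash when that environment has no section.
def settings_for(environment, yaml_text = SAMPLE_CONFIG)
  config = YAML.safe_load(yaml_text) || {}
  config.fetch(environment.to_s, {})
end

puts settings_for(:development).inspect
```

Each top-level key is an environment name, so settings can differ between development and production without touching the index definitions.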

Use Case

Let’s say we have an application with the following models:

Vendor – name, rating
Shop – name, description, vendor_id, location
Product – name, sku, description, price, shop_id

with the following model associations

Vendor – has_many shops
Shop – belongs_to vendor, has_many products
Product – belongs_to shop

We have the following use cases which we will cover in our article:

Search Vendor, Shop, Product individually
Search across all models
Search via association
Define Facets to Create filters

Basics – Definition of Indices.

Thinking Sphinx 3.0 takes a cleaner approach to the definition of indices compared to earlier versions. To start defining an index, first create a directory named ‘indices’ under the app folder of your application.

$ mkdir app/indices

Let’s follow our use case and create three index files in our indices folder:

vendor_index.rb
shop_index.rb
product_index.rb

Each index definition names the class to be searched, which should match the model class name. An indexed column is declared simply by writing:

indexes column_name

However, there are a few reserved keywords for Sphinx (e.g. status). To make such columns acceptable to Sphinx, you have to refer to them as symbols.

indexes :status

According to our use case, let’s write our first index for the vendor model. This will index the vendor name and rating. We will use rating as a parameter to sort our search results for the vendor.

ThinkingSphinx::Index.define :vendor, :with => :active_record do
  indexes name
  indexes rating, :sortable => true
end

Now, while searching for products and shops, we want the vendor name to be a common criterion for each search. So, if we need to find a product called “Batman Action Figure” from the vendor “Marvel Toys”, we could search using the vendor name and find all the products related to it. Here is how it is done:

ThinkingSphinx::Index.define :vendor, :with => :active_record do
  indexes name, :as => :vendor_name
  indexes rating, :sortable => true
end

Inside our other indexes, the associations are defined as:

shop_index.rb
ThinkingSphinx::Index.define :shop, :with => :active_record do
  indexes name, description, location

  has vendor(:name), :as => :vendor_name
end

product_index.rb
ThinkingSphinx::Index.define :product, :with => :active_record do
  indexes name, description
  indexes price, :sortable => true

  has shop.vendor.name, :as => :vendor_name
end

In both files above, vendor_name is reached through associations. A shop belongs to a vendor, so the shop index can reference the vendor's name directly, whereas the product index reaches it through the shop association.

Running and Generating the Index

Once we’re done writing the indexes in our files, we need to generate the index and start the Sphinx server.

$ rake ts:index
$ rake ts:start

Searching

Once our Thinking Sphinx is up and running, we can write controller methods to search and display the results. Running search on individual models looks like this:

@search_products = Product.search(params[:search], :ranker => :proximity, :match_mode => :any)

If you want an application-wide search, call the ThinkingSphinx.search method and list the models to be searched. You can tie this to any route and pass the search term as a parameter.

def search
  # Shop (not Store) is the model defined in the use case above
  @search = ThinkingSphinx.search(params[:search],
                                  :classes => [Vendor, Shop, Product],
                                  :ranker => :bm25,
                                  :match_mode => :any,
                                  :order => '@weight DESC',
                                  :page => params[:page],
                                  :per_page => 10)
end

Ranking uses algorithms such as bm25 and proximity to generate the best matches within the search results. Sorting uses the :order option and can be defined in different ways: by rating, alphabetically, or by created_at, to name a few. @weight is a built-in keyword that holds the Sphinx ranking value.

Other Stuff

Field Weights: We can weight our search results by field within a particular model:

@search_products = Product.search(params[:search],
                                  :ranker => :proximity,
                                  :match_mode => :any,
                                  :field_weights => {:description => 15,
                                                     :name => 10})

You can also write field weights inside your index files using the set_property rule like this:

set_property :field_weights => {:description => 15,
                                :name => 10}
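As a rough mental model for what field weights do, the sketch below scores a document by counting keyword hits in each field and multiplying each count by that field's weight. This is a deliberately simplified illustration and not Sphinx's actual ranking formula, which depends on the chosen ranker; the weighted_score helper and the sample documents are hypothetical.

```ruby
# Field weights as they might be passed via :field_weights (illustrative values).
FIELD_WEIGHTS = { :name => 10, :description => 15 }

# Hypothetical helper: counts case-insensitive occurrences of the keyword in
# each weighted field and sums count * weight. Unweighted fields are ignored.
def weighted_score(document, keyword, weights = FIELD_WEIGHTS)
  needle = keyword.downcase
  weights.inject(0) do |total, (field, weight)|
    text = document.fetch(field, '').downcase
    total + text.scan(needle).length * weight
  end
end

figure = { :name => 'Batman Action Figure', :description => 'A poseable figure' }
cape   = { :name => 'Cape', :description => 'Batman cape for the Batman figure' }

puts weighted_score(figure, 'batman')
puts weighted_score(cape, 'batman')
```

With these weights, a hit in the description counts for more than a hit in the name, so the second document outranks the first even though its name does not match.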

Getting excerpts from the search results:

@excerpter = ThinkingSphinx::Excerpter.new 'product_core', params[:search], {
  :before_match    => '<span class="match">',
  :after_match     => '</span>',
  :chunk_separator => ' … '
}

Adding Facets: Facet-based search is used when you need to filter according to different classes or parameters. A facet is defined by adding a :facet => true rule to the attribute you want to facet on in the index file.

indexes name, :facet => true

Then just rebuild the index.

$ rake ts:rebuild

In your controller, the facets call accepts the same parameters as search, so it would look like this:

@facets = ThinkingSphinx.facets(params[:search],
                                :class_facet => false,
                                :page => params[:page],
                                :per_page => 10)

If a facet search is run and no facets are defined on any model, it falls back to class facets, which group the results by model class. You can turn this off by passing :class_facet => false, as shown above. You can display the facet results in your view in the following way:

<% @facets[:class].each do |option, count| %>
  <%= link_to "#{option} (#{count})", :params => { :facet => option, :page => 1 } %>
<% end %><br/>
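Conceptually, a facet is a count of matching documents grouped by the value of one attribute. The following plain-Ruby sketch imitates that grouping over an in-memory result list, which may make the shape of the @facets[:class] hash iterated above easier to picture. The facet_counts helper and the sample results are hypothetical; in a real application Thinking Sphinx obtains these counts from Sphinx rather than computing them in Ruby.

```ruby
# Hypothetical helper: counts how many results share each value of `key`.
def facet_counts(results, key)
  results.each_with_object(Hash.new(0)) do |result, counts|
    counts[result[key]] += 1
  end
end

# Illustrative stand-in for a mixed-model search result.
results = [
  { :class => 'Product', :name => 'Batman Action Figure' },
  { :class => 'Shop',    :name => 'Gotham Collectibles' },
  { :class => 'Product', :name => 'Batman Cape' }
]

facet_counts(results, :class).each do |option, count|
  puts "#{option} (#{count})"
end
```

The loop at the end mirrors the view code: each facet option is printed next to the number of results that fall under it.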

Searching for partial words can be enabled with min_infix_len, set either in your thinking_sphinx.yml or in your index file using set_property. With a value of 3, as in the configuration below, at least 3 characters of a word must match before it counts as a match. Setting min_infix_len to 1 is not advisable, as it can be a serious memory hog.

development:
  mem_limit: 128M
  min_infix_len: 3
test:
  mem_limit: 128M
  min_infix_len: 3
production:
  mem_limit: 128M
  min_infix_len: 3
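To see why a small min_infix_len gets expensive, the sketch below enumerates every distinct substring of a word that is at least the minimum length. This only approximates the number of partial forms an infix-enabled index has to account for per word; the infixes helper is hypothetical and does not reproduce how Sphinx stores infixes internally.

```ruby
# Hypothetical helper: returns every distinct substring of `word`
# that is at least `min_len` characters long.
def infixes(word, min_len)
  found = []
  (0...word.length).each do |start|
    (min_len..(word.length - start)).each do |len|
      found << word[start, len]
    end
  end
  found.uniq
end

puts infixes('batman', 3).length
puts infixes('batman', 1).length
```

Even for a six-letter word, lowering the minimum from 3 to 1 doubles the number of substrings, and the gap widens quickly for longer words and larger vocabularies.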

Delta Indexes

By default, Sphinx rebuilds indexes from scratch every time you run an index command. To avoid this, we can define delta indexes, which index only the documents created or changed since the last full index.

set_property :delta => true
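The idea behind a delta index can be sketched in plain Ruby. The sketch assumes the default flag-based approach, in which each record carries a boolean delta flag that is set when the record changes and cleared by a full index. The record hashes and both helper names are hypothetical, and none of this is Thinking Sphinx's own code.

```ruby
# Hypothetical helper: the records a delta index would need to cover,
# i.e. those flagged as changed since the last full index.
def delta_documents(records)
  records.select { |record| record[:delta] }
end

# Hypothetical helper: a full index covers every record, so it clears all flags.
def full_index!(records)
  records.each { |record| record[:delta] = false }
end

products = [
  { :name => 'Batman Action Figure', :delta => false },
  { :name => 'Batman Cape',          :delta => true } # edited after the last full index
]

puts delta_documents(products).map { |record| record[:name] }.inspect
full_index!(products)
puts delta_documents(products).length
```

Only the flagged record needs indexing between full runs, which is what keeps delta updates cheap compared with rebuilding everything.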

Conclusion

Thinking Sphinx 3.0 brings along a lot of advancements and bug fixes from previous versions. It takes a very clean approach, placing the search code outside your app models. Therefore, as queries start getting complex, the code remains readable. Hopefully, this article has inspired you to give Thinking Sphinx a try in your application.

Frequently Asked Questions (FAQs) about App Search with Thinking Sphinx 3.0

How does Thinking Sphinx 3.0 differ from previous versions?

Thinking Sphinx 3.0 is a major rewrite of the library. Index definitions move out of the models into their own files under app/indices, the configuration file is renamed from sphinx.yml to thinking_sphinx.yml, facet searching is built in, and Rails 3.1 or above is required. Because of these changes, you may need to update your code if you’re upgrading from an older version.

Can I use Thinking Sphinx with non-SQL data sources?

The index definitions shown in this article are all declared with :with => :active_record, so they read from a SQL database through ActiveRecord. Data held elsewhere, such as NoSQL databases, file systems, or APIs, is not covered here, and indexing it would require custom code to handle data extraction.

How can I improve the performance of my Sphinx search queries?

There are several ways to optimize Sphinx search queries for better performance. One of the most effective methods is to use Sphinx’s built-in query optimization features, such as query expansion, keyword stemming, and stopword lists. You can also improve performance by optimizing your index structure, using more efficient data types, and tuning your Sphinx configuration settings.

What are the limitations of Thinking Sphinx?

While Thinking Sphinx is a powerful search tool, it does have some limitations. For example, indexes are not updated in real time out of the box, so you’ll need to re-index your data when it changes or set up delta indexes. Some features, such as facets and partial-word matching, also have to be enabled explicitly in your index definitions or configuration.

How can I troubleshoot problems with Thinking Sphinx?

If you’re having trouble with Thinking Sphinx, there are several steps you can take to diagnose and fix the problem. First, check the Sphinx log files for any error messages or warnings. You can also use the Sphinx command-line tools to test your configuration and query your indexes directly. If you’re still having trouble, you can seek help from the Sphinx community on forums, mailing lists, and other online resources.

How do I upgrade from an older version of Thinking Sphinx to version 3.0?

Upgrading from an older version of Thinking Sphinx to version 3.0 involves several steps. First, you’ll need to update your Gemfile to use the new version of the gem. Then, you’ll need to update your index definitions to use the new syntax introduced in version 3.0. Finally, you’ll need to re-index your data using the new version of Sphinx.

Can I use Thinking Sphinx with other programming languages besides Ruby?

Thinking Sphinx is primarily designed to work with Ruby and Ruby on Rails applications. However, it’s possible to use it with other programming languages by using the Sphinx API or SphinxQL, which are language-agnostic interfaces to the Sphinx search engine.

How do I configure Thinking Sphinx for a multi-tenant application?

Configuring Thinking Sphinx for a multi-tenant application involves setting up separate indexes for each tenant and ensuring that search queries only return results from the correct tenant’s index. This can be achieved by using Sphinx’s support for index prefixes or by using a multi-core setup where each tenant has its own Sphinx instance.

How do I handle complex search queries with Thinking Sphinx?

Handling complex search queries with Thinking Sphinx involves using Sphinx’s advanced query syntax, which supports boolean operators, phrase matching, proximity search, and other advanced features. You can also use Sphinx’s support for filters and sorting to further refine your search results.
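As a small illustration of that syntax, the sketch below assembles a query string that requires an exact phrase, excludes a word, and restricts one term to a single field. The quoted phrase, the minus prefix, and the @field operator come from Sphinx's extended query syntax; the build_query helper and its argument names are hypothetical. The field-restricted terms are placed last because a field operator applies to the terms that follow it.

```ruby
# Hypothetical helper: builds a Sphinx extended-syntax query string.
#   phrase      - exact phrase, emitted in double quotes
#   exclude     - words to exclude, each emitted as "-word"
#   field_terms - hash of field name => term, emitted as "@field term"
def build_query(phrase: nil, exclude: [], field_terms: {})
  parts = []
  parts << %("#{phrase}") if phrase
  exclude.each { |word| parts << "-#{word}" }
  field_terms.each { |field, term| parts << "@#{field} #{term}" }
  parts.join(' ')
end

puts build_query(:phrase => 'action figure',
                 :exclude => ['broken'],
                 :field_terms => { :vendor_name => 'marvel' })
```

The resulting string could then be passed as the search term to a call such as Product.search, assuming extended query syntax is in effect for that search.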

How do I index and search non-text data with Thinking Sphinx?

Indexing and searching non-text data with Thinking Sphinx involves using Sphinx’s support for attribute indexing. Attributes are non-text data associated with your documents, such as dates, numbers, or geographical coordinates. You can index these attributes and use them in your search queries to filter or sort your results.