PostgreSQL mostly known as Postgres, is one of the most mature and robust databases out there. It’s a multi-platform, open source and the second most used open source DBMS.
Today we will see about how to implement a basic autocomplete search using a trigram in Postgres and Rails. The tutorial is split into three parts:
- What is a trigram?
- Trigram in Postgres
- Implementing trigram in a sample rails app
Key Takeaways
- Trigram search in Rails and PostgreSQL is a powerful tool for autocomplete functionality. It operates by breaking down words into three-letter sequences and comparing the similarity between these sequences to find matching words. The more matching sequences, the higher the similarity between the words.
- PostgreSQL supports trigrams through an extension called pg_trgm. This extension provides several features that aid in trigram searches, such as calculating the similarity index between two texts, listing possible trigrams from a given text, and setting the similarity index limit for trigram searches.
- Implementing a trigram search in a Rails application involves creating a model, seeding the database with some data, and installing the pg_trgm extension. A search index can also be added to speed up queries. The search method in the model utilizes the similarity function from the pg_trgm extension to perform the trigram search.
- The pg_search gem provides trigram search functionality out of the box, but may be slower than raw SQL. Other optimization methods such as having a separate column for trigram sequences or using ts_vector can also be used, although the latter may not be effective with fuzzy words.
What is a Trigram?
A trigram is nothing but an n-gram with three letter sequence. So, what is an n-gram? From wikipedia,
In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech.
Well, what does that mean, exactly? It means finding matching words by maintaining variable sequences of characters in a word.
In a trigram the variable sequence in n-gram is 3. For finding the similarity between two words, wordA and wordB, wordA is split into three letter sequences and compared with the three letter sequence combinations computed from wordB. The comparison aims to find the number of shared sets between the two words. The more number of sequence matches means the high similarity between the words. This becomes very useful in terms of autocompletion.
Each word is treated with two spaces prefixed and one space suffixed to balance the number of trigrams for a n-character word. That’s confusing, so let’s have an example.
Assume we have a word group that consists of three words [google, toddle, beagle]
and the search term is googlr. We need to find the best matching word from the batch for the search term. First the batch words are split with three letter groups:
google - g, go, goo, oog, ogl, gle, le
toddle - t, to, tod, odd, ddl, dle, le
beagle - b, be, bea, eag, agl, gle, le
The trigram groups of three letters will be calculated for the search term and compared to the words in for the batch for the sequences they share:
g, go, goo, oog, ogl, glr, lr
google - 5
toddle - 0
beagle - 0
The similarity is calculated using the number of trigrams they share, which in our case is quite trivial. So google
will top the similarity index.
For the second use case, let’s say the search term is just gle
. The trigram is:
g, gl, gle, le
Matches -
google - 3
toddle - 1
beagle - 2
For this, google
will top the result followed by beagle
. Once you are comfortable with this concept, we can move on to how trigrams are implemented in Postgres.
Trigram in PostgreSQL
Postgres supports trigrams through an extension called pg_trgm which is an officially supported extension. It’s worth noting that pgtrgm ignores special characters while calculating trigram.
The following list consists of the features that the extension comes with, which helps in doing trigram searches:
similarity(text1, text2)
– Calculates the similarity index betweentext1
andtext2
in the scale of 0 to 1, with 0 being the least similar.show_trgm(text)
– Lists the possible trigrams that could be calculated from the giventext
, like we did above.show_limit()
– The set filter used by the%
method (look below). Similarity indices above this limit are only shown while performing a trigram search. The default limit is 0.3.set_limit(real)
– Sets the limit to be used by%
method.text1 % text2
– Returns true if the similarity betweentext1
andtext2
is above the set limit.text1 <-> text2
– Distance operator, which is the inverse of similarity. Returns the distance betweentext1
andtext2
.gist\_trgm\_ops
andgin\_trgm\_ops
– Builds the GiST or GIN index, respectively, over a text column for faster similarity search.
Let’s get started with implementing a trigram search in a Rails app.
Implementing a Trigram in Rails
Our sample app is going to be very simple with only one model, Post
, which has two columns title
and content
. Let’s quickly create the app, model, and controller using the commands below. I am using Rails 4.2:
rails new app_name -d postgresql
cd app_name
rails generate model post title content
rake db:create && rake db:migrate
rails generate controller posts index
Seed the database with some fake data. I’m using the Faker gem. Below is the seed file:
(0..100).each do |p|
Post.create(title: Faker::Commerce.product_name, content: Faker::Company.catch_phrase)
puts "Created #{p}"
end
Let’s also add some basic content to the controller:
def index
if params[:q].present?
@posts = Post.where(title: params[:q])
else
@posts = Post.all
end
end
In the app/views/post/index.html.erb file, add the below lines, which includes a basic search box along with the list of all the posts:
<form method="GET" action="/">
<input placeholder="Search" id="search" type="text" name="q" />
<input type="submit">
</form>
<table>
<tr>
<th>Title</th>
<th>Content</th>
</tr>
<% @posts.each do |post| %>
<tr>
<td><%= post.title %></td>
<td><%= post.content %></td>
</tr>
<% end %>
</table>
We now have a basic application with a single model, 100 rows of posts, an index page, and a search option that matches only the full title of the post. Let’s plug a trigram search into it.
Install the pg_trgm Extension
As mentioned before, Postgres provides trigram functionality via the pg_trgm
extension . Install it in the app using a migration instead of doing it directly in the psql console. Create a migration using the below command:
rails generate migration AddPgTrgmExtensionToDB
Add the following to the migration:
execute "create extension pg_trgm;"
Run the migration. This will install the pg_trgm extension in Postgres.
Add a Search Index
When we’re at it, let’s also add an index to the column that we are going to search. GiST (Generalized Search Tree) and GIN (Generalized Inverted Index) are two kinds of indices in Postgres. Adding an index is not mandatory, but desirable to speed up queries. At this point, I really can’t recommend GiST or GIN since I’ve had varying performance differences between them in the past. Primary differences between the two indexes and how to choose one can be found here. Add whichever works for you best.
Create a migration and add the below line to it:
add_index :posts, :title
Run the migration and that’s it! We’re all set on the database side. Quickly add the search query to make use of the trigram similarity.
Search Method
To add the search option, add a method in our Post
model:
class Post < ActiveRecord::Base
def self.text_search(query)
self.where("similarity(title, ?) > 0.3", query).order("similarity(title, #{ActiveRecord::Base.connection.quote(query)}) DESC")
end
end
Let’s also replace the search line in our controller from
@posts = Post.where(title: params[:q])
to
@posts = Post.text_search(params[:q])
That’s it. Start the server, then go and search with typos and see the magic of similar words showing up.
In the text_search
method in post.rb, the threshold score is set to 0.3
. Feel free to tune this to meet your requirements. The more the threshold, the less results and the stricter the searches.
One way of improving the speed of the search is by having a separate column that will hold all the trigram sequences of the title column. Then, we can perform the search against the pre-populated column. Or we can make use of ts_vector
, but that would become useless with fuzzy words.
Conclusion
There is a gem called pg_search that provides the trigram search functionality out of the box, but for some reason the trigram search from this gem is slower for me than the raw SQL. I may cover this gem in the future.
All the code used for sample in this is hosted on GitHub. Feel free to fork an play with it.
Thank you for reading. I hope you find this useful in your Rails application development.
Frequently Asked Questions (FAQs) about Autocomplete Trigram Search in Rails and PostgreSQL
How does trigram search work in PostgreSQL?
Trigram search in PostgreSQL works by breaking down strings into trigrams, which are groups of three consecutive characters. The PostgreSQL extension pg_trgm provides functions and operators for determining the similarity of alphanumeric text based on trigram matching. It uses the concept of “distance” to measure the similarity between two strings. The smaller the distance, the more similar the strings are. This feature is particularly useful in implementing autocomplete functionality where you want to return results that closely match the user’s input.
How can I optimize my autocomplete search in PostgreSQL?
Optimizing your autocomplete search in PostgreSQL can be achieved by creating an index on the column you’re searching. This can significantly speed up search queries. Additionally, you can use the pg_trgm module for trigram-based searches, which is more efficient for autocomplete functionality. It’s also important to regularly analyze your database to keep the statistics up to date, which helps the PostgreSQL query planner make better decisions.
What are the benefits of using Rails with PostgreSQL for autocomplete functionality?
Rails and PostgreSQL make a powerful combination for implementing autocomplete functionality. Rails provides a robust and flexible framework for building web applications, while PostgreSQL offers advanced features like trigram search. Together, they allow you to create efficient, scalable, and user-friendly autocomplete features. Moreover, both Rails and PostgreSQL are open-source, which means they are continually updated and improved by a community of developers.
How can I implement autocomplete with Postgres and Python?
While this article focuses on implementing autocomplete with Rails and PostgreSQL, you can also achieve this with Python and PostgreSQL. You would still use the pg_trgm module for trigram-based searches. However, instead of Rails, you would use a Python web framework like Django or Flask. The implementation details might differ, but the underlying principles remain the same.
How does full-text search differ from trigram search?
Full-text search and trigram search are both powerful search techniques, but they serve different purposes. Full-text search is designed to find documents or records that contain specific words or phrases. It’s ideal for searching large volumes of text data. On the other hand, trigram search is designed to find records that are similar to a given string. It’s particularly useful for implementing autocomplete functionality where you want to return results that closely match the user’s input.
Can I use the autocomplete feature in the PostgreSQL query tool?
Yes, the PostgreSQL query tool does provide an autocomplete feature. However, this is different from implementing autocomplete functionality in a web application. The autocomplete feature in the PostgreSQL query tool is designed to help you write SQL queries by suggesting table names, column names, and other SQL keywords as you type.
What is the fastest index for autocomplete in PostgreSQL?
The fastest index for autocomplete in PostgreSQL is typically a GIN (Generalized Inverted Index) index on the column you’re searching, combined with the use of the pg_trgm module for trigram-based searches. This setup allows PostgreSQL to quickly find records that closely match the user’s input, making it ideal for autocomplete functionality.
How can I build full-text search across all my tools with Postgres?
Building full-text search across all your tools with Postgres involves creating a full-text search index on the relevant columns in your database. You can then use the full-text search functions provided by Postgres to query this index. However, keep in mind that full-text search is designed to find specific words or phrases, not to find records that are similar to a given string.
How can I implement autocomplete using Postgres optimization?
Implementing autocomplete using Postgres optimization involves using the pg_trgm module for trigram-based searches and creating an index on the column you’re searching. This setup allows Postgres to quickly find records that closely match the user’s input. Additionally, you should regularly analyze your database to keep the statistics up to date, which helps the Postgres query planner make better decisions.
What are some common challenges when implementing autocomplete functionality and how can I overcome them?
Some common challenges when implementing autocomplete functionality include handling large volumes of data, ensuring fast response times, and dealing with typos or variations in user input. Using a trigram search with PostgreSQL can help address these challenges. Trigram search is efficient even with large volumes of data, and it can find records that closely match the user’s input, even if there are minor typos or variations.
Vinoth is a Server Administrator turned Full stack web developer. He loves to try his hands on multiple programming languages but his primary programming language of choice is Ruby. He is currently a Software Engineer @ Intelllex building the server side of things. You can find more about him at avinoth.com.