Sql Server Full-Text Search Protips Part 3: Getting RANKed
In Part 1 of this series we explored the methodology for enabling a Sql Server database for Full-Text Indexing and enabling Full-Text indices on specific tables. And in Part 2 of this series we explored the methodology for accessing these Full-Text indices using the CONTAINS and FREETEXT functions. In this final part of the series we will examine the usage of the CONTAINSTABLE and FREETEXTTABLE functions to provide ranked search results to the calling application.
As previously mentioned, the CONTAINSTABLE and FREETEXTTABLE functions query the selected Full-Text index for the provided search terms. The difference between these functions and their non-TABLE cousins—that is CONTAINS and FREETEXT—is that these functions take slightly different parameters and, most important, they return a table of values rather than a simple true/false answer.
These table functions take a slightly different set of parameters than their cousins. These parameters are:
- Free-Text indexed table to query.
- Columns within the table to query.
- Search term. For FREETEXTTABLE, this would generally just be the term(s) you wish to find. For CONTAINSTABLE, this term could be anything like the ones described in Part 2.
- Top_N_Results. An optional integer to set a limit to the number of results to get. This field is very important—limiting to the amount of results you actually wish to work with greatly lessens load on the database server, especially when dealing with large result sets.
The table these functions return contains two columns:
- KEY: this contains the primary key of the row in the indexed table. Obviously, but importantly, its data type is the same as the primary key in that table.
- RANK: this is a positive integer representing the relative strength of the match to your search term. This number is not an absolute value—it does not start at some fixed value and work down to near zero. The specific value is only relative compared to other results from that Free-Text query. Comparing the RANK value across different Full-Text queries does not necessarily get meaningful results.
Now, that two-field table might sound very simple, but with it you can do a lot of very fancy tricks using a very simple technique—just INNER JOIN the table returned from your Free-Text function on the KEY field to your desired results, then sort by the RANK column for sorting effects. Or even get those neat percentage matches you see on some site search engines.
For the following examples, we will be using the Microsoft Pubs database, like we setup in Part 2 of this series.
First, let’s examine getting ranked search results from using FREETEXTTABLE:
SELECT ftt.RANK, titles.title_id, titles.title FROM Titles INNER JOIN FREETEXTTABLE(titles, notes, 'recipe cuisine') as ftt ON ftt.[KEY]=titles.title_id ORDER BY ftt.RANK DESC
Nothing too tricky there, as you can see—just a simple INNER JOIN to the FREETEXTTABLE and ORDER BY the FREETEXTTABLE’s rank.
Now, to get percentages, you need to get a little fancier. For a match percentage, you need to divide the RANK by the top RANK value. You cannot, unfortunately, do this with just one Free-Text query; it requires selecting the TOP value then selecting the desired results and creating a calculated field for the percentages. Confused yet? Well, let’s see it in SQL.
DECLARE @topRank int set @topRank=(SELECT MAX(RANK) FROM FREETEXTTABLE(titles, notes, 'recipe cuisine', 1)) SELECT ftt.RANK, (CAST(ftt.RANK as DECIMAL)/@topRank) as matchpercent, titles.title_id, titles.title FROM Titles INNER JOIN FREETEXTTABLE(titles, notes, 'recipe cuisine') as ftt ON ftt.[KEY]=titles.title_id ORDER BY ftt.RANK DESC
One note—I very intentionally used the optional Top_N_Results parameter on the first FREETEXTTABLE call to limit the number of results returned. The reason for this is to make the query a lot less expensive on the database server.
One final key trick: as I mentioned in Part 2, one can get very fancy with CONTAINS and search terms. This really, really applies to CONTAINSTABLE. For example, using weighted search terms plus the RANK column lets one get the right results for end users. For example, let’s say we wanted to find books on recipies or cuisies, but especially recipies:
SELECT ftt.RANK, titles.title_id, titles.title, notes FROM Titles INNER JOIN CONTAINSTABLE ( titles, notes, 'ISABOUT("recipe*" weight (.7), "cuisine*" weight (.2))' ) as ftt ON ftt.[KEY]=titles.title_id ORDER BY ftt.RANK DESC
One thing to note in the above example is that, because we are using a CONTAINS-type query, the search term (ISABOUT . . .) needed to include other forms to catch plurals and such. In this cases I used the “SearchTerm*” construct to tell the Full-Text engine to find all terms starting with ‘SearchTerm’.
Finally, to review:
- In Part 1 we examined how to create Full-Text indices and considerations for maintaining those indexes.
- In Part 2 we examined how to query your Full-Text indices using the CONTAINS and FREETEXT predicates, and the difference between those functions.
- In this part, we looked at using the CONTAINSTABLE and FREETEXTTABLE functions to get ranked search results from your Free-Text indices. In addition, we examined some limitations of and workarounds for using Free-Text queries.
I hope you all have enjoyed, and please do post any comments or questions. And finally, feel free to kick it if you like it.