Sql Server Full-Text Search Protips Part 3: Getting RANKed

In Part 1 of this series we explored the methodology for enabling a Sql Server database for Full-Text Indexing and enabling Full-Text indices on specific tables. And in Part 2 of this series we explored the methodology for accessing these Full-Text indices using the CONTAINS and FREETEXT functions. In this final part of the series we will examine the usage of the CONTAINSTABLE and FREETEXTTABLE functions to provide ranked search results to the calling application.

As previously mentioned, the CONTAINSTABLE and FREETEXTTABLE functions query the selected Full-Text index for the provided search terms. The difference between these functions and their non-TABLE cousins—that is CONTAINS and FREETEXT—is that these functions take slightly different parameters and, most important, they return a table of values rather than a simple true/false answer.

These table functions take a slightly different set of parameters than their cousins. These parameters are:

  1. Free-Text indexed table to query.
  2. Columns within the table to query.
  3. Search term. For FREETEXTTABLE, this would generally just be the term(s) you wish to find. For CONTAINSTABLE, this term could be anything like the ones described in Part 2.
  4. Top_N_Results. An optional integer to set a limit to the number of results to get. This field is very important—limiting to the amount of results you actually wish to work with greatly lessens load on the database server, especially when dealing with large result sets.

The table these functions return contains two columns:

  1. KEY: this contains the primary key of the row in the indexed table. Obviously, but importantly, its data type is the same as the primary key in that table.
  2. RANK: this is a positive integer representing the relative strength of the match to your search term. This number is not an absolute value—it does not start at some fixed value and work down to near zero. The specific value is only relative compared to other results from that Free-Text query. Comparing the RANK value across different Full-Text queries does not necessarily get meaningful results.

Now, that two-field table might sound very simple, but with it you can do a lot of very fancy tricks using a very simple technique—just INNER JOIN the table returned from your Free-Text function on the KEY field to your desired results, then sort by the RANK column for sorting effects. Or even get those neat percentage matches you see on some site search engines.

For the following examples, we will be using the Microsoft Pubs database, like we setup in Part 2 of this series.

First, let’s examine getting ranked search results from using FREETEXTTABLE:


SELECT 
	ftt.RANK,
	titles.title_id, 
	titles.title
FROM Titles
INNER JOIN 
FREETEXTTABLE(titles, notes, 'recipe cuisine') as ftt
ON
ftt.[KEY]=titles.title_id
ORDER BY ftt.RANK DESC

Nothing too tricky there, as you can see—just a simple INNER JOIN to the FREETEXTTABLE and ORDER BY the FREETEXTTABLE’s rank.

Now, to get percentages, you need to get a little fancier. For a match percentage, you need to divide the RANK by the top RANK value. You cannot, unfortunately, do this with just one Free-Text query; it requires selecting the TOP value then selecting the desired results and creating a calculated field for the percentages. Confused yet? Well, let’s see it in SQL.


DECLARE @topRank int

set @topRank=(SELECT MAX(RANK) FROM
FREETEXTTABLE(titles, notes, 'recipe cuisine', 1))

SELECT 
	ftt.RANK, 
	(CAST(ftt.RANK as DECIMAL)/@topRank) as matchpercent, 
	titles.title_id, 
	titles.title
FROM Titles
INNER JOIN 
FREETEXTTABLE(titles, notes, 'recipe cuisine') as ftt
ON
ftt.[KEY]=titles.title_id
ORDER BY ftt.RANK DESC

One note—I very intentionally used the optional Top_N_Results parameter on the first FREETEXTTABLE call to limit the number of results returned. The reason for this is to make the query a lot less expensive on the database server.

One final key trick: as I mentioned in Part 2, one can get very fancy with CONTAINS and search terms. This really, really applies to CONTAINSTABLE. For example, using weighted search terms plus the RANK column lets one get the right results for end users. For example, let’s say we wanted to find books on recipies or cuisies, but especially recipies:


SELECT 
	ftt.RANK,
	titles.title_id, 
	titles.title,
	notes
FROM Titles
INNER JOIN 
CONTAINSTABLE
	(
	titles, 
	notes, 
	'ISABOUT("recipe*" weight (.7), "cuisine*" weight (.2))'
	) 
as ftt
ON
ftt.[KEY]=titles.title_id
ORDER BY ftt.RANK DESC

One thing to note in the above example is that, because we are using a CONTAINS-type query, the search term (ISABOUT . . .) needed to include other forms to catch plurals and such. In this cases I used the “SearchTerm*” construct to tell the Full-Text engine to find all terms starting with ‘SearchTerm’.

Finally, to review:

  1. In Part 1 we examined how to create Full-Text indices and considerations for maintaining those indexes.
  2. In Part 2 we examined how to query your Full-Text indices using the CONTAINS and FREETEXT predicates, and the difference between those functions.
  3. In this part, we looked at using the CONTAINSTABLE and FREETEXTTABLE functions to get ranked search results from your Free-Text indices. In addition, we examined some limitations of and workarounds for using Free-Text queries.

 

I hope you all have enjoyed, and please do post any comments or questions. And finally, feel free to kick it if you like it.

Free book: Jump Start HTML5 Basics

Grab a free copy of one our latest ebooks! Packed with hints and tips on HTML5's most powerful new features.

  • Nick

    Sounds like familiar, cuz I just talked with my boss that how we should do an intellect search which users concern most, i.e. how to weigh the searching words to conduct a meaningful search. Boom, I see this article, exactly what I was thinking, – thx someone is there waiting to help me. Hooray, Wyatt Barnett!