Sql Server Full-Text Search Protips Part 2: CONTAINS vs. FREETEXTBy Wyatt Barnett
In Part 1 of this series we explored the methodology for enabling a Sql Server database for Full-Text Indexing and enabling Full-Text indicies on specific tables. But now we get to the good part—how to use that index in your application.
There are four principal T-SQL functions which allow one to interact with your Full-Text indices: CONTAINS, FREETEXT, CONTAINSTABLE and FREETEXTTABLE. The first two return a boolean value, meaning you could use them directly in a WHERE clause. The latter two return a two-column table—KEY and RANK, allowing one to manage ranked searches.
Today we shall explore CONTAINS and FREETEXT; CONTAINSTABLE and FREETEXTTABLE will be enumerated at a later date. In any case, the big thing to remember is that CONTAINS is far more specific and much more configurable than FREETEXT. The former searches using the parameters it is told, whereas the latter will split words, find inflectional forms and perform other magics. Obviously, FREETEXT is the more expensive of the two, and oftentimes it can be overkill.
Confused yet? How about some examples? For this we are going to use the venerable PUBS sample database. (If you do not have a copy, you can download the database script from Microsoft.) It does have any Full-Text features setup, but we are going to run the following SQL on it:
--Enable database for full-text querying exec sp_fulltext_database 'enable' --Create catalog exec sp_fulltext_catalog 'PubsCatalog', 'create' --Enable titles table for fulltext indexing exec sp_fulltext_table 'titles', 'create', 'PubsCatalog', 'UPKCL_titleidind' --Add columns exec sp_fulltext_column 'titles', 'title', 'add' exec sp_fulltext_column 'titles', 'notes', 'add' --start change tracking to force indexing exec sp_fulltext_table 'titles', 'start_change_tracking'
The above just sets up the database for Full-Text indexing and enables it on some choice columns. For a full explanation of setting up Full-Text indexing, please see Part 1.
CONTAINS vs FREETEXT
CONTAINS and FREETEXT are quite similar functions. Both return a boolean value, and both take 2 parameters: a Free-Text indexed column name and the Free-Text search term. But they behave quite differently. For a clear example, run the following queries:
SELECT title_id, title FROM Titles WHERE CONTAINS(notes, 'recipe') SELECT title_id, title FROM Titles WHERE FREETEXT(notes, 'recipe')
Note that the first SELECT statement returns zero rows, whereas the 2nd SELECT statement returns three rows. Why, you ask? Because CONTAINS is specific. The notes field contains the term ‘recipies’ but NOT ‘recipie’. FREETEXT automagically adds inflectional forms of a word to the search, resulting a looser resultset.
You might now be saying “eh, that CONTAINS predicate is pretty useless.” But that is far from the truth. There is an entire Free-Text query syntax to allow you to achieve some very slick searches using CONTAINS. First, one can intentionally force it to find all word forms using FORMSOF. Extending the above example to:
SELECT title_id, title FROM Titles WHERE CONTAINS(notes, 'FORMSOF(INFLECTIONAL, recipe)')
Results in three rows returned, just like FREETEXT example. Another useful technique is looking for multiple terms. Let’s say you wanted to capture books about cuisine and/or recipies. You could modify the query to be:
SELECT title_id, title FROM Titles WHERE CONTAINS(notes, 'FORMSOF(INFLECTIONAL, recipe) or FORMSOF(INFLECTIONAL, cuisine)')
This query should return four rows, adding a book with cuisine in the list of query results.
The above examples barely scratch the surface—one can get very, very fancy with CONTAINS search clauses in order to get specific and tailored sets of search results. The developer can specify weight of each search term to get more specific results. Or ask for words NEAR one another. With a proper parser you can craft some very specific results from a rather open-ended search box. There is, however, one crucial element CONTAINS and FREETEXT lack for crafting these results: some sort of outward ranking of the value of the search results. Enter the two Full-Text TABLE functions—CONTAINSTABLE and FREETEXTTABLE. For more information on using those T-SQL functions stay tuned for Part 3 of this series.