Sql Server Full-Text Search Protips Part 2: CONTAINS vs. FREETEXT

Tweet

In Part 1 of this series we explored the methodology for enabling a Sql Server database for Full-Text Indexing and enabling Full-Text indicies on specific tables. But now we get to the good part—how to use that index in your application.

There are four principal T-SQL functions which allow one to interact with your Full-Text indices: CONTAINS, FREETEXT, CONTAINSTABLE and FREETEXTTABLE. The first two return a boolean value, meaning you could use them directly in a WHERE clause. The latter two return a two-column table—KEY and RANK, allowing one to manage ranked searches.

Today we shall explore CONTAINS and FREETEXT; CONTAINSTABLE and FREETEXTTABLE will be enumerated at a later date. In any case, the big thing to remember is that CONTAINS is far more specific and much more configurable than FREETEXT. The former searches using the parameters it is told, whereas the latter will split words, find inflectional forms and perform other magics. Obviously, FREETEXT is the more expensive of the two, and oftentimes it can be overkill.

Confused yet? How about some examples? For this we are going to use the venerable PUBS sample database. (If you do not have a copy, you can download the database script from Microsoft.) It does have any Full-Text features setup, but we are going to run the following SQL on it:


--Enable database for full-text querying
exec sp_fulltext_database 'enable'
--Create catalog
exec sp_fulltext_catalog 'PubsCatalog', 'create'
--Enable titles table for fulltext indexing
exec sp_fulltext_table 'titles', 'create', 'PubsCatalog', 'UPKCL_titleidind'
--Add columns
exec sp_fulltext_column 'titles', 'title', 'add'
exec sp_fulltext_column 'titles', 'notes', 'add'
--start change tracking to force indexing
exec sp_fulltext_table 'titles', 'start_change_tracking'

The above just sets up the database for Full-Text indexing and enables it on some choice columns. For a full explanation of setting up Full-Text indexing, please see Part 1.

CONTAINS vs FREETEXT

CONTAINS and FREETEXT are quite similar functions. Both return a boolean value, and both take 2 parameters: a Free-Text indexed column name and the Free-Text search term. But they behave quite differently. For a clear example, run the following queries:


SELECT title_id, title FROM Titles
WHERE CONTAINS(notes, 'recipe')

SELECT title_id, title FROM Titles
WHERE FREETEXT(notes, 'recipe')

Note that the first SELECT statement returns zero rows, whereas the 2nd SELECT statement returns three rows. Why, you ask? Because CONTAINS is specific. The notes field contains the term ‘recipies’ but NOT ‘recipie’. FREETEXT automagically adds inflectional forms of a word to the search, resulting a looser resultset.

Extending CONTAINS

You might now be saying “eh, that CONTAINS predicate is pretty useless.” But that is far from the truth. There is an entire Free-Text query syntax to allow you to achieve some very slick searches using CONTAINS. First, one can intentionally force it to find all word forms using FORMSOF. Extending the above example to:


SELECT title_id, title FROM Titles
WHERE CONTAINS(notes, 'FORMSOF(INFLECTIONAL, recipe)')

Results in three rows returned, just like FREETEXT example. Another useful technique is looking for multiple terms. Let’s say you wanted to capture books about cuisine and/or recipies. You could modify the query to be:


SELECT title_id, title FROM Titles
WHERE CONTAINS(notes, 'FORMSOF(INFLECTIONAL, recipe) or FORMSOF(INFLECTIONAL, cuisine)')

This query should return four rows, adding a book with cuisine in the list of query results.

The above examples barely scratch the surface—one can get very, very fancy with CONTAINS search clauses in order to get specific and tailored sets of search results. The developer can specify weight of each search term to get more specific results. Or ask for words NEAR one another. With a proper parser you can craft some very specific results from a rather open-ended search box. There is, however, one crucial element CONTAINS and FREETEXT lack for crafting these results: some sort of outward ranking of the value of the search results. Enter the two Full-Text TABLE functions—CONTAINSTABLE and FREETEXTTABLE. For more information on using those T-SQL functions stay tuned for Part 3 of this series.

Free book: Jump Start HTML5 Basics

Grab a free copy of one our latest ebooks! Packed with hints and tips on HTML5's most powerful new features.

  • vali

    I’m using a JavaScript PARSER to format the string for CONTAINS..
    It’s not good when dealing with UTF-8 (start/stop word boundary.. problems..)
    Do U know something better?

  • Chris Lu

    Please take a look at this

    http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes

    You can create a full-text database search service, return results as HTML/XML/JSON. It uses the Lucene directly in java, but can be easily used with Ruby, PHP, or any existing database web applicatoins.

    You can easily index, re-index, incremental-index. It’s also highly scalable and easily customizable.

    The best thing is, it’s super easy. You can create a production-level search in 3 minutes.

  • Damen

    Why doesn’t Chris Lu, go away and use Google ad words instead of spamming every single article on the net to do with SQL FTS?

    Am I the only one who gets annoyed by people trying to peddle their wares in this fashion?

  • Anu

    I am using contains to get results from a table that has a column which has values with single quotes. For example last names like O’Niel, O’Brien, ch’en etc.

    select * from employees where contain(lastname, ‘”ch”en*”‘) – does not return any results
    where as
    select * from employees where contain(lastname, ‘”ch”e*”‘) returns 1 row- as expected
    and
    select * from employees where contain(lastname, ‘”ch”en”*”‘) returns 1 row

    This looks like a mystery to me. Could someone explain the difference between the above queries?

    Many thanks,

    Anu