Sql Server Full-Text Search Protips Part 2: CONTAINS vs. FREETEXT

Wyatt Barnett
Wyatt Barnett
Share

In Part 1 of this series we explored the methodology for enabling a Sql Server database for Full-Text Indexing and enabling Full-Text indicies on specific tables. But now we get to the good part—how to use that index in your application.

There are four principal T-SQL functions which allow one to interact with your Full-Text indices: CONTAINS, FREETEXT, CONTAINSTABLE and FREETEXTTABLE. The first two return a boolean value, meaning you could use them directly in a WHERE clause. The latter two return a two-column table—KEY and RANK, allowing one to manage ranked searches.

Today we shall explore CONTAINS and FREETEXT; CONTAINSTABLE and FREETEXTTABLE will be enumerated at a later date. In any case, the big thing to remember is that CONTAINS is far more specific and much more configurable than FREETEXT. The former searches using the parameters it is told, whereas the latter will split words, find inflectional forms and perform other magics. Obviously, FREETEXT is the more expensive of the two, and oftentimes it can be overkill.

Confused yet? How about some examples? For this we are going to use the venerable PUBS sample database. (If you do not have a copy, you can download the database script from Microsoft.) It does have any Full-Text features setup, but we are going to run the following SQL on it:


--Enable database for full-text querying
exec sp_fulltext_database 'enable'
--Create catalog
exec sp_fulltext_catalog 'PubsCatalog', 'create'
--Enable titles table for fulltext indexing
exec sp_fulltext_table 'titles', 'create', 'PubsCatalog', 'UPKCL_titleidind'
--Add columns
exec sp_fulltext_column 'titles', 'title', 'add'
exec sp_fulltext_column 'titles', 'notes', 'add'
--start change tracking to force indexing
exec sp_fulltext_table 'titles', 'start_change_tracking'

The above just sets up the database for Full-Text indexing and enables it on some choice columns. For a full explanation of setting up Full-Text indexing, please see Part 1.

CONTAINS vs FREETEXT

CONTAINS and FREETEXT are quite similar functions. Both return a boolean value, and both take 2 parameters: a Free-Text indexed column name and the Free-Text search term. But they behave quite differently. For a clear example, run the following queries:


SELECT title_id, title FROM Titles
WHERE CONTAINS(notes, 'recipe')

SELECT title_id, title FROM Titles
WHERE FREETEXT(notes, 'recipe')

Note that the first SELECT statement returns zero rows, whereas the 2nd SELECT statement returns three rows. Why, you ask? Because CONTAINS is specific. The notes field contains the term ‘recipies’ but NOT ‘recipie’. FREETEXT automagically adds inflectional forms of a word to the search, resulting a looser resultset.

Extending CONTAINS

You might now be saying “eh, that CONTAINS predicate is pretty useless.” But that is far from the truth. There is an entire Free-Text query syntax to allow you to achieve some very slick searches using CONTAINS. First, one can intentionally force it to find all word forms using FORMSOF. Extending the above example to:


SELECT title_id, title FROM Titles
WHERE CONTAINS(notes, 'FORMSOF(INFLECTIONAL, recipe)')

Results in three rows returned, just like FREETEXT example. Another useful technique is looking for multiple terms. Let’s say you wanted to capture books about cuisine and/or recipies. You could modify the query to be:


SELECT title_id, title FROM Titles
WHERE CONTAINS(notes, 'FORMSOF(INFLECTIONAL, recipe) or FORMSOF(INFLECTIONAL, cuisine)')

This query should return four rows, adding a book with cuisine in the list of query results.

The above examples barely scratch the surface—one can get very, very fancy with CONTAINS search clauses in order to get specific and tailored sets of search results. The developer can specify weight of each search term to get more specific results. Or ask for words NEAR one another. With a proper parser you can craft some very specific results from a rather open-ended search box. There is, however, one crucial element CONTAINS and FREETEXT lack for crafting these results: some sort of outward ranking of the value of the search results. Enter the two Full-Text TABLE functions—CONTAINSTABLE and FREETEXTTABLE. For more information on using those T-SQL functions stay tuned for Part 3 of this series.

What is the difference between CONTAINS and FREETEXT in SQL Server Full-Text Search?

CONTAINS and FREETEXT are both predicates used in SQL Server Full-Text Search. CONTAINS allows for precision as it can search for a specific word or phrase, and even a prefix of a word. It also supports the use of logical operators and wildcard characters. On the other hand, FREETEXT is more flexible and intuitive. It can search for values that match the meaning of a phrase, not just the exact phrase. However, it doesn’t support logical operators or wildcard characters.

How can I use wildcard characters in CONTAINS?

CONTAINS supports the use of wildcard characters, but only at the end of a prefix. For instance, you can search for ‘comput*’ to find ‘computer’, ‘computing’, ‘computed’, etc. However, you cannot use a wildcard at the beginning or in the middle of a word.

Can I use logical operators with FREETEXT?

No, FREETEXT does not support the use of logical operators. If you need to use logical operators in your full-text search, consider using CONTAINS instead.

How does SQL Server Full-Text Search handle inflectional forms of a word?

SQL Server Full-Text Search can handle inflectional forms of a word. For instance, if you search for ‘run’, it can also find ‘runs’, ‘running’, ‘ran’, etc. This is particularly useful when you want to search for all variations of a word.

What is the performance impact of using SQL Server Full-Text Search?

SQL Server Full-Text Search can have a significant impact on performance, especially if you’re searching large amounts of data. However, you can mitigate this by carefully designing your full-text index and regularly maintaining it.

How can I improve the performance of my SQL Server Full-Text Search?

There are several ways to improve the performance of your SQL Server Full-Text Search. These include optimizing your full-text index, limiting the number of rows returned by your queries, and using a CONTAINSTABLE or FREETEXTTABLE function to rank the results of your search.

Can I use SQL Server Full-Text Search with other languages?

Yes, SQL Server Full-Text Search supports multiple languages. You can specify the language when you create your full-text index.

How does SQL Server Full-Text Search handle noise words or stop words?

SQL Server Full-Text Search ignores noise words or stop words, such as ‘the’, ‘is’, ‘and’, etc. These words are filtered out during the indexing process to improve performance.

Can I customize the list of noise words or stop words in SQL Server Full-Text Search?

Yes, you can customize the list of noise words or stop words in SQL Server Full-Text Search. You can add or remove words from the stoplist as per your requirements.

How can I troubleshoot issues with SQL Server Full-Text Search?

SQL Server provides several tools and techniques to troubleshoot issues with Full-Text Search. These include using the Full-Text Indexing Wizard, checking the error logs, and using the sys.dm_fts_index_keywords system function to view the keywords in your full-text index.