Find and Correct Misspelled Words with Pspell

Every one of us has made a spelling mistake in a Google search: "alternitive music", for example. When that happens, you may have noticed Google trying to help by displaying "Did you mean alternative music?". If your site has a search function, pointing out possible misspellings when a search returns no results or too few is a very useful feature, especially if a visitor's typo could cost you a sale. Fortunately, PHP's Pspell module can check the spelling of a word and suggest a replacement from its default dictionary (you can also create a custom dictionary).

To begin, we need to check if Pspell is installed:

<?php
$config_dic = pspell_config_create('en');

If you get an error, it isn't. On Linux systems, install the PHP Pspell extension and an Aspell dictionary using your distribution's package manager.
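If you would rather test for the extension without triggering an error, a check along these lines may help. This is only a minimal sketch: the helper name is_pspell_available() is our own and is not part of Pspell.

```php
<?php
// Hypothetical helper (not part of Pspell): reports whether the
// extension is loaded, without calling any pspell_* function.
function is_pspell_available(): bool
{
    return extension_loaded('pspell') && function_exists('pspell_new');
}

if (!is_pspell_available()) {
    echo "The Pspell extension is not loaded.\n";
}
```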

Use the default dictionary

Here is a small function to help you understand how Pspell works:

<?php
function orthograph($string)
{
    // Create a configuration for the English dictionary
    $config_dic = pspell_config_create('en');

    // Ignore words under 3 characters
    pspell_config_ignore($config_dic, 3);

    // Configure the dictionary
    pspell_config_mode($config_dic, PSPELL_FAST);
    $dictionary = pspell_new_config($config_dic);

    // To find out if a replacement has been suggested
    $replacement_suggest = false;

    // Replace commas with spaces, then split the string into words
    $string = preg_split('/\s+/', trim(str_replace(',', ' ', $string)), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($string as $key => $value) {
        if (!pspell_check($dictionary, $value)) {
            $suggestion = pspell_suggest($dictionary, $value);

            // Take the first suggestion, if any, unless it differs only by case
            if (!empty($suggestion) && strtolower($suggestion[0]) != strtolower($value)) {
                $string[$key] = $suggestion[0];
                $replacement_suggest = true;
            }
        }
    }

    if ($replacement_suggest) {
        // At least one word was replaced, so return the corrected string
        return implode(' ', $string);
    } else {
        return null;
    }
}

To use this function, pass it a string:

<?php
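// Fall back to an empty string if the field was not submitted
$search = $_POST['input'] ?? '';
$suggestion_spell = orthograph($search);
if ($suggestion_spell) {
    // Escape the output, since it is derived from user input
    echo 'Try with this spelling: ' . htmlspecialchars($suggestion_spell);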
}

If the string you submit is "here is my mispellid word", the previous script will display: "Try with this spelling: here is my misspelled word". However, Pspell is no miracle worker, especially if you automatically use the first suggested alternative! For better results, you can show all the suggestions offered by Pspell. The following script returns twenty suggestions for the word "lappin":

<?php
$dict = pspell_new("en");
if (!pspell_check($dict, "lappin")) {
    $suggestions = pspell_suggest($dict, "lappin");
    foreach ($suggestions as $suggestion) {
        echo "Did you mean: $suggestion?<br />";
    }
}
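Showing every suggestion can be overwhelming, so you may prefer to cap the list. The sketch below assumes a helper of our own, format_suggestions(), with an arbitrary default limit of five. It also escapes each suggestion before output.

```php
<?php
// Hypothetical helper: formats at most $limit suggestions as
// "Did you mean" lines, escaping each one for HTML output.
function format_suggestions(array $suggestions, int $limit = 5): string
{
    $lines = [];
    foreach (array_slice($suggestions, 0, $limit) as $suggestion) {
        $lines[] = 'Did you mean: ' . htmlspecialchars($suggestion) . '?<br />';
    }
    return implode("\n", $lines);
}

// Usage with Pspell (requires the extension and an English dictionary):
// echo format_suggestions(pspell_suggest($dict, 'lappin'), 5);
```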

You must configure a dictionary to initialize Pspell. To do this, create a handle for the dictionary configuration, change some options on this handle, then use the configuration to create a second handle for the actual dictionary. If this sounds a bit complicated, do not worry: the code rarely changes and you can usually copy it from another script. Here, however, we will study it step by step. Here is the code that configures the dictionary:

    // Create a configuration for the English dictionary
    $config_dic = pspell_config_create('en');

    // Ignore words under 3 characters
    pspell_config_ignore($config_dic, 3);

    // Configure the dictionary
    pspell_config_mode($config_dic, PSPELL_FAST);

$config_dic is the initial template that controls the options for your dictionary. You load all the options into $config_dic, then use it to create the dictionary. pspell_config_create() creates a configuration for an English dictionary ('en'). To specify that you prefer American spelling, pass 'en' as the first parameter and 'american' as the second. pspell_config_ignore() tells the dictionary to ignore all words of 3 letters or less. Finally, pspell_config_mode() tells Pspell which operating mode to use:

• PSPELL_FAST is a quick method that will return the minimum of suggestions.
• PSPELL_NORMAL returns an average number of suggestions at normal speed.
• PSPELL_SLOW provides all possible suggestions, although this method takes some time to perform the spell check.

We could still use other configuration options (to add, for example, a custom dictionary, as we shall see later), but as this is a quick check, we will simply create the dictionary with this line:

    $dictionary = pspell_new_config($config_dic);
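As an illustration of the options described above, here is a sketch that asks for American spelling and uses PSPELL_NORMAL instead of PSPELL_FAST. The function name create_american_dictionary() is hypothetical. Returning null when Pspell or the dictionary is unavailable is our own choice, not Pspell behavior.

```php
<?php
// Sketch: build a dictionary that prefers American spelling and uses
// PSPELL_NORMAL. Returns null when Pspell or the dictionary is missing.
// The function name is hypothetical.
function create_american_dictionary()
{
    if (!extension_loaded('pspell')) {
        return null;
    }

    // First parameter: language; second parameter: spelling variant
    $config = pspell_config_create('en', 'american');
    pspell_config_ignore($config, 3);
    pspell_config_mode($config, PSPELL_NORMAL);

    $dictionary = pspell_new_config($config);
    return $dictionary === false ? null : $dictionary;
}
```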

From this point you can use the dictionary in two ways:
1. pspell_check($dictionary, "word") returns true if "word" is in the dictionary.
2. pspell_suggest($dictionary, "word") returns an array of suggested words if "word" is not in the dictionary (the first element of this array is the most likely candidate). The number of words obtained varies, but you get more with PSPELL_SLOW and fewer with PSPELL_FAST.

Now that the dictionary is ready, we split the string that was passed as a parameter into an array of words: "here my sentence" becomes an array of three elements, "here", "my", and "sentence". Because Pspell does not handle commas well, we replace them with spaces before splitting the string. Then we check the spelling of each word using the default dictionary. If the word has more than three characters, verification takes place and, in case of a misspelling, we perform the following operations:

  1. We ask Pspell to provide an array of suggestions for correction.
  2. We take the most likely suggestion (the first element of the array $suggestion) and we replace the misspelled word with it.
  3. We set the $replacement_suggest flag to true so that at the end of the processing loop, we know that we have found a spelling mistake somewhere in $string.

At the end of the loop, if there were spelling corrections, we rebuild the string from the elements of the corrected array and return it. Otherwise, the function returns null to indicate that it has not detected any misspelling.
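The same loop can be sketched with the spell checker passed in as two callables, which makes the logic easy to try out even where Pspell is not installed. The names correct_words(), $check and $suggest are hypothetical and only serve the illustration.

```php
<?php
// Sketch of the correction loop with the spell checker injected as two
// callables, so the logic can run without Pspell. All names are hypothetical.
function correct_words(string $input, callable $check, callable $suggest): ?string
{
    // Replace commas with spaces, then split on whitespace
    $words = preg_split('/\s+/', str_replace(',', ' ', $input), -1, PREG_SPLIT_NO_EMPTY);
    $replaced = false;

    foreach ($words as $key => $word) {
        if ($check($word)) {
            continue; // spelled correctly
        }
        $suggestions = $suggest($word);
        // Use the first suggestion unless it differs only by case
        if (!empty($suggestions) && strtolower($suggestions[0]) !== strtolower($word)) {
            $words[$key] = $suggestions[0];
            $replaced = true;
        }
    }

    return $replaced ? implode(' ', $words) : null;
}

// With Pspell, the callables would wrap the dictionary, for example:
// $fixed = correct_words(
//     $search,
//     fn ($w) => pspell_check($dictionary, $w),
//     fn ($w) => pspell_suggest($dictionary, $w)
// );
```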

Add a custom dictionary to Pspell

If a word is not in the default dictionary, you can easily add it. You can also create a custom dictionary to be used alongside the default one.
Create a directory on your site where PHP has write permission, and initialize the new dictionary in it. To create a new dictionary file called perso.pws in the directory path of your server, use the following script:

<?php
$config_dic = pspell_config_create('en');
pspell_config_personal($config_dic, 'path/perso.pws');
pspell_config_ignore($config_dic, 2);
pspell_config_mode($config_dic, PSPELL_FAST);
$dic = pspell_new_config($config_dic);

This is the same script as in the previous section, but with an essential addition: calling pspell_config_personal() initializes a personal dictionary file. If this file does not already exist, Pspell creates a new one for you. You can add to this dictionary as many words as you want by using the following function:

pspell_add_to_personal($dic, "word");

Until you save the dictionary, added words are kept only for the duration of the script. Therefore, after inserting the words you want, add this line to the end of the script:

pspell_save_wordlist($dic);

Then, in the orthograph() function, call pspell_config_personal() as shown above and your new dictionary will be ready.
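The steps of this section can be gathered into one function. This is a sketch under assumptions: the function name add_personal_words() is hypothetical, the path you pass must be writable by PHP, and returning false when the extension is missing is our own convention.

```php
<?php
// Sketch: add several words to a personal dictionary and save it.
// The function name is hypothetical; $path must be writable by PHP.
// Returns false if Pspell is unavailable or a step fails.
function add_personal_words(string $path, array $words): bool
{
    if (!extension_loaded('pspell')) {
        return false;
    }

    $config = pspell_config_create('en');
    pspell_config_personal($config, $path);
    $dic = pspell_new_config($config);
    if ($dic === false) {
        return false;
    }

    foreach ($words as $word) {
        if (!pspell_add_to_personal($dic, $word)) {
            return false;
        }
    }

    // Without this call the added words are lost when the script ends
    return pspell_save_wordlist($dic);
}
```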

Conclusion

Pspell can help your conversion rate by giving your visitors a way to notice and automatically correct their typos. It can improve search experiences, forum submissions, and the general linguistic accuracy of a website with user-submitted content. If you'd like to take a deeper look at Pspell, or have implemented it in an interesting manner, let us know in the comments below!

  • Anonymous

    Isn’t it time that these sorts of tutorials used object oriented designs instead of just a clunky standalone function?

    • Anonymous

      If it were any more complex, I would agree, but this article being nothing more than an introduction into the pspell “API”, I think the bare bones nature of the code is perfectly fine. There will be more advanced articles with pspell coming later.

      • Anonymous

        Fair enough but my concern is that so many novice PHP programmers will see tutorials like this and copy-and-paste unedited into their projects. In my opinion it’s never too early to start introducing optimal programming design paradigms.

  • adrien

    O thank you very much M Bruno for the answer at the comments .
    I had just seen the published article .

    thanks