  1. #1
    SitePoint Wizard mPeror's Avatar
    Join Date
    Mar 2005
    Location
    Saudi Arabia
    Posts
    1,725

    How to implement the 'did-you-mean' feature?

    I'm trying to implement a did-you-mean feature like Google's. For example, when you search for 'sitepppoint', the results show 'Did you mean: sitepoint', regardless of whether the search returned any results.

    So, how can this be done? (I'm not looking for code; I just need the concept explained, either with a small example or in clear text.)


    Your help is much appreciated.

  2. #2
    SitePoint Guru babyboy808's Avatar
    Join Date
    Nov 2004
    Location
    dublin
    Posts
    602
    Not sure about this one, but I once made a small version of this. I had a few food keywords (pasta, pizza, burger, etc.), and each one, e.g. burger, had an array of misspellings attached to it (buurger, buger, bugger, etc.). If a user entered one of the values in the array, it would output burger.
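
    A minimal sketch of the lookup-table idea described above. The `$knownTypos` table, its entries beyond the burger example, and the function name `suggestFromTypoList` are made up for illustration; they are not from the original script.

```php
<?php
// Hypothetical lookup table: each canonical keyword maps to known misspellings.
$knownTypos = [
    'burger' => ['buurger', 'buger', 'bugger'],
    'pizza'  => ['piza', 'pizzza'],
    'pasta'  => ['psta', 'passta'],
];

// Return the canonical keyword for a known misspelling, or null if none matches.
function suggestFromTypoList(string $input, array $knownTypos): ?string
{
    $input = strtolower(trim($input));
    foreach ($knownTypos as $keyword => $typos) {
        if (in_array($input, $typos, true)) {
            return $keyword;
        }
    }
    return null;
}

echo suggestFromTypoList('buger', $knownTypos), "\n"; // burger
```

    Every misspelling has to be typed in by hand, so this only works for a short, fixed keyword list.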

  3. #3
    SitePoint Wizard mPeror's Avatar
    Join Date
    Mar 2005
    Location
    Saudi Arabia
    Posts
    1,725
    That'd be a good idea for a very small database or a few keywords, but the database I'm working on is always expanding, and I can't make up typos for every entry. I'm trying to write some sort of "smart" script, but I have no idea how to approach it.

  4. #4
    SitePoint Member
    Join Date
    Apr 2006
    Posts
    23
    Try to take a look here: http://php.net/pspell

  5. #5
    SitePoint Wizard stereofrog's Avatar
    Join Date
    Apr 2004
    Location
    germany
    Posts
    4,324
    Quote Originally Posted by mPeror

    So, how can this be done? (I'm not looking for code; I just need the concept explained, either with a small example or in clear text.)

    Start here
    http://en.wikipedia.org/wiki/Fuzzy_string_searching

  6. #6
    Obey the Purebreed trib4lmaniac's Avatar
    Join Date
    Dec 2004
    Location
    Cornwall, UK
    Posts
    594
    soundex
    levenshtein
    metaphone

    A combination of the three is almost scary
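
    For anyone who hasn't used them: soundex(), levenshtein() and metaphone() are all built into PHP. The sketch below only shows what each one reports for the typo from the first post; the helper name `compareWords` is invented here, and how to combine the three results is left open.

```php
<?php
// Compare two words with PHP's three built-in fuzzy-matching functions.
function compareWords(string $a, string $b): array
{
    return [
        // Number of single-character edits needed to turn $a into $b.
        'levenshtein'     => levenshtein($a, $b),
        // True when both words get the same 4-character soundex key.
        'soundex_match'   => soundex($a) === soundex($b),
        // True when both words get the same metaphone key.
        'metaphone_match' => metaphone($a) === metaphone($b),
    ];
}

print_r(compareWords('sitepppoint', 'sitepoint'));
```

    The phonetic keys are cheap to compare for equality, while levenshtein() gives a number you can rank by.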

  7. #7
    SitePoint Guru momos's Avatar
    Join Date
    Apr 2004
    Location
    Belgium
    Posts
    920
    Most of the time levenshtein is sufficient for this kind of search.
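
    A rough sketch of a levenshtein-only suggestion, assuming the word list is small enough to scan in full. The function name `closestWord`, the sample dictionary and the cut-off of 2 edits are arbitrary choices for the example.

```php
<?php
// Return the dictionary word with the smallest edit distance to $input,
// or null when nothing is within $maxDistance edits.
function closestWord(string $input, array $dictionary, int $maxDistance = 2): ?string
{
    $best = null;
    $bestDistance = $maxDistance + 1;
    foreach ($dictionary as $word) {
        $distance = levenshtein($input, $word);
        if ($distance < $bestDistance) {
            $best = $word;
            $bestDistance = $distance;
        }
    }
    return $best;
}

$dictionary = ['sitepoint', 'standpoint', 'viewpoint', 'sitemap'];
echo closestWord('sitepppoint', $dictionary), "\n"; // sitepoint
```

    An exact match comes back with distance 0, so the caller would skip the "Did you mean" line when the result equals the input.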

  8. #8
    Obey the Purebreed trib4lmaniac's Avatar
    Join Date
    Dec 2004
    Location
    Cornwall, UK
    Posts
    594
    Quote Originally Posted by momos
    Most of the time levenshtein is sufficient for this kind of search.
    It's a bit dodgy with string lengths, though. It sometimes says two words are similar even when one of them is, say, 5x longer. Combining the functions above can fix this.
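
    One possible way to take length into account (a different technique from combining the three functions, shown only as an illustration) is to divide the edit distance by the length of the longer word. The name `normalizedDistance` is made up for this sketch.

```php
<?php
// Edit distance scaled to the range 0.0 (identical) .. 1.0 (nothing in common),
// so that 3 edits count for more in a 3-letter word than in a 15-letter one.
function normalizedDistance(string $a, string $b): float
{
    $longest = max(strlen($a), strlen($b));
    if ($longest === 0) {
        return 0.0; // two empty strings are identical
    }
    return levenshtein($a, $b) / $longest;
}

printf("%.2f\n", normalizedDistance('sitepppoint', 'sitepoint')); // 2 edits over 11 characters
printf("%.2f\n", normalizedDistance('cat', 'dog'));               // 3 edits over 3 characters
```

    A cut-off on this ratio behaves the same for short and long words, which a fixed cut-off on the raw distance does not.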

  9. #9
    SitePoint Wizard stereofrog's Avatar
    Join Date
    Apr 2004
    Location
    germany
    Posts
    4,324
    Quote Originally Posted by momos
    Most of the time levenshtein is sufficient for this kind of search.
    Can you tell us how you would use levenshtein() to search a database of, say, 100,000 words?

  10. #10
    Obey the Purebreed trib4lmaniac's Avatar
    Join Date
    Dec 2004
    Location
    Cornwall, UK
    Posts
    594
    Thankfully MySQL has the soundex function built in, which should reduce the result set dramatically.
    Code:
    SELECT * FROM `words` WHERE SOUNDEX(`word`) = SOUNDEX('$search')
    Store these into an array and then (why is there no recursive array sort function?):
    PHP Code:
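    // Bubble sort: order $words by edit distance to $search, closest first.
    for ($pass = 0; $pass < count($words) - 1; $pass++) {
        for ($i = 0; $i < count($words) - 1 - $pass; $i++) {
            if (levenshtein($words[$i], $search) > levenshtein($words[$i + 1], $search)) {
                // Swap neighbours that are out of order.
                $temp = $words[$i];
                $words[$i] = $words[$i + 1];
                $words[$i + 1] = $temp;
            }
        }
    }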
    I'm only guessing here though as I've never attempted fuzzy searching before
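
    The same ordering can probably be had from PHP's built-in usort() with a comparison callback. A sketch, assuming `$words` already holds the rows returned by the SOUNDEX query; the function name `sortByDistance` and the sample words are invented for the example.

```php
<?php
// Sort candidate words by edit distance to $search, closest first.
function sortByDistance(array $words, string $search): array
{
    usort($words, function (string $a, string $b) use ($search): int {
        return levenshtein($a, $search) <=> levenshtein($b, $search);
    });
    return $words;
}

$candidates = ['standpoint', 'sitemap', 'sitepoint'];
print_r(sortByDistance($candidates, 'sitepppoint'));
```

    The first element of the sorted array is then the "Did you mean" candidate.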

  11. #11
    Wadge! F4nat1c's Avatar
    Join Date
    Oct 2005
    Location
    South Wales, UK
    Posts
    1,134
    The PSpell suggestion by Eugene is the best. The others require a manually produced database of 'keywords', whereas with pspell you already have a spelling library.
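
    A hedged sketch of the pspell route. It assumes the pspell extension is loaded and an English aspell dictionary is installed, and falls back to an empty list when either is missing; the function name `spellingSuggestions` is made up here.

```php
<?php
// Return spelling suggestions for $word, or an empty array when the word is
// spelled correctly or pspell / the dictionary is not available.
function spellingSuggestions(string $word, string $language = 'en'): array
{
    if (!function_exists('pspell_new')) {
        return []; // pspell extension not loaded
    }
    $dictionary = @pspell_new($language);
    if ($dictionary === false) {
        return []; // no dictionary installed for this language
    }
    if (pspell_check($dictionary, $word)) {
        return []; // already spelled correctly
    }
    $suggestions = pspell_suggest($dictionary, $word);
    return $suggestions === false ? [] : $suggestions;
}

print_r(spellingSuggestions('restarant'));
```

    A stock dictionary only knows ordinary words, so site-specific terms such as 'sitepoint' would probably still need a word list of your own.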

  12. #12
    SitePoint Wizard stereofrog's Avatar
    Join Date
    Apr 2004
    Location
    germany
    Posts
    4,324
    Quote Originally Posted by trib4lmaniac
    Thankfully MySQL has the soundex function built in, which should reduce the result set dramatically.
    Code:
    SELECT * FROM `words` WHERE SOUNDEX(`word`) = SOUNDEX('$search')
    This makes sense; however, soundex is far from accurate, especially if the first letters don't match (try sitepoint/citepoint). metaphone() is better, but in that case you have to index the dictionary tables in PHP (or switch to Postgres).
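
    One way to read "index the dictionary tables in PHP" is to compute each word's metaphone key once and look candidates up by that key. The sketch below does this with an in-memory array; in a real setup the key would presumably be stored in its own indexed column. The function names and sample words are assumptions for illustration.

```php
<?php
// Group dictionary words by their metaphone key so a lookup needs no full scan.
function buildMetaphoneIndex(array $words): array
{
    $index = [];
    foreach ($words as $word) {
        $index[metaphone($word)][] = $word;
    }
    return $index;
}

// Return all indexed words that share a metaphone key with $input.
function soundAlikes(string $input, array $index): array
{
    return $index[metaphone($input)] ?? [];
}

$index = buildMetaphoneIndex(['sitepoint', 'standpoint', 'sitemap']);

// soundex keeps the first letter, so these two keys differ.
var_dump(soundex('sitepoint') === soundex('citepoint'));

// metaphone treats the soft C as S, so the lookup still finds the word.
print_r(soundAlikes('citepoint', $index));
```

    The matches from the index can then be ranked with levenshtein() as in the earlier posts.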

