  1. #1
    SitePoint Enthusiast
    Join Date
    Nov 2001
    Posts
    82
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Search Spelling Variations

    Hi, I have a search form on my site that searches 100,000+ records in a database, and I want the search to handle "spelling variations." Right now my search only returns exact matches, with anything allowed on either side of the key (i.e. my MySQL query looks like %key%). I hope that makes sense; if it does, please tell me how I can do this.
    Thanks, Chris
    -------------------------
    http://spotlyrics.com
    -------------------------

  2. #2
    Sultan of Ping jofa's Avatar
    Join Date
    Mar 2002
    Location
    Svíþjóð
    Posts
    4,080
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    regexp ?

  3. #3
    SitePoint Wizard samsm's Avatar
    Join Date
    Nov 2001
    Location
    Atlanta, GA, USA
    Posts
    5,011
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I've been thinking about adding the ability to find results despite a misspelling. A project I am working on has a great deal of data entry, so if someone entering data makes an error, an ambitious search could be the difference between the product working and not working.

    Anyway, I had thought of regex as a way to go, but it seemed really awkward. For example:
    search term: "search"
    regex: "s.?e.?a.?r.?c.?h.?"
    That would cover one extra letter inserted somewhere (seaarch), but what if one of the letters was missing? (serch)
    This: s?.?e?.?a?.?r?.?c?.?h?.?
    Would match anything!
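
    To make the two patterns above concrete, here is a small sketch in Python (the thread itself is about PHP and MySQL, so the language is only a convenience). The helper name `matches` is made up for illustration. It anchors the pattern to the whole word; an unanchored search, which is how MySQL's REGEXP behaves, is even more permissive.

```python
import re

# Pattern from the post: each letter of "search" may be followed by one extra character.
one_extra = re.compile(r"s.?e.?a.?r.?c.?h.?")
# The all-optional variant: every letter and every filler character is optional.
all_optional = re.compile(r"s?.?e?.?a?.?r?.?c?.?h?.?")


def matches(pattern, word):
    """Return True if the whole word matches the pattern (hypothetical helper)."""
    return pattern.fullmatch(word) is not None


print(matches(one_extra, "search"))     # True: exact spelling
print(matches(one_extra, "seaarch"))    # True: one inserted letter
print(matches(one_extra, "serch"))      # False: a missing letter is not covered
print(matches(all_optional, "zebra"))   # True: unrelated word still matches

# Unanchored, the all-optional pattern can match the empty string,
# so it "finds" a match inside any text at all.
print(all_optional.search("completely unrelated") is not None)  # True
```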

    Perhaps there is a technique for getting regex to match similar words, but I'm not coming up with anything at the moment that isn't really awkward or doesn't require multiple regexes. If there is a technique for getting those kinds of matches, I'd be very interested in learning about it.

    It would be nice if it were possible to use functions like metaphone, similar_text or levenshtein in a query.
    www.php.net/metaphone
    www.php.net/similar_text
    www.php.net/levenshtein
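
    One workaround, when the distance function cannot run inside the query, is to fetch candidate terms and filter them in application code. The sketch below is in Python and implements edit distance by hand; in PHP the built-in levenshtein() would replace the first function. The term list and the `max_distance` threshold are assumptions for illustration, and scanning every term this way would likely be slow on 100,000+ records without some pre-filtering.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,        # delete ca
                            curr[j - 1] + 1,    # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def close_terms(key, terms, max_distance=2):
    """Return terms within max_distance edits of key, nearest first (hypothetical helper)."""
    key = key.lower()
    scored = sorted((levenshtein(key, t.lower()), t) for t in terms)
    return [t for d, t in scored if d <= max_distance]


# Hypothetical terms standing in for values pulled from the database.
print(close_terms("serch", ["search", "church", "lyrics"]))
```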
    Using your unpaid time to add free content to SitePoint Pty Ltd's portfolio?

  4. #4
    imagine no limitations exbabylon's Avatar
    Join Date
    Dec 2000
    Location
    Idaho, USA
    Posts
    452
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I just finished coding a spell check for a client, a very simple implementation. Check out a little program called Aspell; it runs on Unix and comes with most *nix installs. You can interface with it directly from PHP, as a function, but it also works quite well as a standalone. Couple a basic English spell check with some simple regexes and you'll have a good search feature.
    Blamestorming: Sitting around in a group discussing why a deadline was missed or a project failed and who was responsible.

    Exbabylon- Professional Internet Services

  5. #5
    SitePoint Wizard samsm's Avatar
    Join Date
    Nov 2001
    Location
    Atlanta, GA, USA
    Posts
    5,011
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I think I see how that would work. You first spell check the person's terms, then you search for their spelling and the spell check application's suggestions. Or (like Google) just offer the alternate spellings as suggested searches.

    I imagine you could also delete Aspell's dictionary and replace it with all the terms in your database. Then the spell check will only return useful terms. Thanks, exbabylon, that was some helpful advice.
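
    The "dictionary built from your own data" idea can be sketched without Aspell at all. The Python below uses the standard library's difflib as a stand-in for the spell checker, which is a swapped-in technique and not how Aspell ranks suggestions. The titles, function names, and the 0.75 cutoff are assumptions for illustration.

```python
import difflib


def build_vocabulary(titles):
    """Collect every distinct lowercase word that appears in the stored titles."""
    vocab = set()
    for title in titles:
        vocab.update(word.lower() for word in title.split())
    return sorted(vocab)


def suggest(word, vocabulary, limit=3):
    """Return up to `limit` vocabulary words that resemble `word`, best match first."""
    return difflib.get_close_matches(word.lower(), vocabulary, n=limit, cutoff=0.75)


# Hypothetical rows standing in for the terms in the database.
vocabulary = build_vocabulary(["Yellow Submarine", "Stairway to Heaven"])
print(suggest("submarin", vocabulary))
print(suggest("heavan", vocabulary))
```

    Because the vocabulary only contains words that exist in the data, every suggestion is guaranteed to lead to at least one result.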

  6. #6
    SitePoint Wizard gold trophysilver trophy
    Join Date
    Nov 2000
    Location
    Switzerland
    Posts
    2,479
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    It's already in PHP: http://www.php.net/manual/en/ref.pspell.php

    You just need to compile that extension.

  7. #7
    SitePoint Wizard samsm's Avatar
    Join Date
    Nov 2001
    Location
    Atlanta, GA, USA
    Posts
    5,011
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Cool! I imagine that would be the function exbabylon mentioned... was I pretty much correct as to how it would work? It would provide suggestions via an array returned by pspell_suggest, and you could just include those in the query.
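
    Including the suggestions in the query could look roughly like the sketch below. It is Python with an in-memory SQLite database so that it runs on its own; the `songs` table, the `title` column, and the function name are hypothetical, and the `suggestions` list stands in for whatever array pspell_suggest returns. Each term is bound as a parameter, so user input is never spliced into the SQL text.

```python
import sqlite3


def search_with_suggestions(conn, key, suggestions):
    """Search for the user's key plus each suggested spelling, OR-ed together."""
    terms = [key] + [s for s in suggestions if s != key]
    where = " OR ".join("title LIKE ?" for _ in terms)   # one placeholder per term
    params = [f"%{t}%" for t in terms]                   # same %key% style as the original query
    sql = f"SELECT title FROM songs WHERE {where} ORDER BY title"
    return [row[0] for row in conn.execute(sql, params)]


# Hypothetical table and rows for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (title TEXT)")
conn.executemany("INSERT INTO songs VALUES (?)",
                 [("Yellow Submarine",), ("Stairway to Heaven",)])

print(search_with_suggestions(conn, "submarien", ["submarine"]))
```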

  8. #8
    Your Lord and Master, Foamy gold trophy Hierophant's Avatar
    Join Date
    Aug 1999
    Location
    Lancaster, Ca. USA
    Posts
    12,305
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Store your values in both proper spelling and metaphone versions and use that to search with. You can look up Metaphone in the PHP user's guide.
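
    A rough sketch of the "store a phonetic key next to each value" idea follows. Python has no built-in metaphone, so this uses Soundex, a simpler phonetic code, as a stand-in; PHP offers both soundex() and metaphone(). The term list and the in-memory index are assumptions for illustration; in a database the key would live in its own indexed column.

```python
from collections import defaultdict

# Soundex digit for each consonant; vowels, h, w, and y carry no digit.
CODES = {c: d for d, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                 "4": "l", "5": "mn", "6": "r"}.items()
         for c in letters}


def soundex(word):
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue                 # h and w do not separate same-coded consonants
        code = CODES.get(c, "")
        if code and code != prev:    # skip vowels and repeated codes
            key += code
        prev = code
    return (key + "000")[:4]


def build_index(terms):
    """Map each phonetic key to the properly spelled terms that share it."""
    index = defaultdict(list)
    for term in terms:
        index[soundex(term)].append(term)
    return index


index = build_index(["search", "church", "lyrics"])   # hypothetical stored values
print(index.get(soundex("serch"), []))                # look up by the key of the typed word
```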
    Wayne Luke
    ------------


  9. #9
    SitePoint Wizard samsm's Avatar
    Join Date
    Nov 2001
    Location
    Atlanta, GA, USA
    Posts
    5,011
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks, I was only familiar with metaphone in the context of comparing two keys... I did not think about it enough to realize it produced a key for each pronunciation. Is metaphone fairly standardized from language to language? Could I expect database keys created with a Perl Metaphone module to match PHP's metaphone keys?

    Perl Text::Metaphone (http://search.cpan.org/author/MSCHWE...6/Metaphone.pm)
    Perl Text::DoubleMetaphone (http://search.cpan.org/author/MAURIC...leMetaphone.pm)
    PHP's metaphone (www.php.net/metaphone)

  10. #10
    imagine no limitations exbabylon's Avatar
    Join Date
    Dec 2000
    Location
    Idaho, USA
    Posts
    452
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Google has some great APIs, so offering a Google-type spelling option is an absolute, literal reality. If you've never looked into the Google APIs, check out http://www.google.com/apis/ where you can actually get the same word that Google would recommend. It's a fairly simple implementation through PHP.

    And yes, the pspell function is the function I was referring to. Just make sure to use Aspell; it's much superior to Ispell once you set it all up. Or just connect to Aspell directly.

