Results 1 to 4 of 4
  1. #1
    SitePoint Member
    Join Date
    Sep 2009
    Posts
    2

    Handling alternate spellings for search queries

    Hey everyone,

    I've been putting together a database for a website I'm creating. One part of the database is providing information about certain locations in Japan--what they're called, where they are, what the Japanese letters look like, and so forth.

    My problem is that when people "romanize" Japanese names, there are several different ways to spell them with English letters. While some places have a standard spelling that everyone adopts (like "Tokyo"), less-known places are often spelled in a variety of ways. Sometimes people spell out the long vowels in different ways, some condense them, and some use the vowel with a line over it (a macron) to indicate length.

    Example:

    兵庫県 could be romanized as
    Hyogo Prefecture
    Hyōgo Prefecture
    Hyougo Prefecture

    I'm definitely picking one spelling and sticking to it as far as database information goes, to keep things consistent, but what about the people who come to the site and search using an alternate spelling? What's the best approach to handling this?

    I've considered adding a column to the database for alternate spellings, i.e., if someone searches for "Hyougo" and the search turns up empty in the "name" column, a script could then search the alternate spellings column to see if that phrase turns up. Or would it be better to run the search query on both columns simultaneously? Are there technical repercussions to an alternate spelling column that I'm missing?
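
    For illustration, here is a rough sketch of the "search both at once" option, using Python's built-in sqlite3 module. The table and column names (places, place_aliases, alias) and the helper find_places are made up for the example, and it assumes alternates live in a small side table rather than a single column so that one place can have several spellings. Treat it as one possible layout, not a recommendation.

```python
import sqlite3

# In-memory database purely for demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE places (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL              -- the one canonical spelling
    );
    -- Hypothetical side table: one row per alternate spelling.
    CREATE TABLE place_aliases (
        place_id INTEGER NOT NULL REFERENCES places(id),
        alias    TEXT NOT NULL
    );
    INSERT INTO places (id, name) VALUES (1, 'Hyogo Prefecture');
    INSERT INTO place_aliases (place_id, alias) VALUES
        (1, 'Hyōgo Prefecture'),
        (1, 'Hyougo Prefecture');
""")


def find_places(conn, term):
    """Return canonical names whose name or any alias contains `term`."""
    pattern = f"%{term}%"
    rows = conn.execute(
        """
        SELECT DISTINCT p.name
        FROM places AS p
        LEFT JOIN place_aliases AS a ON a.place_id = p.id
        WHERE p.name LIKE ? OR a.alias LIKE ?
        ORDER BY p.name
        """,
        (pattern, pattern),  # bound parameters, never string-built SQL
    ).fetchall()
    return [row[0] for row in rows]


print(find_places(conn, "Hyougo"))
```

    Searching both places in one query avoids a second round trip when the first search comes up empty. Note that any % or _ typed by the user would act as a LIKE wildcard here, so a real search form would probably want to escape those.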

    Thanks in advance!

  2. #2
    SitePoint Addict
    Join Date
    Jan 2007
    Posts
    344
    soundex or its close cousins might work well for this.

  3. #3
    SitePoint Member
    Join Date
    Sep 2009
    Posts
    2
    Quote: Originally posted by plumsauce
    soundex or its close cousins might work well for this.
    Tell me more. It sounds interesting, but I don't know about implementing it, especially since the names are based phonetically in Japanese, not English, and use different sounds and lack certain English sounds. Would that matter at all?

  4. #4
    SitePoint Author
    wwb_99
    Join Date
    May 2003
    Location
    Washington, DC
    Posts
    10,653
    Soundex is designed to help de-duplicate English-sounding names, so it encodes letters according to English pronunciation. That said, the concept isn't horribly tricky, and there might be a Japanese-focused equivalent.
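
    To make the concept concrete: the general idea is to reduce every spelling to a single comparison key and match on the key. The sketch below is not Soundex and not any standard algorithm; romaji_key is a hypothetical helper that only handles the long-vowel variants mentioned in this thread (macrons, "ou", "oo", "uu"), and the collapsing rules are an assumption that will occasionally merge names that are genuinely different.

```python
import unicodedata


def romaji_key(text):
    """Reduce a romanized name to a rough comparison key.

    Assumed rules (illustrative only): ignore case, drop macrons and
    other diacritics, and collapse spelled-out long vowels.
    """
    # Decompose accented letters, e.g. "ō" becomes "o" + combining macron.
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    # Drop the combining marks, leaving the bare letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse long vowels written out as two letters. This is lossy.
    for long_form, short_form in (("ou", "o"), ("oo", "o"), ("uu", "u")):
        stripped = stripped.replace(long_form, short_form)
    return stripped


for spelling in ("Hyogo Prefecture", "Hyōgo Prefecture", "Hyougo Prefecture"):
    print(spelling, "->", romaji_key(spelling))
```

    The key could be stored in its own indexed column when a place is saved, with the same function applied to whatever the visitor types before running the query, so all three spellings land on the same row.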

