SitePoint Sponsor

User Tag List

Results 1 to 6 of 6
  1. #1
    PHP Otaku Gibb's Avatar
    Join Date
    Jul 2004
    Location
    Texas
    Posts
    454
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    How should i index to improve database searching?

    I've run into a bit of a snag. I've been working on a website that deals with business addresses such as restaurants and hotels. For the last month or so i've been using a sample piece of a much larger data collection, with only a couple thousand rows or so. Yesterday i inserted almost 5 million rows into an address table, and my site was almost crippled by the impact. My street search function takes around 15 seconds to run, and often doesnt return results when it should. In this address table i have two main columns that i search, address_street, and address_city. For example i can search for all Restaurants on Main St in Dallas, TX. How could i add indexes to this table to make these searches run more efficiently? I added an index to the street column only, and i didnt notice any speed difference.

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,247
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    it depends on how you stored the addresses

    if the address field includes the street number, e.g. 123 sesame street, then your search query will have to be like this:

    ... where address like '%sesame%'

    which will ignore the index

    however, if you split the street number into its own column, then your search query will be

    ... where address like 'sesame%'

    which will utilize the index
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    PHP Otaku Gibb's Avatar
    Join Date
    Jul 2004
    Location
    Texas
    Posts
    454
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks for the reply. The schema for the table is this:

    address_id, address_number, address_street, address_city, address_state

    Would the index be thrown off by Prefixes such as these streets:
    "W Main St"
    "N Elm Ln"

    Our customers more than likely leave the prefix off when typing in an address, so we would usually be looking for just "Main" or just "Elm". Course the number of streets with prefixes is smaller than the number without, so i guess indexing will still help a bit.

    Another question would be, does it do anything to index the address_street with the address_city, and address_state in one index? All 3 fields are inputted at once from the customer.

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,247
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Gibb
    Would the index be thrown off by Prefixes such as these streets:
    "W Main St"
    "N Elm Ln"
    no, as long as you search with LIKE 'W Main%' and not LIKE '%Main%' the index should be used

    Quote Originally Posted by Gibb
    Another question would be, does it do anything to index the address_street with the address_city, and address_state in one index? All 3 fields are inputted at once from the customer.
    you would have to run timing tests on your actual sample of data to see the difference

    don't forget, you can declare a compound index on 3 columns 6 different ways, and performance depends on which columns comes first in the index
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    PHP Otaku Gibb's Avatar
    Join Date
    Jul 2004
    Location
    Texas
    Posts
    454
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Wow that did the trick

    I threw together a slightly-more-than-basic search function that did the following:

    User inputs: "W Forest Grove Rd"
    DB checked for full input, nothing found
    DB checked for "W Forest Grove" and "Forest Grove Rd", still nothing found, keep going
    DB checked for "W Forest", "Forest Grove", and "Grove Rd", then eventually stops with
    DB checked for "Forest" and "Grove" (Prefixes and suffixes left out of last break down).

    For each of these steps i was looking for '%$piece%'. When i changed that to '$piece%' the search ran incredibly fast. I suppose it wont work quite as well if somebody is searching for "Forest Grove" and it's in the DB as "W Forest Grove", but i suppose i can create another search function that trys out each of the 4 prefixes if i need to.

    Thanks for the help

  6. #6
    SitePoint Zealot csi95's Avatar
    Join Date
    Jan 2005
    Location
    Albany, NY
    Posts
    151
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Gibb
    I suppose it wont work quite as well if somebody is searching for "Forest Grove" and it's in the DB as "W Forest Grove", but i suppose i can create another search function that trys out each of the 4 prefixes if i need to.
    You can probably help yourself a lot by standardizing the addresses before they are stored.

    I'm not sure where the addresses come from, but if they come from user input, different people will write the same address different way. You could end up with:

    W Forest Grove
    W. Forest Grove
    West Forest Grove

    This gets even more complex with things like "Street", "Lane" or "Avenue" where you might get "Street" or "St" or "St.".

    My doing some pre-processing on the strings before you store them in the DB, you can standardize them a bit. Remove periods. Change "West" to "W", eliminate extra spaces, etc.

    The end result will be much better search results.

    Good luck!
    Join the EasyImage Affiliate Program!
    30% commission on all sales
    Conversion rates as high as 20%
    Dedicated Affiliate Manager to help you succeed!


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •