SitePoint Sponsor

User Tag List

Results 1 to 10 of 10
  1. #1
    SitePoint Enthusiast stoproductions's Avatar
    Join Date
    Feb 2007
    Location
    USA
    Posts
    64
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Zip code radius search on a massive scale without killing the app?

    Hello fellow sitepointers, I知 developing an application with a hopeful turn out of at least 4 million users (hey why not aim high). A major part of this application will be locating items by zip code radius. Now i know the calculations to perform between two zip codes to get a distance, and i know where to get a zipcode database with latitude and longitude, but i知 not sure how to do a fast and efficient zipcode radius search that searches every zip code in the continental United States that will be repeated thousands of times a day, without killing the app or the mysql database.

    Basically i need to program a tool that returns the nearest zipcodes from an inputed zip code. I知 not sure if I should do this via a mysql stored procedure http://www.goondocks.com/blog/08-01-...ing_mysql.aspx or do it via the php side. My only concern is slowing down the application. I know there is a way to do this efficiently as many sites already do it. Any help would be greatly appreciated.

    Thank you,
    Nick

  2. #2
    SitePoint Addict
    Join Date
    Feb 2007
    Posts
    270
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    One way is to store lat and long with zipcode, then when doing a distance calc, reduce the number of records to pull by calculating the East/West/North/South boundaries and pulling only zips which fall within that square from the db for your distance calc.

  3. #3
    SitePoint Enthusiast stoproductions's Avatar
    Join Date
    Feb 2007
    Location
    USA
    Posts
    64
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Arlen, Sounds good in theory. Can you get a little more technical for me?

  4. #4
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2008
    Posts
    5,757
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    If you restrict the user to search within some predefined distances, like so many sites do(they give a dropdown listing the choice of the radius, eg 5,10,25,50,100 miles) you can just precalculate all the data. No computations needed at runtime. Lookups would be very fast. You're just looking for which zip codes are near, and then once you have that small list of zips, you join it to your items table.

    Something in your favor here is that the zip code distance data doesn't change all that often(once a month or so). So you can really optimize this lookup.

  5. #5
    SitePoint Wizard
    Join Date
    Nov 2005
    Posts
    1,191
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I would imagine a square would be fine for zip codes (as opposed to a circle, which would be more accurate and more difficult). I mean depending on where you are in the zip area would put you some distance off from the centre of another anyway.

    psuedo:
    select zips where lat between mylat-distance and mylat+distance and long between ...

  6. #6
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2008
    Posts
    5,757
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    If you need arbitrary distances, you could find the points in the square, and then if you need to make it more accurate, run the circle calculation on those points you found in the square.

  7. #7
    SitePoint Guru
    Join Date
    Jun 2006
    Posts
    638
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    None of these suggestions will work well with a big data set...
    They might work for a 1mill record table, but if you reach 10-20mill, they will be way to slow... (had to do this a few times with 25mill+ records)

    Arlen's solution will work with a smaller number of records, because of how indexes are used...

    crmalibu's suggestion will work, if the number of zip codes you have is small, and doesn't change. And if you have 1,000,000 zip codes, you will be searching a table with 1,000,000 * 1,000,000 * # of ranges you can have. (will work on an index, but still...)

    hash's might work, but you need an index on [latitude, longitude], but I'm not 100% sure it will use the index correctly here. (BETWEEN did not always work correctly for us, mix between old and new mysql versions in production...)

    Here are a few suggestions:
    - Your searching for people 100 miles from $x / $y

    #1 the simplest one (Arlen's suggestion):
    - store the latitude/longitude
    - SELECT ... WHERE x < $x + 100 AND x > $x - 100 AND y < $y + 100 AND y > $y - 100;
    That will return a square, you could add Pythagoras in there (after the query), to get a circle.
    The problem:
    - You can't use indexes there, MySQL will only use X or Y as an index, so you do end up cutting the records you search on, but will still be 1 by 1 for the rest.
    - You can use BETWEEN instead of > <

    #2 not so simple one
    - store the latitude/longitude
    - figure out what "ranges" you will search on
    - split up the users in "sectors", each sector of 100 miles lets say, arranged like a chess board.
    Ex:
    1 2 3
    4 5 6
    7 8 9
    - Set an index on this new "sector" field.
    - When you do your search, you calculate the sector of the search origin, lets say we end up with 4 from the previous graph.)
    - Do the query: SELECT .. WHERE sector IN (1,2,4,5,7,8) AND ...
    (this is assuming your map does not wrap, but you get the idea)
    - You then add your #1 in there, and make sure it will use the "sectors" index.
    The idea here is that your search will limit the users to a square (allot less records than in a long stripe it normally limits users by)

    The last suggestion is the one that works fastest, but tales a while to code/get it right.

  8. #8
    SitePoint Wizard silver trophy kyberfabrikken's Avatar
    Join Date
    Jun 2004
    Location
    Copenhagen, Denmark
    Posts
    6,157
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I have never used it, but MySql has a spatial extension, that should be able to crunch those kind of calculations.

  9. #9
    I solve practical problems. bronze trophy
    Michael Morris's Avatar
    Join Date
    Jan 2008
    Location
    Knoxville TN
    Posts
    2,026
    Mentioned
    64 Post(s)
    Tagged
    0 Thread(s)
    Vali - there are only 100,000 possible zip code combinations in the first place so how can you have millions of zip codes? Further the actual number of zip codes in use is closer to around 50,000.

    That said you are right in that trying to precompile all possible ranges between them would lead to an impractically large data set.

  10. #10
    SitePoint Enthusiast stoproductions's Avatar
    Join Date
    Feb 2007
    Location
    USA
    Posts
    64
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Everyone thank you for replying to this thread, especially Vali. This is why I use sitepoint forums. The caliber of the answers and the brilliance of it's members.

    @Vali - I will probably go with your first answer (Arlens explination) because that will be the simplest for me to implement and if I hit over 1 million items... well then i will cross that bridge when i get to it.

    Thanks again everyone!

    -Nick


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •