SitePoint Sponsor

User Tag List

Results 1 to 11 of 11
  1. #1
    Web-coding NINJA! silver trophy beetle's Avatar
    Join Date
    Jul 2002
    Location
    Dallas, TX
    Posts
    2,900
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Large Zip Code ASCII file....

    I found a company that sells all the US Zip Code information I need for my application. They deliever it as as an ASCII text file (.dat) The file is ~73,500 records, with one record per line. Each record is 78 characters, composed like this
    Code:
    Field Name   Length Type
    -----------------------------
    ZIP Code     5      Character
    State        2      Character
    City         28     Character
    Type         1      Character
    County FIPS  5      Character
    Latitude     7      Numeric (Dec. = 4)
    Longitude    8      Numeric (Dec. = 4)
    Area Code    3      Character
    Finance Code 6      Character
    Last Line    1      Character
    FAC          1      Character
    MSA          4      Character
    PMSA         4      Character
    Filler       3      Character
    CR + LR      2      Character
    Now, the only data I really NEED is Zip Code, State, City, Latitude, and Longitude.

    So, (what you've been waiting for) my question is: Would it be best to extract the data I need from this file into MySql? This informtion will be used for radius lookups.
    beetle a.k.a. Peter Bailey
    blogs: php | prophp | security | design | zen | software
    refs: dhtml | gecko | prototype | phpdocs | unicode | charsets
    tools: ide | ftp | regex | ffdev




  2. #2
    \m/ R.I.P. Dimebag! \m/ JimBolla's Avatar
    Join Date
    Dec 2001
    Location
    erie, pa
    Posts
    1,130
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    If that's all the data you need, then don't buy it. That is available for free. I know because I've downloaded it before. And yes I'd reccommend exporting to some datasource other than flat file.
    -- JIM BOLLA
    Wanna play Halo 2? My XBOX Live gamertag: crowdozer

  3. #3
    Web-coding NINJA! silver trophy beetle's Avatar
    Join Date
    Jul 2002
    Location
    Dallas, TX
    Posts
    2,900
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Well, I've been looking and have only found sources that cost, or services you can 'plug in' to (not what we want). Remember where you got it?
    beetle a.k.a. Peter Bailey
    blogs: php | prophp | security | design | zen | software
    refs: dhtml | gecko | prototype | phpdocs | unicode | charsets
    tools: ide | ftp | regex | ffdev




  4. #4
    Web-coding NINJA! silver trophy beetle's Avatar
    Join Date
    Jul 2002
    Location
    Dallas, TX
    Posts
    2,900
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Since my last post, I have found another source with this same data that is less expensive and can deliver it as an MDB or XLS, which would save me some legwork re: importing.

    Still haven't found anything for free...
    beetle a.k.a. Peter Bailey
    blogs: php | prophp | security | design | zen | software
    refs: dhtml | gecko | prototype | phpdocs | unicode | charsets
    tools: ide | ftp | regex | ffdev




  5. #5
    The doctor is in... silver trophy MarcusJT's Avatar
    Join Date
    Jan 2002
    Location
    London
    Posts
    3,509
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    MarcusJT
    - former ASP web developer / former SPF "ASP Guru"
    - *very* old blog with some useful ASP code

    - Please think, Google, and search these forums before posting!

  6. #6
    Web-coding NINJA! silver trophy beetle's Avatar
    Join Date
    Jul 2002
    Location
    Dallas, TX
    Posts
    2,900
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Talking Update!

    Thanks to all who posted. Here's my functioning test page using the data from the link provided by M@rco. Thanks dude!

    http://www.lanwizards.com/ziptest.php

    Now, I have another question if you guys can entertain it. I'm pretty fresh to database design. What would be the best way for me to setup keys on this sucker? Here's the columns for the table zip_data

    state char(2)
    zip char(5)
    lat float(9,6)
    lon float(9,6)

    I was going to make the ZIP the primary key, but there are some duplicates (do a 10mi search on 52761) I'm not sure if these dupes are supposed to be there or not. Anyhow, here's some sample queries used to generate the data on my test page.
    PHP Code:
    SELECT latlon FROM zip_data WHERE zip 77901;
    SELECT ziplatlon FROM zip_data WHERE lat BETWEEN 41.268804834095 AND 41.557939165905 AND lon BETWEEN -91.1781288779 AND -90.8341091221
    Ok, one last question. You can see from the 2nd query that when finding zips in the radius, I get the zip, lat, and lon columns all at once. How would the speed differ if I made this a two step process?

    1) Select just the zip column within the ranges
    2) Select the lat and lon columns for each zip found

    The reason I ask this is because I'm making a zipcode class to handle all this stuff, and I'm considering doing it this way for so I can put all the SQL statments in my classs functions...right now, they are not.
    Last edited by beetle; Oct 30, 2002 at 13:00.
    beetle a.k.a. Peter Bailey
    blogs: php | prophp | security | design | zen | software
    refs: dhtml | gecko | prototype | phpdocs | unicode | charsets
    tools: ide | ftp | regex | ffdev




  7. #7
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    >> http://www.lanwizards.com/ziptest.php

    found 93 within 10 miles of 90210, and some of them way more than 10 miles away, so your app still has bugs

    as for the data, there should not be any dupes unless you have two (conflicting) entries for the same zip, which means you may have merged data from more than one source, but in any case you can quickly find out by running
    Code:
    select zip from yourtable
    group by zip having count(*)>1
    and i would definitely make zip the primary key

    as for multiple steps, you already have it as two steps, right? and you're asking if you should make it three? i think you should make it one
    Code:
    select that.zip, that.lat, that.lon 
      from zip_data this
         , zip_data that
     where this.zip=77901
       and join using "within" calculation
    it looks like your "within" calculation is based on a rectangular box rather than a circle...

    rudy

  8. #8
    Web-coding NINJA! silver trophy beetle's Avatar
    Join Date
    Jul 2002
    Location
    Dallas, TX
    Posts
    2,900
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Ya...I don't want to use a circle...too much math for just a lookup. I can filter out those that have a distance greater than what is requested...much easier that way. I didn't see any that are way off in your example...3 miles or so over, which are most likely at the corners of my 'box'.

    I only got the data from one source...the file I got from the link M@rco provided. I guess I will have to go in and remove the dupes.

    About the multiple step thing...here's my algorithm in plain english
    1. Get Lat and Lon values from table based on user entered zip
    2. Create a bounding-box based on those Lat and Lon values
    3. Find all Zips and their lat/long values that fall within the bounding box
    4. Use original Zip data and found Zip data with haversine forumla to calculate disatnce from original to each found zip.
    5. Display results
    As you can see, 5 simple steps. Now, for the purpose of cleanliness, I like to change this up a bit, so it would be...
    1. Get Lat and Lon values from table based on user entered zip
    2. Create a bounding-box based on those Lat and Lon values
    3. Find all Zips within the bounding box
    4. Get Lat and Lon values for each of the zips found above
    5. Use original Zip data and found Zip data with haversine forumla to calculate disatnce from original to each found zip.
    6. Display results
    AS you can see, I have split step three from the first list into steps 3 and 4 here. Why? Because I can re-use the class function from step 1 to do step 4, and that way I can stuff all my SQL statements into the class functions. I just don't know if the extra step is going to cuase a speed decline. I don't know how to benchmark such stuff, otherwise I'd test it myself.

    EDIT: I have changed up the page to use the method I describe in my 2nd list. Can't tell if it's faster or slower, but certainly much cleaner. I added the PHP source code to the page so you can better see what I'm talking about.
    Last edited by beetle; Oct 30, 2002 at 18:17.
    beetle a.k.a. Peter Bailey
    blogs: php | prophp | security | design | zen | software
    refs: dhtml | gecko | prototype | phpdocs | unicode | charsets
    tools: ide | ftp | regex | ffdev




  9. #9
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    if you could describe the process for "Create a bounding-box based on those Lat and Lon values" then i'll help you with the sql so you can combine steps 1-4 into one step

    i don't know what class you are talking about, but i imagine if it's some kind of programming construct, it would work by passing in the zip and getting back the zips and their lat/lon values... which you can do with one sql call

    in fact if you throw the have-a-slice thing in there, you can get the class to accept a zip and feed back zips and their distances

    letting the db engine do everything in one step is probably way faster than you doing it in code (arrays?) in several steps

    but then, i can't prove that unless you benchmark it


    rudy

  10. #10
    Web-coding NINJA! silver trophy beetle's Avatar
    Join Date
    Jul 2002
    Location
    Dallas, TX
    Posts
    2,900
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hey rudy, thanks for the follow up

    Actually, If you check the EDIT to my last post, I added the PHP source to the test page, http://www.lanwizards.com/ziptest.php Do you know PHP?

    Yes, a class is a programming construct.

    Sorry about the type on the haversine formula...I fixed it in my last post....

    I don't know enough MySql to be real dangerous, but I'd be surprised if this could all be done in MySql. If I'm not mistaken, it does not support stored procedures the way some other DBs do.
    beetle a.k.a. Peter Bailey
    blogs: php | prophp | security | design | zen | software
    refs: dhtml | gecko | prototype | phpdocs | unicode | charsets
    tools: ide | ftp | regex | ffdev




  11. #11
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,350
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    no, i don't know php

    i can see the formula, though, with the cos and asin and sqrt and stuff...

    you will be pleasantly surprised, i guess, but that can all be done in mysql

    stored proc not required

    you could put all the calcs into the WHERE clause of the query and get everything back in one sql call

    less code, more efficiency

    but hey, it's already working, so...


    rudy


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •