SitePoint Sponsor

User Tag List

Results 1 to 8 of 8
  1. #1
    SitePoint Enthusiast PlayStone's Avatar
    Join Date
    Jan 2004
    Location
    Canada
    Posts
    68
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Selecting filtered results from a table with 1 million rows

    Hello MySQL Gurus!

    I am trying to write an interesting query here and I am not sure if it all can be done in one step in the same query. Here is my dilemma:

    I have a DB with two tables: mm_recipients and postal_codes. The first one has a list of users in my system and the second one a list of 1 million postal codes in Canada.

    My objective is to select a list of 50 users that live less than 10Km away from a central location and are between 25 and 50 years old.

    The MM_RECIPIENTS table has the following fields:
    recipient_id
    recipient_name
    recipient_postal_code
    ...
    recipient_bday_year

    The table POSTAL_CODES has the following fields:
    postal_code
    latitude
    longitude

    Now, there is a calculation to come up with the distance, but even ignoring this fact for now, how would I get all the information from the two tables?
    This is what I currently have (which by the way pretty much crashes the server...):

    Code:
    SELECT * FROM mm_recipients, mm_recipients_lists, postal_codes
    WHERE mm_recipients.recipient_id=mm_recipients_lists.recipient_id 
    AND mm_recipients_lists.list_id=7 
    AND (recipient_bday_year < 1983 AND recipient_bday_year > 1958)
    AND mm_recipients.postal_code = postal_codes.postal_code
    ORDER BY rand()
    LIMIT 0,50
    Any ideas to get this going?
    Thanks a lot.

  2. #2
    SitePoint Member Mez's Avatar
    Join Date
    Mar 2008
    Posts
    6
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Can you provide the result of each of the following:

    show index from postal_codes;
    show index from mm_recipients;

  3. #3
    Follow Me On Twitter: @djg gold trophysilver trophybronze trophy Dan Grossman's Avatar
    Join Date
    Aug 2000
    Location
    Philadephia, PA
    Posts
    20,578
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Code:
    SELECT mm_recipients.id 
    FROM mm_recipients
    INNER JOIN mm_recipients_lists
      ON mm_recipients_lists.recipient_id = mm_recipients.recipient_id 
        AND mm_recipients_lists.list_id = 7
    INNER JOIN postal_codes
      ON mm_recipients.postal_code = postal_codes.postal_code
        AND [calculation for postal code being less than 10km away]
    WHERE mm_recipients.recipient_bday_year < 1983
      AND mm_recipients.recipient_bday_year > 1958
    LIMIT 50
    You'll want indexes on the columns you're joining on for performance.

  4. #4
    SitePoint Enthusiast PlayStone's Avatar
    Join Date
    Jan 2004
    Location
    Canada
    Posts
    68
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks Dan!

    After I posted this question I was playing around with the query and got it to this point:

    Code:
    SELECT * FROM mm_recipients, mm_recipients_lists, postal_codes WHERE 
    			mm_recipients.recipient_id=mm_recipients_lists.recipient_id 
    			AND mm_recipients_lists.list_id=7 
    			AND mm_recipients.recipient_status=1
    			AND (recipient_bday_year < 1983 AND recipient_bday_year > 1958)
    			HAVING(postal_codes.postal_code = mm_recipients.recipient_postal_code)
    			LIMIT 0,50
    Which produced the expected results but the query took over 2 minutes to execute.

    Yours is much faster, even without the indexes.

    Now I am trying to finalize the query by adding the calculation to get me the distance.
    The way that the distance to a point is calculated is by using the latitude and longitude of the found postal code and running the Pythagorean theorem, which is a pain in this case.

    Has anyone tried to tackle this problem before??

    In PHP this is a pretty easy calculation, but doing it all on-the-fly within the QUERY is the super challenge. Any ideas gentleman? I am still working on this one:

    PHP Code:
    function DistanceCalc($lat1$lon1$lat2$lon2$unit) { 

      
    $theta $lon1 $lon2
      
    $dist sin(deg2rad($lat1)) * sin(deg2rad($lat2)) +  cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * cos(deg2rad($theta)); 
      
    $dist acos($dist); 
      
    $dist rad2deg($dist); 
      
    $miles $dist 60 1.1515;
      
    $unit strtoupper($unit);

      if (
    $unit == "K") {
        return (
    $miles 1.609344); 
      } else if (
    $unit == "N") {
          return (
    $miles 0.8684);
        } else {
            return 
    $miles;
          }

    Thanks,
    Bruno.

  5. #5
    reads the ********* Crier silver trophybronze trophy longneck's Avatar
    Join Date
    Feb 2004
    Location
    Tampa, FL (US)
    Posts
    9,854
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by PlayStone View Post
    The way that the distance to a point is calculated is by using the latitude and longitude of the found postal code and running the Pythagorean theorem, which is a pain in this case.
    this is actually the wrong way to calculate the distance between lat/long pairs since the distance between latitudes becomes shorter as you near the poles.

    calculating this type of distance has been brought up a couple of times here:
    http://www.sitepoint.com/forums/show...itude+distance
    http://www.sitepoint.com/forums/show...itude+distance
    http://www.sitepoint.com/forums/show...itude+distance

  6. #6
    SitePoint Enthusiast PlayStone's Avatar
    Join Date
    Jan 2004
    Location
    Canada
    Posts
    68
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks Longneck!

    You are correct. But for the shots distances that I am using, < 100 Km, this is fine.
    I actually ended up using PHP to do my distance calculation, instead of doing it all directly in the query.

    I just retrieve all the results from the DB, including the lat/long and loop through the results in PHP to check the distance. Looping through 1000 results is better than looping through 1M.

    Thanks.

  7. #7
    reads the ********* Crier silver trophybronze trophy longneck's Avatar
    Join Date
    Feb 2004
    Location
    Tampa, FL (US)
    Posts
    9,854
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    i could suggest how to integrate this within your query, but you are using the dreaded evil select * and i don't know your field names.

  8. #8
    reads the ********* Crier silver trophybronze trophy longneck's Avatar
    Join Date
    Feb 2004
    Location
    Tampa, FL (US)
    Posts
    9,854
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    in case you didn't notice, that was a hint that you should post your final query and i'll rewrite if for you.
    Check out our new Industry News forum!
    Keep up-to-date with the latest SP news in the Community Crier

    I edit the SitePoint Podcast


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •