SitePoint Sponsor

User Tag List

Results 1 to 25 of 25
  1. #1
    SitePoint Addict Smola's Avatar
    Join Date
    Mar 2005
    Posts
    260
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Question CREATE FUNCTION vs CALL PROCEDURE

    I have been working with some pagination for a client and I have run into a sorting issue. I want the pagination to work by making sure that the sorting happens with my database query. I do not want to resort to storing every row in an array and then sorting that if I don't have to.

    So I can sort by a few fields but one sort choice in particular I'm having trouble with is by location. I have a separate zipcode database which I use to calculate distances via a php script in other areas of the site. However, for pagination, I would need this calculation to happen as the database is getting results. So I began to research ways of creating custom functions. I came across the two mentioned in the summary. I think that the custom FUNCTION is something I could call in line with a query and make it a field in the result set. The PROCEDURE seems to allow multiple lines/commands but doesn't appear as if I could use it with a query. I mean, I know I could make a query within the procedure, but I don't think I could make something a field.

    Anyway, I wanted to calculate distances on the fly in a query. so, creating a function and calling it like :

    Code:
    SELECT t.field1, t.field2, DISTANCE(givenzip, t.field2) as dist FROM table1 t ORDER BY dist
    I'm pretty sure that MySQL has built-in trig functions to allow me to port my php script (modified of course) into a custom function:

    PHP Code:
    function DistanceCalc($lat1$lon1$lat2$lon2$unit "M"$places 5) { 

        
    $theta $lon1 $lon2
        
    //spherical law of cosines
        
    $dist sin(deg2rad($lat1)) * sin(deg2rad($lat2)) +  cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * cos(deg2rad($theta)); 
        
    $dist acos($dist); 
        
    $dist rad2deg($dist);
         
        
    $miles $dist 60 1.1515;
        
    $unit strtoupper($unit);
        
        
    //can return in kilometers or nautical miles?
        
    if ($unit == "K") {
            return 
    round($miles 1.609344$places); 
        } else if (
    $unit == "N") {
              return 
    round($miles 0.8684$places);
        } else {
              return 
    round($miles$places);
        }


    But I don't know if what I want to do CAN be done, and, if so, which way to go, and, if not, an alternative method?
    Humbly,

    Smola

  2. #2
    Follow Me On Twitter: @djg gold trophysilver trophybronze trophy Dan Grossman's Avatar
    Join Date
    Aug 2000
    Location
    Philadephia, PA
    Posts
    20,580
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Yes, you can create a function and use it in your query just like built-in functions. You can also write a single query that does the distance calculation right in the query and then sort by that.

    http://www.zcentric.com/blog/2007/03..._mysql_wi.html
    http://www.petefreitag.com/item/622.cfm
    http://lists.mysql.com/mysql/204782
    http://www.synergymx.com/page.php?Ti...and_Longitude/
    http://marketingtechblog.com/technol...late-distance/

  3. #3
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,016
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    mysql has trig functions, yeah, but also spatial functions...

    http://dev.mysql.com/doc/refman/5.0/...xtensions.html
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  4. #4
    SitePoint Addict Smola's Avatar
    Join Date
    Mar 2005
    Posts
    260
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    These are AWESOME resources guys. I'm currently experimenting with some queries. Once I get the distance thing down I know I will have another issue. I have a business table and an address table and a user can have multiple addresses for that one business. So after I do my inner joins and without a GROUP BY clause I sometimes have rows that look like this (csv columns):

    BUSINESS, ZIP, DISTANCE (from some given zip)
    Biz1, 43847, 3.884
    Biz1, 43746, 6.837
    Biz1, 43865, 2.043
    Biz2, 74622, 9.382
    Biz2, 74605, 7.883

    So here's the MySQL Kung Fu I need : How do I GROUP BY BUSINESS in such a way that that row's DISTANCE field is the SMALLEST distance? Like this:

    BUSINESS, ZIP, DISTANCE (from some given zip)
    Biz1, 43865, 2.043
    Biz2, 74605, 7.883
    Humbly,

    Smola

  5. #5
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,016
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by Smola View Post
    How do I GROUP BY BUSINESS in such a way that that row's DISTANCE field is the SMALLEST distance?
    it's somewhat complex, best shown with an actual query if you have one...
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  6. #6
    SitePoint Addict Smola's Avatar
    Join Date
    Mar 2005
    Posts
    260
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Alright, here's the query I've got using trig functions to get distance:

    Code:
    SELECT p.title, b.businessname, a.zip, z.Latitude, z.Longitude, ((ACOS(SIN(39.866058 * PI() / 180) * SIN(z.Latitude * PI() / 180) + COS(39.866058 * PI() / 180) * COS(z.Latitude * PI() / 180) * COS((-86.101651 - z.Longitude) * PI() / 180)) * 180 / PI()) * 60 * 1.1515) AS dist
    FROM perks p 
    INNER JOIN businesses b 
    ON p.businessid = b.businessid 
    INNER JOIN addresses a 
    ON a.businessid = p.businessid 
    INNER JOIN zips z 
    ON a.zip = z.ZipCode 
    GROUP BY p.perkid
    ORDER BY dist
    The address table has multiple addresses associated with the same businessid. Without the GROUP BY clause I can see them all, but once I include it the greatest of the distances is associated with the row as opposed to the closest. On a hunch I sorted by DESC but it still grouped with the greatest distance Here's what it looks like without GROUP BY:

    Code:
    Free Salads on Saturdays!  	Olive Garden  	46250  	39.911358  	-86.070749  	3.53653953948734
    Free Salads on Saturdays! 	Olive Garden 	46250 	39.911358 	-86.070749 	3.53653953948734
    Free Salads on Saturdays! 	Olive Garden 	46260 	39.894530 	-86.181154 	4.65677319979658
    Free Salads on Saturdays! 	Olive Garden 	46260 	39.894530 	-86.181154 	4.65677319979658
    Free Salads on Saturdays! 	Olive Garden 	46038 	39.963702 	-86.011348 	8.28035172911002
    50% off training for firedogs! 	Terriers Are Tops! 	46038 	39.963702 	-86.011348 	8.28035172911002
    lkjkj 	Jdjdkdkdkd 	46032 	39.972741 	-86.177082 	8.39410239618724
    Since the top 5 are all the same perk I need to group them for the search results, but I need the closest distance to be the one associated with the group.
    Humbly,

    Smola

  7. #7
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,016
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    tossing a GROUP BY into a query without actually intending to do any aggregation (e.g. SUM or COUNT) is almost always a bad move, as you have discovered, because it does not usually "pick" the right row from among repeating rows

    what you would need to do is concentrate only on the addresses, pare the query down so that it returns only addresses and their distances

    then i can show you how to pick the one with the shortest distance
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  8. #8
    SitePoint Addict Smola's Avatar
    Join Date
    Mar 2005
    Posts
    260
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    As I am mainly fontend and PHP guy I have yet to learn all the intricacies and best practices (though I've read that thread on the forum) for MySQL. Perhaps I was forming my queries wrong. As I read your post I had a thought. Once I focus on the addresses and choose the shortest distance, could that query become a subquery in selecting my "perks?" Is that where this may be heading? I would need to find the addressid that has the closest zip and then JOIN that address's businessid ON b.businessid...I think.

    My final queries currently include an IN clause with zips that fall in the radius. So I perform a calculation (via PHP) to find all zipcodes within x radius, then send that string to MySQL and use something like: WHERE a.zip IN (zip1, zip2, zip3, etc). I'm thinking that if I can perform the distance calculation in MySQL there will be no need for the PHP script to provide that list of zipcodes. I'm hoping that's possible since it sounds much more efficient!

    OK, so I changed up the query to concentrate on the address table and it looks like this:

    Code:
    SELECT a.businessid, a.addressid, a.zip, z.Latitude, z.Longitude, ( 3963.0 * acos( sin( 39.866058 / 57.2958 ) * sin( z.Latitude / 57.2958 ) + cos( 39.866058 / 57.2958 ) * cos( z.Latitude / 57.2958 ) * cos( z.Longitude / 57.2958 - - 86.101651 / 57.2958 ) ) ) AS dist
    FROM addresses a
    INNER JOIN zips z ON a.zip = z.ZipCode
    GROUP BY a.addressid
    HAVING dist <=10
    ORDER BY dist
    This will give me all addresses that fall within a given zip radius. Here is what the results from the above query are:

    Code:
    bID   aID    zip            latitude      longitude            dist
    2  	3  	46220  	39.866058  	-86.101651  	0
    2 	6 	46220 	39.866058 	-86.101651 	0
    18 	14 	46205 	39.822756 	-86.129945 	3.35086159772759
    1 	9 	46250 	39.911358 	-86.070749 	3.53653953948734
    1 	2 	46260 	39.894530 	-86.181154 	4.65677319979658
    5 	11 	46038 	39.963702 	-86.011348 	8.28035172911002
    1 	10 	46038 	39.963702 	-86.011348 	8.28035172911002
    3 	4 	46032 	39.972741 	-86.177082 	8.39410239618724
    4 	5 	46225 	39.744598 	-86.169056 	9.13269178132653
    What I need then is if any 2 address ids have the same business id (like the first two rows above), then return the addressid with the smallest distance or at least the distance itself with the businessid. A perk is associated with a business id. So when a zip is queried, if a business has multiple addresses that fall within the given radius, the distance from the closest address should be returned.

    Man, this is complicated! Am I making ANY sense? haha...
    Humbly,

    Smola

  9. #9
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,016
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    why does business 2 have two addresses that are a distance of 0 from wherever you're calculating to? wouldn't that make them the same address?
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  10. #10
    SitePoint Addict Smola's Avatar
    Join Date
    Mar 2005
    Posts
    260
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Though the addresses in the database are mostly dummy addresses, this situation COULD happen. Think Starbucks. You've got one on the north side of 55th street, then another a couple of blocks down the road on the south side of 55th. Same zipcode, different street address.
    Humbly,

    Smola

  11. #11
    Follow Me On Twitter: @djg gold trophysilver trophybronze trophy Dan Grossman's Avatar
    Join Date
    Aug 2000
    Location
    Philadephia, PA
    Posts
    20,580
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    One of my local banks has two branches facing each other on the same street.

  12. #12
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,016
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    the SQL looks a lot better if you can create a view...
    Code:
    CREATE VIEW dists AS
    SELECT a.businessid
         , a.addressid
         , a.zip
         , z.Latitude
         , z.Longitude
         , ( 3963.0 * acos( sin( 39.866058 / 57.2958 ) * 
            sin( z.Latitude / 57.2958 ) + 
            cos( 39.866058 / 57.2958 ) * 
            cos( z.Latitude / 57.2958 ) *
            cos( z.Longitude / 57.2958 - 
            - 86.101651 / 57.2958 ) ) ) AS dist
      FROM addresses a
    INNER 
      JOIN zips z 
        ON z.ZipCode = a.zip
    now your solution looks like this --
    Code:
    SELECT d.businessid
         , a.addressid
         , othercolumns
      FROM ( SELECT businessid
                  , MIN(dist) shortest
               FROM dists
             GROUP
                 BY businessid ) AS dmin
    INNER
      JOIN dists AS d
        ON d.businessid = dmin.businessid
       AND d.dist = dmin.shortest
    INNER
      JOIN addresses AS a
        ON a.addressid = d.addressid
    INNER
      JOIN perks
        ON ...
    the subquery finds the shortest distance per business

    join back to the dists view to get the addressid that corresponds to that distance

    then join to the addresses table to get the address details, and the other tables to get their data...

    you can do it without the view simply by substituting the distance calculation query as a subquery in both places where "dists" is referenced
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  13. #13
    SitePoint Addict Smola's Avatar
    Join Date
    Mar 2005
    Posts
    260
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    If I use a view I would have to create/update that view for every query. Users will be updating the addresses on a daily basis so I can't count on that data as being current otherwise.

    So the second "view-less" solution requires two substitutions? Where your first subquery is you're saying I should replace that entire subquery with the distance calculation? Would I have the MIN() function in there as well to grab the minimum distance? Then I would use a similar or the same subquery for my JOIN right? Also, to avoid performing that calculation for all 80000 zips in my database I should be able to use my previous method of gathering zipcodes first and then use an IN() clause shouldn't I?

    Wow that was a lot of questions. I appreciate all the time you've put into this so far!
    Humbly,

    Smola

  14. #14
    Follow Me On Twitter: @djg gold trophysilver trophybronze trophy Dan Grossman's Avatar
    Join Date
    Aug 2000
    Location
    Philadephia, PA
    Posts
    20,580
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Smola View Post
    If I use a view I would have to create/update that view for every query. Users will be updating the addresses on a daily basis so I can't count on that data as being current otherwise.
    A view in MySQL is just a stored query. It is not like a table, or a materialized view in Oracle. It is always up to date because using it in a query just translates to using the SELECT query that defines the view.

  15. #15
    SitePoint Addict Smola's Avatar
    Join Date
    Mar 2005
    Posts
    260
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Alright, apparently I can't edit that last message or I'd include my recent experiments. I went the "view-less" method and created this query:

    Code SQL:
    SELECT d.businessid
         , a.addressid
         , p.perkid
         , p.title
         , a.zip
         , d.dist
      FROM ( SELECT dis.businessid
                  , MIN(dist) AS shortest
               FROM (SELECT a.businessid
                          , a.addressid
                          , a.zip
                          , z.Latitude
                          , z.Longitude
                          , ( 3963.0 * acos( sin( 39.866058 / 57.2958 ) * 
                          sin( z.Latitude / 57.2958 ) + 
                          cos( 39.866058 / 57.2958 ) * 
                          cos( z.Latitude / 57.2958 ) *
                          cos( z.Longitude / 57.2958 - 
                          - 86.101651 / 57.2958 ) ) ) AS dist
                       FROM addresses a
                     INNER 
                       JOIN zips z 
                         ON z.ZipCode = a.zip) AS dis
             GROUP
               BY dis.businessid ) AS dmin
    INNER
      JOIN (SELECT a.businessid
    	         , a.addressid
    	         , a.zip
    	         , z.Latitude
    	         , z.Longitude
    	         , ( 3963.0 * acos( sin( 39.866058 / 57.2958 ) * 
    	         sin( z.Latitude / 57.2958 ) + 
    	         cos( 39.866058 / 57.2958 ) * 
    	         cos( z.Latitude / 57.2958 ) *
    	         cos( z.Longitude / 57.2958 - 
    	         - 86.101651 / 57.2958 ) ) ) AS dist
              FROM addresses a
            INNER 
              JOIN zips z 
                ON z.ZipCode = a.zip) AS d
        ON d.businessid = dmin.businessid
          AND 
           d.dist = dmin.shortest
    INNER
      JOIN addresses AS a
        ON a.addressid = d.addressid
    INNER
      JOIN perks p
        ON p.businessid = d.businessid
    GROUP
      BY p.perkid
    ORDER
      BY d.dist

    And the results appeared to be just what I was looking for:

    Code:
    2  	3  	7  	Chitl...  	46220  	0
    2 	3 	2 	$2 of... 	46220 	0
    1 	9 	4 	Free ... 	46250 	3.53653953948734
    5 	11 	10 	50% o... 	46038 	8.28035172911002
    3 	4 	18 	asdfa... 	46032 	8.39410239618724
    3 	4 	17 	asdfg... 	46032 	8.39410239618724
    3 	4 	16 	lkjkj... 	46032 	8.39410239618724
    3 	4 	14 	free ... 	46032 	8.39410239618724
    3 	4 	12 	If yo... 	46032 	8.39410239618724
    4 	5 	15 	$2000... 	46225 	9.13269178132653
    4 	5 	8 	0% AP... 	46225 	9.13269178132653
    16 	13 	13 	20% o... 	46239 	11.1350495664295
    Please excuse the horrible sample "perks." The query is quite long and when I ran it in phpMyAdmin it took .4672 seconds! Almost half a second and that's with less than 20 perks in the database! I can't imagine the time it would take with thousands of perks in the database! So I tried creating the VIEW and referencing it instead of using a subquery. It took 3.4198 seconds for the same result!! The simpler query took 7 times longer! Does that have something to do with how VIEWS are represented in MySQL? I tried it twice for good measure and both times were about 3.5 seconds. What's your take on all of this?
    Humbly,

    Smola

  16. #16
    SitePoint Addict Smola's Avatar
    Join Date
    Mar 2005
    Posts
    260
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Dan Grossman View Post
    A view in MySQL is just a stored query. It is not like a table, or a materialized view in Oracle. It is always up to date because using it in a query just translates to using the SELECT query that defines the view.
    Ahh that is an awesome feature! So as I add more rows that qualify to be in that VIEW (or would be in the result set for that query), the VIEWS rows will automatically update? That's pretty sweet!

    But see my above post about the difference in query time. Is there a decent explanation for that?
    Last edited by Smola; Mar 10, 2010 at 22:04. Reason: Expand on info...
    Humbly,

    Smola

  17. #17
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,016
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by Smola View Post
    Ahh ... the VIEWS rows will automatically update? That's pretty sweet!
    that is sweet, except the view itself doesn't have any rows at all, so there's nothing to automatically update
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  18. #18
    Follow Me On Twitter: @djg gold trophysilver trophybronze trophy Dan Grossman's Avatar
    Join Date
    Aug 2000
    Location
    Philadephia, PA
    Posts
    20,580
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Smola View Post
    I can't imagine the time it would take with thousands of perks in the database!
    It sounds like you don't have indexes (or the right indexes) defined on the table. Any query will be slow with more than a handful of rows without using indexes.

  19. #19
    SitePoint Addict Smola's Avatar
    Join Date
    Mar 2005
    Posts
    260
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I previously created foreign key indexes on many of the id fields in tables where foreign keys existed. I had NOT, however, created an index on the zip code table. I went ahead and created an index on the ZipCode, Latitude, and Longitude columns. I also went back and removed the individual foreign keys on many of the tables and then created one index on multiple columns that were being searched. The original EXPLAIN showed that the type was ALL for every table I was querying. Now it just says ALL for the derived tables and the perks table (p) and then there's one "eq_ref" and the rest are "index." I thought I'd see a big performance boost, but it improved maybe 1 hundredth of a second. Perhaps the difference is negligible for the size of the tables (<15 for perks, <10 businesses, <10 addresses, 40,000+ zips)?

    I tried researching derived and temporary tables and I heard conflicting opinions by people about which is better. I take it that you, r937, believe that taking the subquery route is the best way to go as opposed to a temporary table? How about you Mr. Grossman? Is it going to be the best way in the long run as the perks, businesses, and addressess tables grow?
    Last edited by Smola; Mar 12, 2010 at 19:28. Reason: added the type for other tables
    Humbly,

    Smola

  20. #20
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,016
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by Smola View Post
    I take it that you, r937, believe that taking the subquery route is the best way to go as opposed to a temporary table?
    in general, yes, i do, absolutely -- assuming you mean a temporary table that you create yourself (as opposed to the temporary storage that the optimizer uses while executing your query)
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  21. #21
    SitePoint Addict Smola's Avatar
    Join Date
    Mar 2005
    Posts
    260
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I guess I mean either/or. As being somewhere between novice and intermediate, I have yet to delve into creating temporary tables of any kind. At first thought, it would seem that creating a table, populating/updating it, then querying it would take longer than a subquery within a query. But I have a feeling that that isn't the case.

    Also, I'm a bit unclear on best practices for indexes. I know they work well with columns that are searched often, but if I search a.addressid, a.businessid, and a.zip and/or use them in WHERE clauses or JOINS, is it better to create individual indexes on each or one index on all? Should I not consider fields in my SELECT for indexes? So if I want to see all of the above fields in a query but only use one in a condition, should I only pay attention to the one in the condition? For example:

    Code MySQL:
    SELECT a.addressid, a.businessid, a.zip, p.title
      FROM addresses a
    INNER
      JOIN perks p
        ON p.businessid = a.businessid

    Would it be most effecient for the above query if I have an index solely on p.businessid and a.businessid? Or one index on all fields mentioned in the select as well as a single index on p.businessid?

    Also, does it become a bad idea to create an index for a certain number of columns? So if a table has 16 columns is it a bad idea to create an index on 8 of them if they are all frequently searched at some time or another in different clusters? i.e. on one page I need 3 of the 8 in the index but on another page I need 5 of the 8.

    I must say again that I appreciate the time you are spending with me. I am gaining quite a bit of knowledge from you!
    Humbly,

    Smola

  22. #22
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,016
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by Smola View Post
    Would it be most effecient for the above query if I have an index solely on p.businessid and a.businessid?
    only one index will help this query, and it will be one of the primary key indexes, on one of the ids

    why? because there is no WHERE clause, all rows from both tables will be involved in the join

    so the optimizer will table scan one of the tables, and use the index on the other's id to match those rows

    obviously, it'll choose to scan the table with the fewer rows first

    one of the things you will want to get familiar with is the EXPLAIN

    put the word EXPLAIN in front of your SELECT and see if you can figure out what the optimizer is doing by studying the output

    try it for several different queries
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  23. #23
    SitePoint Addict Smola's Avatar
    Join Date
    Mar 2005
    Posts
    260
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    It's funny you mention the EXPLAIN statement because I started using it right when I started this thread! That's how I figured out the type of joins I was doing. It has been very helpful so far.

    I'm happy to say that I think I have solved my query woes thanks to you along with a couple other webpages. It's my fault for not including all information so I must apologize for that ahead of time. The zipcode table my client purchased has 16 columns that include MUCH about the zipcodes it holds. In fact, the thing is over 80000 rows large. Much of the bloat is because there are some repeating zipcodes due to a different city alias or city name or something else about that location. So 1.) I don't need those extra rows because the lats and longs are all the same with the repeated zips and 2.) I don't ever use anything other than zips, lats and longs so the other 13 columns are a bit moot. So I went ahead and created another table with just the distinct zip/lat/long columns and its just over 42000 rows. Then I created an index on all 3 of the columns because I am always querying them in conjunction with each other, changed my query with the new table name and field names, re-ran the query and it went from .4620 seconds down to .0017 seconds!!!! WOO!

    I re-ran the EXPLAIN right before and I went from
    Code MySQL:
    1  	PRIMARY  	<derived2>  	ALL  	NULL  	NULL  	NULL  	NULL  	8  	Using temporary; Using filesort
    1 	PRIMARY 	p 	ALL 	NULL 	NULL 	NULL 	NULL 	12 	Using where
    1 	PRIMARY 	<derived4> 	ALL 	NULL 	NULL 	NULL 	NULL 	16 	Using where
    1 	PRIMARY 	a 	eq_ref 	PRIMARY,aid_bid_zip_srch 	PRIMARY 	2 	d.addressid 	1 	 
    4 	DERIVED 	a 	index 	NULL 	aid_bid_zip_srch 	19 	NULL 	11 	Using index
    4 	DERIVED 	z 	index 	NULL 	zip_lat_long 	19 	NULL 	80171 	Using where; Using index
    2 	DERIVED 	<derived3> 	ALL 	NULL 	NULL 	NULL 	NULL 	16 	Using temporary; Using filesort
    3 	DERIVED 	a 	index 	NULL 	aid_bid_zip_srch 	19 	NULL 	11 	Using index
    3 	DERIVED 	z 	index 	NULL 	zip_lat_long 	19 	NULL 	80171 	Using where; Using index
    to
    Code MySQL:
    1  	PRIMARY  	<derived2>  	ALL  	NULL  	NULL  	NULL  	NULL  	8  	Using temporary; Using filesort
    1 	PRIMARY 	<derived4> 	ALL 	NULL 	NULL 	NULL 	NULL 	11 	Using where
    1 	PRIMARY 	a 	eq_ref 	PRIMARY,aid_bid_zip_srch 	PRIMARY 	2 	d.addressid 	1 	 
    1 	PRIMARY 	p 	ALL 	NULL 	NULL 	NULL 	NULL 	12 	Using where
    4 	DERIVED 	a 	index 	NULL 	aid_bid_zip_srch 	19 	NULL 	11 	Using index
    4 	DERIVED 	z 	ref 	ziplatlong 	ziplatlong 	15 	p4wdb.a.zip 	1 	Using index
    2 	DERIVED 	<derived3> 	ALL 	NULL 	NULL 	NULL 	NULL 	11 	Using temporary; Using filesort
    3 	DERIVED 	a 	index 	NULL 	aid_bid_zip_srch 	19 	NULL 	11 	Using index
    3 	DERIVED 	z 	ref 	ziplatlong 	ziplatlong 	15 	p4wdb.a.zip 	1 	Using index

    Again, I must thank you for all your time and the original query you provided. This has definitely been an enlightening experience for me!

    There's only one thing that is bothering me. I have an index on multiple columns in the perks table, including "businessid", "perkid", and "title." Yet no index is being used and the type is ALL for table p in both EXPLAINs. I can't figure it out. The index I created in the perks table was on 8 columns including the ones I mentioned above. What am I missing? Oh, and besides the 8-column index I have a PRIMARY KEY on "perkid" and a FULLTEXT index on "title" and "description".
    Humbly,

    Smola

  24. #24
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,016
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    sorry, i'm having a little trouble associating your EXPLAINs with your query

    but i think what's happening is that because of the GROUP BY perks.id, and an absence of WHERE conditions on that table, the optimizer figured it's gonna need all perks rows, so it does a table scan

    any time the optimizer figures that it will need more than x&#37; of a table (where x is some number like 60 or 80, i dunno), it calculates that the table scan is cheaper than going through the index
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  25. #25
    SitePoint Addict
    Join Date
    Sep 2005
    Posts
    335
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I know my response is a little late but I think you are approaching this incorrectly.

    Most sites that search within a certain radius. The approach is that you look at the starting zip code and create a lat/long box based on the radius required (50 miles, 25 miles, etc). Then, just pull the records that fall within the lat/long of the box. This requires only one record to have complex trig functions, uses only indexes to find the records and only returns relevant records.
    PHP Shopping Cart Software Easy Ecommerce Shopping Cart Script.
    PHP Super Cart is 100% template driven.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •