CREATE FUNCTION vs CALL PROCEDURE

Smola · March 8, 2010, 2:51am

I have been working with some pagination for a client and I have run into a sorting issue. I want the pagination to work by making sure that the sorting happens with my database query. I do not want to resort to storing every row in an array and then sorting that if I don’t have to.

So I can sort by a few fields but one sort choice in particular I’m having trouble with is by location. I have a separate zipcode database which I use to calculate distances via a php script in other areas of the site. However, for pagination, I would need this calculation to happen as the database is getting results. So I began to research ways of creating custom functions. I came across the two mentioned in the summary. I think that the custom FUNCTION is something I could call in line with a query and make it a field in the result set. The PROCEDURE seems to allow multiple lines/commands but doesn’t appear as if I could use it with a query. I mean, I know I could make a query within the procedure, but I don’t think I could make something a field.

Anyway, I wanted to calculate distances on the fly in a query. so, creating a function and calling it like :

SELECT t.field1, t.field2, DISTANCE(givenzip, t.field2) as dist FROM table1 t ORDER BY dist

I’m pretty sure that MySQL has built-in trig functions to allow me to port my php script (modified of course) into a custom function:


function DistanceCalc($lat1, $lon1, $lat2, $lon2, $unit = "M", $places = 5) { 

	$theta = $lon1 - $lon2; 
	//spherical law of cosines
	$dist = sin(deg2rad($lat1)) * sin(deg2rad($lat2)) +  cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * cos(deg2rad($theta)); 
	$dist = acos($dist); 
	$dist = rad2deg($dist);
	 
	$miles = $dist * 60 * 1.1515;
	$unit = strtoupper($unit);
	
	//can return in kilometers or nautical miles?
	if ($unit == "K") {
		return round($miles * 1.609344, $places); 
	} else if ($unit == "N") {
	  	return round($miles * 0.8684, $places);
	} else {
	  	return round($miles, $places);
	}

}

But I don’t know if what I want to do CAN be done, and, if so, which way to go, and, if not, an alternative method?

Smola · March 11, 2010, 3:59am

Alright, apparently I can’t edit that last message or I’d include my recent experiments. I went the “view-less” method and created this query:


SELECT d.businessid
     , a.addressid
     , p.perkid
     , p.title
     , a.zip
     , d.dist
  FROM ( SELECT dis.businessid
              , MIN(dist) AS shortest
           FROM (SELECT a.businessid
                      , a.addressid
                      , a.zip
                      , z.Latitude
                      , z.Longitude
                      , ( 3963.0 * acos( sin( 39.866058 / 57.2958 ) * 
                      sin( z.Latitude / 57.2958 ) + 
                      cos( 39.866058 / 57.2958 ) * 
                      cos( z.Latitude / 57.2958 ) *
                      cos( z.Longitude / 57.2958 - 
                      - 86.101651 / 57.2958 ) ) ) AS dist
                   FROM addresses a
                 INNER 
                   JOIN zips z 
                     ON z.ZipCode = a.zip) AS dis
         GROUP
           BY dis.businessid ) AS dmin
INNER
  JOIN (SELECT a.businessid
	         , a.addressid
	         , a.zip
	         , z.Latitude
	         , z.Longitude
	         , ( 3963.0 * acos( sin( 39.866058 / 57.2958 ) * 
	         sin( z.Latitude / 57.2958 ) + 
	         cos( 39.866058 / 57.2958 ) * 
	         cos( z.Latitude / 57.2958 ) *
	         cos( z.Longitude / 57.2958 - 
	         - 86.101651 / 57.2958 ) ) ) AS dist
          FROM addresses a
        INNER 
          JOIN zips z 
            ON z.ZipCode = a.zip) AS d
    ON d.businessid = dmin.businessid
      AND 
       d.dist = dmin.shortest
INNER
  JOIN addresses AS a
    ON a.addressid = d.addressid
INNER
  JOIN perks p
    ON p.businessid = d.businessid
GROUP
  BY p.perkid
ORDER
  BY d.dist

And the results appeared to be just what I was looking for:


2  	3  	7  	Chitl...  	46220  	0
2 	3 	2 	$2 of... 	46220 	0
1 	9 	4 	Free ... 	46250 	3.53653953948734
5 	11 	10 	50% o... 	46038 	8.28035172911002
3 	4 	18 	asdfa... 	46032 	8.39410239618724
3 	4 	17 	asdfg... 	46032 	8.39410239618724
3 	4 	16 	lkjkj... 	46032 	8.39410239618724
3 	4 	14 	free ... 	46032 	8.39410239618724
3 	4 	12 	If yo... 	46032 	8.39410239618724
4 	5 	15 	$2000... 	46225 	9.13269178132653
4 	5 	8 	0% AP... 	46225 	9.13269178132653
16 	13 	13 	20% o... 	46239 	11.1350495664295

Please excuse the horrible sample “perks.” The query is quite long and when I ran it in phpMyAdmin it took .4672 seconds! Almost half a second and that’s with less than 20 perks in the database! I can’t imagine the time it would take with thousands of perks in the database! So I tried creating the VIEW and referencing it instead of using a subquery. It took 3.4198 seconds for the same result!! The simpler query took 7 times longer! Does that have something to do with how VIEWS are represented in MySQL? I tried it twice for good measure and both times were about 3.5 seconds. What’s your take on all of this?

Smola · March 11, 2010, 4:00am

Ahh that is an awesome feature! So as I add more rows that qualify to be in that VIEW (or would be in the result set for that query), the VIEWS rows will automatically update? That’s pretty sweet!

But see my above post about the difference in query time. Is there a decent explanation for that? :x

Dan_Grossman · March 11, 2010, 3:40am

A view in MySQL is just a stored query. It is not like a table, or a materialized view in Oracle. It is always up to date because using it in a query just translates to using the SELECT query that defines the view.

Smola · March 11, 2010, 2:58am

If I use a view I would have to create/update that view for every query. Users will be updating the addresses on a daily basis so I can’t count on that data as being current otherwise.

So the second “view-less” solution requires two substitutions? Where your first subquery is you’re saying I should replace that entire subquery with the distance calculation? Would I have the MIN() function in there as well to grab the minimum distance? Then I would use a similar or the same subquery for my JOIN right? Also, to avoid performing that calculation for all 80000 zips in my database I should be able to use my previous method of gathering zipcodes first and then use an IN() clause shouldn’t I?

Wow that was a lot of questions. I appreciate all the time you’ve put into this so far!

r937 · March 11, 2010, 1:36pm

that is sweet, except the view itself doesn’t have any rows at all, so there’s nothing to automatically update

Dan_Grossman · March 11, 2010, 1:38pm

It sounds like you don’t have indexes (or the right indexes) defined on the table. Any query will be slow with more than a handful of rows without using indexes.

Dan_Grossman · March 8, 2010, 2:55am

Yes, you can create a function and use it in your query just like built-in functions. You can also write a single query that does the distance calculation right in the query and then sort by that.

http://www.zcentric.com/blog/2007/03/calculate_distance_in_mysql_wi.html

http://lists.mysql.com/mysql/204782
http://www.synergymx.com/page.php?Title=Determining_Distance_With_Latitude_and_Longitude/
http://marketingtechblog.com/technology/calculate-distance/

r937 · March 8, 2010, 2:58am

mysql has trig functions, yeah, but also spatial functions…

http://dev.mysql.com/doc/refman/5.0/en/spatial-extensions.html

Smola · March 8, 2010, 4:21am

These are AWESOME resources guys. I’m currently experimenting with some queries. Once I get the distance thing down I know I will have another issue. I have a business table and an address table and a user can have multiple addresses for that one business. So after I do my inner joins and without a GROUP BY clause I sometimes have rows that look like this (csv columns):

BUSINESS, ZIP, DISTANCE (from some given zip)
Biz1, 43847, 3.884
Biz1, 43746, 6.837
Biz1, 43865, 2.043
Biz2, 74622, 9.382
Biz2, 74605, 7.883

So here’s the MySQL Kung Fu I need How do I GROUP BY BUSINESS in such a way that that row’s DISTANCE field is the SMALLEST distance? Like this:

BUSINESS, ZIP, DISTANCE (from some given zip)
Biz1, 43865, 2.043
Biz2, 74605, 7.883

r937 · March 8, 2010, 7:17am

it’s somewhat complex, best shown with an actual query if you have one…

Smola · March 9, 2010, 1:13am

Alright, here’s the query I’ve got using trig functions to get distance:


SELECT p.title, b.businessname, a.zip, z.Latitude, z.Longitude, ((ACOS(SIN(39.866058 * PI() / 180) * SIN(z.Latitude * PI() / 180) + COS(39.866058 * PI() / 180) * COS(z.Latitude * PI() / 180) * COS((-86.101651 - z.Longitude) * PI() / 180)) * 180 / PI()) * 60 * 1.1515) AS dist
FROM perks p 
INNER JOIN businesses b 
ON p.businessid = b.businessid 
INNER JOIN addresses a 
ON a.businessid = p.businessid 
INNER JOIN zips z 
ON a.zip = z.ZipCode 
GROUP BY p.perkid
ORDER BY dist

The address table has multiple addresses associated with the same businessid. Without the GROUP BY clause I can see them all, but once I include it the greatest of the distances is associated with the row as opposed to the closest. On a hunch I sorted by DESC but it still grouped with the greatest distance Here’s what it looks like without GROUP BY:


Free Salads on Saturdays!  	Olive Garden  	46250  	39.911358  	-86.070749  	3.53653953948734
Free Salads on Saturdays! 	Olive Garden 	46250 	39.911358 	-86.070749 	3.53653953948734
Free Salads on Saturdays! 	Olive Garden 	46260 	39.894530 	-86.181154 	4.65677319979658
Free Salads on Saturdays! 	Olive Garden 	46260 	39.894530 	-86.181154 	4.65677319979658
Free Salads on Saturdays! 	Olive Garden 	46038 	39.963702 	-86.011348 	8.28035172911002
50% off training for firedogs! 	Terriers Are Tops! 	46038 	39.963702 	-86.011348 	8.28035172911002
lkjkj 	Jdjdkdkdkd 	46032 	39.972741 	-86.177082 	8.39410239618724

Since the top 5 are all the same perk I need to group them for the search results, but I need the closest distance to be the one associated with the group.

r937 · March 9, 2010, 3:18am

tossing a GROUP BY into a query without actually intending to do any aggregation (e.g. SUM or COUNT) is almost always a bad move, as you have discovered, because it does not usually “pick” the right row from among repeating rows

what you would need to do is concentrate only on the addresses, pare the query down so that it returns only addresses and their distances

then i can show you how to pick the one with the shortest distance

Smola · March 9, 2010, 4:02am

As I am mainly fontend and PHP guy I have yet to learn all the intricacies and best practices (though I’ve read that thread on the forum) for MySQL. Perhaps I was forming my queries wrong. As I read your post I had a thought. Once I focus on the addresses and choose the shortest distance, could that query become a subquery in selecting my “perks?” Is that where this may be heading? I would need to find the addressid that has the closest zip and then JOIN that address’s businessid ON b.businessid…I think.

My final queries currently include an IN clause with zips that fall in the radius. So I perform a calculation (via PHP) to find all zipcodes within x radius, then send that string to MySQL and use something like: WHERE a.zip IN (zip1, zip2, zip3, etc). I’m thinking that if I can perform the distance calculation in MySQL there will be no need for the PHP script to provide that list of zipcodes. I’m hoping that’s possible since it sounds much more efficient!

OK, so I changed up the query to concentrate on the address table and it looks like this:


SELECT a.businessid, a.addressid, a.zip, z.Latitude, z.Longitude, ( 3963.0 * acos( sin( 39.866058 / 57.2958 ) * sin( z.Latitude / 57.2958 ) + cos( 39.866058 / 57.2958 ) * cos( z.Latitude / 57.2958 ) * cos( z.Longitude / 57.2958 - - 86.101651 / 57.2958 ) ) ) AS dist
FROM addresses a
INNER JOIN zips z ON a.zip = z.ZipCode
GROUP BY a.addressid
HAVING dist <=10
ORDER BY dist

This will give me all addresses that fall within a given zip radius. Here is what the results from the above query are:


[B]bID   aID    zip            latitude      longitude            dist[/B]
2  	3  	46220  	39.866058  	-86.101651  	0
2 	6 	46220 	39.866058 	-86.101651 	0
18 	14 	46205 	39.822756 	-86.129945 	3.35086159772759
1 	9 	46250 	39.911358 	-86.070749 	3.53653953948734
1 	2 	46260 	39.894530 	-86.181154 	4.65677319979658
5 	11 	46038 	39.963702 	-86.011348 	8.28035172911002
1 	10 	46038 	39.963702 	-86.011348 	8.28035172911002
3 	4 	46032 	39.972741 	-86.177082 	8.39410239618724
4 	5 	46225 	39.744598 	-86.169056 	9.13269178132653

What I need then is if any 2 address ids have the same business id (like the first two rows above), then return the addressid with the smallest distance or at least the distance itself with the businessid. A perk is associated with a business id. So when a zip is queried, if a business has multiple addresses that fall within the given radius, the distance from the closest address should be returned.

Man, this is complicated! Am I making ANY sense? haha…

r937 · March 9, 2010, 4:08am

why does business 2 have two addresses that are a distance of 0 from wherever you’re calculating to? wouldn’t that make them the same address?

Smola · March 9, 2010, 4:34am

Though the addresses in the database are mostly dummy addresses, this situation COULD happen. Think Starbucks. You’ve got one on the north side of 55th street, then another a couple of blocks down the road on the south side of 55th. Same zipcode, different street address.

Dan_Grossman · March 9, 2010, 4:57am

One of my local banks has two branches facing each other on the same street.

r937 · March 9, 2010, 10:26am

the SQL looks a lot better if you can create a view…

CREATE VIEW dists AS
SELECT a.businessid
     , a.addressid
     , a.zip
     , z.Latitude
     , z.Longitude
     , ( 3963.0 * acos( sin( 39.866058 / 57.2958 ) * 
        sin( z.Latitude / 57.2958 ) + 
        cos( 39.866058 / 57.2958 ) * 
        cos( z.Latitude / 57.2958 ) *
        cos( z.Longitude / 57.2958 - 
        - 86.101651 / 57.2958 ) ) ) AS dist
  FROM addresses a
INNER 
  JOIN zips z 
    ON z.ZipCode = a.zip

now your solution looks like this –

SELECT d.businessid
     , a.addressid
     , [I]othercolumns[/I]
  FROM ( SELECT businessid
              , MIN(dist) shortest
           FROM dists
         GROUP
             BY businessid ) AS dmin
INNER
  JOIN dists AS d
    ON d.businessid = dmin.businessid
   AND d.dist = dmin.shortest
INNER
  JOIN addresses AS a
    ON a.addressid = d.addressid
INNER
  JOIN perks
    ON ...

the subquery finds the shortest distance per business

join back to the dists view to get the addressid that corresponds to that distance

then join to the addresses table to get the address details, and the other tables to get their data…

you can do it without the view simply by substituting the distance calculation query as a subquery in both places where “dists” is referenced

Smola · March 13, 2010, 1:22am

I previously created foreign key indexes on many of the id fields in tables where foreign keys existed. I had NOT, however, created an index on the zip code table. I went ahead and created an index on the ZipCode, Latitude, and Longitude columns. I also went back and removed the individual foreign keys on many of the tables and then created one index on multiple columns that were being searched. The original EXPLAIN showed that the type was ALL for every table I was querying. Now it just says ALL for the derived tables and the perks table (p) and then there’s one “eq_ref” and the rest are “index.” I thought I’d see a big performance boost, but it improved maybe 1 hundredth of a second. Perhaps the difference is negligible for the size of the tables (<15 for perks, <10 businesses, <10 addresses, 40,000+ zips)?

I tried researching derived and temporary tables and I heard conflicting opinions by people about which is better. I take it that you, r937, believe that taking the subquery route is the best way to go as opposed to a temporary table? How about you Mr. Grossman? Is it going to be the best way in the long run as the perks, businesses, and addressess tables grow?

r937 · March 13, 2010, 1:28am

in general, yes, i do, absolutely – assuming you mean a temporary table that you create yourself (as opposed to the temporary storage that the optimizer uses while executing your query)