Query optimalization problem

Hi,

I’ve a problem with a query that doesn’t preform well and I’ve tried a bunch of things, but can’t get it right. I’d like to know if I’m trying it wrong or that I just want the impossible.

Below I have relevant table data / indexes, my queries and the explain results.

I want to select gps points from a full day. The points are organised by segments in a track. These tracks obviously span across days and can be of ignition on or off type. Now cirtain units don’t post points if they are off, so the first point in a day can be an active (moving) point and this leads to problems with reports about start en stop times. So I want all points from a day (easy) and all points from a ignition-off track ending on that day (easy). But I can’t seem te get the together with good performance. Somehow mysql always selects all previous points…

gps_points


id
gps_track_segment_id
gps_unit_id
date

[B]indices:[/B]
primary
gps_track_segment_id
gps_unit_date_idx

gps_track_segments


id
gps_track_id
date

[b]indices[/b]
primary
gps_track_id

gps_tracks


id
hash (another unique identifier)
gps_unit_id
startdate
stopdate
ignition

[b]indices[/b]
primary
gps_unit_id
hash

Now, the queries:


EXPLAIN SELECT
 gps_points.id AS gps_point_id,
  gps_points.date,
  gps_points.ignition,
  ROUND((gps_points.lat / 1000000),6) AS lat,
  ROUND((gps_points.lng / 1000000),6) AS lng,
  UNIX_TIMESTAMP(gps_points.date) AS time
FROM gps_points
INNER JOIN gps_units ON gps_units.id = gps_points.gps_unit_id
LEFT JOIN gps_track_segments ON gps_track_segments.id = gps_points.gps_track_segment_id
LEFT JOIN gps_tracks ON gps_tracks.id = gps_track_segments.gps_track_id
WHERE (
    gps_points.date BETWEEN "2013-01-09 00:00:00" AND "2013-01-09 23:59:59"
    AND gps_points.gps_unit_id = "498"
    AND gps_units.relation_id = "93"
    )
    OR
    (gps_tracks.startdate <= "2013-01-09 00:00:00"
        AND gps_tracks.stopdate BETWEEN "2013-01-09 00:00:00" AND "2013-01-09 23:59:59"
         AND gps_tracks.ignition = 0
         AND gps_points.gps_unit_id = "498"
         AND gps_tracks.gps_unit_id = "498"
    )
    ORDER BY gps_points.date ASC

This results in 459 rows returned. 450 that are actually from th 9th and 9 that are in the ignition off track that starts before the 9th and and on the 9th. (this is a test unit, not one of the new problems.)
This query is rather slow and the explain looks like:


id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra 	
1 	SIMPLE 	gps_units 	const 	PRIMARY,relation_id 	PRIMARY 	4 	const 	1 	
1 	SIMPLE 	gps_points 	ref 	date,gps_unit_id,gps_unit_date_idx 	gps_unit_date_idx 	4 	const 	22484 	Using where
1 	SIMPLE 	gps_track_segments 	eq_ref 	PRIMARY 	PRIMARY 	4 	gps_points.gps_track_segment_id 	1 	
1 	SIMPLE 	gps_tracks 	eq_ref 	PRIMARY 	PRIMARY 	4 	gps_track_segments.gps_track_id 	1 	Using where

You can see the problem with the select for points 22484. If I just use the first part of the WHERE clause it would be just 450 and if I just use the second part of the OR in the WHERE clause the explain looks like:


id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra 	
1 	SIMPLE 	gps_units 	const 	PRIMARY 	PRIMARY 	4 	const 	1 	Using index; Using temporary; Using filesort
1 	SIMPLE 	gps_tracks 	ref 	PRIMARY,gps_unit_id 	gps_unit_id 	4 	const 	1311 	Using where
1 	SIMPLE 	gps_track_segments 	ref 	PRIMARY,gps_track_id 	gps_track_id 	4 	gps_tracks.id 	4 	Using where
1 	SIMPLE 	gps_points 	ref 	gps_track_segment_id,gps_unit_id,segment_date_idx,gps_unit_date_idx 	segment_date_idx 	4 	gps_track_segments.id 	4 	Using where

This 1311 is all tracks for that gps_unit_id. Which doens’t scale…

Any ideas would be welcome.

i did not really dig into your EXPLAIN because i immediately noticed the OR

try splitting your query into two, one for each side of the OR, and then UNION ALL the results together

also, it apeears you don’t have any multi-column indexes, that would be my next approach

Hi,

I made a slight mistake in the original post. The EXPLAIN for the second part of the OR (thus not using an OR in the WHERE) is:


id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra 	
1 	SIMPLE 	gps_units 	const 	PRIMARY 	PRIMARY 	4 	const 	1 	Using index
1 	SIMPLE 	gps_points 	ref 	gps_track_segment_id,gps_unit_id,segment_date_idx,gps_unit_date_idx 	gps_unit_date_idx 	4 	const 	22484 	Using where
1 	SIMPLE 	gps_track_segments 	eq_ref 	PRIMARY,gps_track_id 	PRIMARY 	4 	freetrack_rc.gps_points.gps_track_segment_id 	1 	Using where
1 	SIMPLE 	gps_tracks 	eq_ref 	PRIMARY,gps_unit_id,start_stop_unit_idx,startdate_ix 	PRIMARY 	4 	freetrack_rc.gps_track_segments.gps_track_id 	1 	Using where

So the the query like this:


EXPLAIN SELECT
 gps_points.id AS gps_point_id,
  gps_points.date,
  gps_points.ignition,
  ROUND((gps_points.lat / 1000000),6) AS lat,
  ROUND((gps_points.lng / 1000000),6) AS lng,
  UNIX_TIMESTAMP(gps_points.date) AS time 
FROM gps_points 
INNER JOIN gps_units ON gps_units.id = gps_points.gps_unit_id 
LEFT JOIN gps_track_segments ON gps_track_segments.id = gps_points.gps_track_segment_id 
LEFT JOIN gps_tracks ON gps_tracks.id = gps_track_segments.gps_track_id 
WHERE 
    (gps_tracks.startdate <= "2013-01-09 00:00:00" 
        AND gps_tracks.stopdate BETWEEN "2013-01-09 00:00:00" AND "2013-01-09 23:59:59"
         AND gps_tracks.ignition = 0 
         AND gps_points.gps_unit_id = "498"
         AND gps_tracks.gps_unit_id = "498" 
    ) 
    ORDER BY gps_points.date ASC

is the part that makes it very slow.

Well if I use UNION (I used the distinced version to make sure no double rows are in the result set…)

EXPLAIN SELECT
 gps_points.id AS gps_point_id,
  gps_points.date,
  gps_points.ignition,
  ROUND((gps_points.lat / 1000000),6) AS lat,
  ROUND((gps_points.lng / 1000000),6) AS lng,
  UNIX_TIMESTAMP(gps_points.date) AS time 
FROM gps_points 
INNER JOIN gps_units ON gps_units.id = gps_points.gps_unit_id 
LEFT JOIN gps_track_segments ON gps_track_segments.id = gps_points.gps_track_segment_id 
LEFT JOIN gps_tracks ON gps_tracks.id = gps_track_segments.gps_track_id 
WHERE 
    (gps_tracks.startdate <= "2013-01-09 00:00:00" 
        AND gps_tracks.stopdate BETWEEN "2013-01-09 00:00:00" AND "2013-01-09 23:59:59"
         AND gps_tracks.ignition = 0 
         AND gps_points.gps_unit_id = "498"
         AND gps_tracks.gps_unit_id = "498" 
) 
UNION
SELECT
 gps_points.id AS gps_point_id,
  gps_points.date,
  gps_points.ignition,
  ROUND((gps_points.lat / 1000000),6) AS lat,
  ROUND((gps_points.lng / 1000000),6) AS lng,
  UNIX_TIMESTAMP(gps_points.date) AS time 
FROM gps_points 
INNER JOIN gps_units ON gps_units.id = gps_points.gps_unit_id 
LEFT JOIN gps_track_segments ON gps_track_segments.id = gps_points.gps_track_segment_id 
LEFT JOIN gps_tracks ON gps_tracks.id = gps_track_segments.gps_track_id 
WHERE (
    gps_points.date BETWEEN "2013-01-09 00:00:00" AND "2013-01-09 23:59:59" 
    AND gps_points.gps_unit_id = "498" 
    AND gps_units.relation_id = "93" 
    )  

I get this result:

id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra 	
1 	PRIMARY 	gps_units 	const 	PRIMARY 	PRIMARY 	4 	const 	1 	Using index
1 	PRIMARY 	gps_tracks 	ref 	PRIMARY,gps_unit_id,start_stop_unit_idx,startdate_ix 	gps_unit_id 	4 	const 	1311 	Using where
1 	PRIMARY 	gps_track_segments 	ref 	PRIMARY,gps_track_id 	gps_track_id 	4 	freetrack_rc.gps_tracks.id 	4 	Using where
1 	PRIMARY 	gps_points 	ref 	gps_track_segment_id,gps_unit_id,segment_date_idx,gps_unit_date_idx 	segment_date_idx 	4 	freetrack_rc.gps_track_segments.id 	3 	Using where
2 	UNION 	gps_units 	const 	PRIMARY,relation_id 	PRIMARY 	4 	const 	1 	
2 	UNION 	gps_points 	range 	date,gps_unit_id,gps_unit_date_idx 	gps_unit_date_idx 	8 	NULL	450 	Using where
2 	UNION 	gps_track_segments 	eq_ref 	PRIMARY 	PRIMARY 	4 	freetrack_rc.gps_points.gps_track_segment_id 	1 	
2 	UNION 	gps_tracks 	eq_ref 	PRIMARY 	PRIMARY 	4 	freetrack_rc.gps_track_segments.gps_track_id 	1 	Using index
NULL	UNION RESULT 	<union1,2> 	ALL 	NULL	NULL	NULL	NULL	NULL	

(I also removed the ordering)
It wasn’t faster, though
It does select 1311 rows from the tracks table, which would all rows for this unit. This is not good thing, but that can be become quite a lott over time.

  • can I optimize for the amount of tracks selected?
  • can I sort this UNION?
  • Do I really need UNION DISTINC (Default for UNION) or can I use UNION ALL and have another way to prevent duplicates?