SitePoint Sponsor

User Tag List

Results 1 to 3 of 3
  1. #1
    SitePoint Zealot Hieronymus's Avatar
    Join Date
    Nov 2004
    Location
    Nederland, Eindhoven
    Posts
    145
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    query optimalization problem

    Hi,

    I've a problem with a query that doesn't preform well and I've tried a bunch of things, but can't get it right. I'd like to know if I'm trying it wrong or that I just want the impossible.

    Below I have relevant table data / indexes, my queries and the explain results.

    I want to select gps points from a full day. The points are organised by segments in a track. These tracks obviously span across days and can be of ignition on or off type. Now cirtain units don't post points if they are off, so the first point in a day can be an active (moving) point and this leads to problems with reports about start en stop times. So I want all points from a day (easy) and all points from a ignition-off track ending on that day (easy). But I can't seem te get the together with good performance. Somehow mysql always selects all previous points...

    gps_points
    Code:
    id
    gps_track_segment_id
    gps_unit_id
    date
    
    indices:
    primary
    gps_track_segment_id
    gps_unit_date_idx
    gps_track_segments
    Code:
    id
    gps_track_id
    date
    
    indices
    primary
    gps_track_id
    gps_tracks
    Code:
    id
    hash (another unique identifier)
    gps_unit_id
    startdate
    stopdate
    ignition
    
    indices
    primary
    gps_unit_id
    hash
    Now, the queries:
    Code:
    EXPLAIN SELECT
     gps_points.id AS gps_point_id,
      gps_points.date,
      gps_points.ignition,
      ROUND((gps_points.lat / 1000000),6) AS lat,
      ROUND((gps_points.lng / 1000000),6) AS lng,
      UNIX_TIMESTAMP(gps_points.date) AS time 
    FROM gps_points 
    INNER JOIN gps_units ON gps_units.id = gps_points.gps_unit_id 
    LEFT JOIN gps_track_segments ON gps_track_segments.id = gps_points.gps_track_segment_id 
    LEFT JOIN gps_tracks ON gps_tracks.id = gps_track_segments.gps_track_id 
    WHERE (
        gps_points.date BETWEEN "2013-01-09 00:00:00" AND "2013-01-09 23:59:59" 
        AND gps_points.gps_unit_id = "498" 
        AND gps_units.relation_id = "93" 
        ) 
        OR 
        (gps_tracks.startdate <= "2013-01-09 00:00:00" 
            AND gps_tracks.stopdate BETWEEN "2013-01-09 00:00:00" AND "2013-01-09 23:59:59"
             AND gps_tracks.ignition = 0 
             AND gps_points.gps_unit_id = "498"
             AND gps_tracks.gps_unit_id = "498" 
        ) 
        ORDER BY gps_points.date ASC
    This results in 459 rows returned. 450 that are actually from th 9th and 9 that are in the ignition off track that starts before the 9th and and on the 9th. (this is a test unit, not one of the new problems.)
    This query is rather slow and the explain looks like:

    Code:
    id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra 	
    1 	SIMPLE 	gps_units 	const 	PRIMARY,relation_id 	PRIMARY 	4 	const 	1 	
    1 	SIMPLE 	gps_points 	ref 	date,gps_unit_id,gps_unit_date_idx 	gps_unit_date_idx 	4 	const 	22484 	Using where
    1 	SIMPLE 	gps_track_segments 	eq_ref 	PRIMARY 	PRIMARY 	4 	gps_points.gps_track_segment_id 	1 	
    1 	SIMPLE 	gps_tracks 	eq_ref 	PRIMARY 	PRIMARY 	4 	gps_track_segments.gps_track_id 	1 	Using where
    You can see the problem with the select for points 22484. If I just use the first part of the WHERE clause it would be just 450 and if I just use the second part of the OR in the WHERE clause the explain looks like:

    Code:
    id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra 	
    1 	SIMPLE 	gps_units 	const 	PRIMARY 	PRIMARY 	4 	const 	1 	Using index; Using temporary; Using filesort
    1 	SIMPLE 	gps_tracks 	ref 	PRIMARY,gps_unit_id 	gps_unit_id 	4 	const 	1311 	Using where
    1 	SIMPLE 	gps_track_segments 	ref 	PRIMARY,gps_track_id 	gps_track_id 	4 	gps_tracks.id 	4 	Using where
    1 	SIMPLE 	gps_points 	ref 	gps_track_segment_id,gps_unit_id,segment_date_idx,gps_unit_date_idx 	segment_date_idx 	4 	gps_track_segments.id 	4 	Using where
    This 1311 is all tracks for that gps_unit_id. Which doens't scale...

    Any ideas would be welcome.
    What good are one-liners if they don't
    fit.

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,021
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    i did not really dig into your EXPLAIN because i immediately noticed the OR

    try splitting your query into two, one for each side of the OR, and then UNION ALL the results together

    also, it apeears you don't have any multi-column indexes, that would be my next approach
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Zealot Hieronymus's Avatar
    Join Date
    Nov 2004
    Location
    Nederland, Eindhoven
    Posts
    145
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hi,

    I made a slight mistake in the original post. The EXPLAIN for the second part of the OR (thus not using an OR in the WHERE) is:

    Code:
    id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra 	
    1 	SIMPLE 	gps_units 	const 	PRIMARY 	PRIMARY 	4 	const 	1 	Using index
    1 	SIMPLE 	gps_points 	ref 	gps_track_segment_id,gps_unit_id,segment_date_idx,gps_unit_date_idx 	gps_unit_date_idx 	4 	const 	22484 	Using where
    1 	SIMPLE 	gps_track_segments 	eq_ref 	PRIMARY,gps_track_id 	PRIMARY 	4 	freetrack_rc.gps_points.gps_track_segment_id 	1 	Using where
    1 	SIMPLE 	gps_tracks 	eq_ref 	PRIMARY,gps_unit_id,start_stop_unit_idx,startdate_ix 	PRIMARY 	4 	freetrack_rc.gps_track_segments.gps_track_id 	1 	Using where
    So the the query like this:
    Code:
    EXPLAIN SELECT
     gps_points.id AS gps_point_id,
      gps_points.date,
      gps_points.ignition,
      ROUND((gps_points.lat / 1000000),6) AS lat,
      ROUND((gps_points.lng / 1000000),6) AS lng,
      UNIX_TIMESTAMP(gps_points.date) AS time 
    FROM gps_points 
    INNER JOIN gps_units ON gps_units.id = gps_points.gps_unit_id 
    LEFT JOIN gps_track_segments ON gps_track_segments.id = gps_points.gps_track_segment_id 
    LEFT JOIN gps_tracks ON gps_tracks.id = gps_track_segments.gps_track_id 
    WHERE 
        (gps_tracks.startdate <= "2013-01-09 00:00:00" 
            AND gps_tracks.stopdate BETWEEN "2013-01-09 00:00:00" AND "2013-01-09 23:59:59"
             AND gps_tracks.ignition = 0 
             AND gps_points.gps_unit_id = "498"
             AND gps_tracks.gps_unit_id = "498" 
        ) 
        ORDER BY gps_points.date ASC
    is the part that makes it very slow.

    Well if I use UNION (I used the distinced version to make sure no double rows are in the result set...)

    Code:
    EXPLAIN SELECT
     gps_points.id AS gps_point_id,
      gps_points.date,
      gps_points.ignition,
      ROUND((gps_points.lat / 1000000),6) AS lat,
      ROUND((gps_points.lng / 1000000),6) AS lng,
      UNIX_TIMESTAMP(gps_points.date) AS time 
    FROM gps_points 
    INNER JOIN gps_units ON gps_units.id = gps_points.gps_unit_id 
    LEFT JOIN gps_track_segments ON gps_track_segments.id = gps_points.gps_track_segment_id 
    LEFT JOIN gps_tracks ON gps_tracks.id = gps_track_segments.gps_track_id 
    WHERE 
        (gps_tracks.startdate <= "2013-01-09 00:00:00" 
            AND gps_tracks.stopdate BETWEEN "2013-01-09 00:00:00" AND "2013-01-09 23:59:59"
             AND gps_tracks.ignition = 0 
             AND gps_points.gps_unit_id = "498"
             AND gps_tracks.gps_unit_id = "498" 
    ) 
    UNION
    SELECT
     gps_points.id AS gps_point_id,
      gps_points.date,
      gps_points.ignition,
      ROUND((gps_points.lat / 1000000),6) AS lat,
      ROUND((gps_points.lng / 1000000),6) AS lng,
      UNIX_TIMESTAMP(gps_points.date) AS time 
    FROM gps_points 
    INNER JOIN gps_units ON gps_units.id = gps_points.gps_unit_id 
    LEFT JOIN gps_track_segments ON gps_track_segments.id = gps_points.gps_track_segment_id 
    LEFT JOIN gps_tracks ON gps_tracks.id = gps_track_segments.gps_track_id 
    WHERE (
        gps_points.date BETWEEN "2013-01-09 00:00:00" AND "2013-01-09 23:59:59" 
        AND gps_points.gps_unit_id = "498" 
        AND gps_units.relation_id = "93" 
        )
    I get this result:

    Code:
    id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra 	
    1 	PRIMARY 	gps_units 	const 	PRIMARY 	PRIMARY 	4 	const 	1 	Using index
    1 	PRIMARY 	gps_tracks 	ref 	PRIMARY,gps_unit_id,start_stop_unit_idx,startdate_ix 	gps_unit_id 	4 	const 	1311 	Using where
    1 	PRIMARY 	gps_track_segments 	ref 	PRIMARY,gps_track_id 	gps_track_id 	4 	freetrack_rc.gps_tracks.id 	4 	Using where
    1 	PRIMARY 	gps_points 	ref 	gps_track_segment_id,gps_unit_id,segment_date_idx,gps_unit_date_idx 	segment_date_idx 	4 	freetrack_rc.gps_track_segments.id 	3 	Using where
    2 	UNION 	gps_units 	const 	PRIMARY,relation_id 	PRIMARY 	4 	const 	1 	
    2 	UNION 	gps_points 	range 	date,gps_unit_id,gps_unit_date_idx 	gps_unit_date_idx 	8 	NULL	450 	Using where
    2 	UNION 	gps_track_segments 	eq_ref 	PRIMARY 	PRIMARY 	4 	freetrack_rc.gps_points.gps_track_segment_id 	1 	
    2 	UNION 	gps_tracks 	eq_ref 	PRIMARY 	PRIMARY 	4 	freetrack_rc.gps_track_segments.gps_track_id 	1 	Using index
    NULL	UNION RESULT 	<union1,2> 	ALL 	NULL	NULL	NULL	NULL	NULL
    (I also removed the ordering)
    It wasn't faster, though
    It does select 1311 rows from the tracks table, which would all rows for this unit. This is not good thing, but that can be become quite a lott over time.

    * can I optimize for the amount of tracks selected?
    * can I sort this UNION?
    * Do I really need UNION DISTINC (Default for UNION) or can I use UNION ALL and have another way to prevent duplicates?
    What good are one-liners if they don't
    fit.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •