SitePoint Sponsor

User Tag List

Results 1 to 10 of 10
  1. #1
    SitePoint Zealot Hieronymus's Avatar
    Join Date
    Nov 2004
    Location
    Nederland, Eindhoven
    Posts
    145
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Red face How to select first and last created in given time frame

    Hi,

    I have a table which collects vehicle information (fms1). Most of these things count up, thus to get the 'amount' added in a period I need to subtract the last and the first message in that period.
    These messages belong to a unit (gps_unit_id) and units belong to a relation (relation_id)

    I have this query:
    Code:
    SELECT latest.gps_unit_id, latest.odometer - earliest.odometer AS diff_odometer, latest.total_fuel - earliest.total_fuel AS diff_total_fuel, latest.engine_hours - earliest.engine_hours AS diff_engine_hours, TIMEDIFF( latest.created, earliest.created ) AS diff_created
    FROM (
    
    SELECT gps_unit_id, MIN( created ) AS min_created, MAX( created ) AS max_created
    FROM fms1
    WHERE created
    BETWEEN '2012-04-01'
    AND '2012-04-30'
    AND gps_unit_id
    IN ( 76, 81, 133 )
    GROUP BY gps_unit_id
    ) AS minmax
    INNER JOIN fms1 AS earliest ON earliest.gps_unit_id = minmax.gps_unit_id
    AND earliest.created = minmax.min_created
    INNER JOIN fms1 AS latest ON latest.gps_unit_id = minmax.gps_unit_id
    AND latest.created = minmax.max_created
    WHERE minmax.gps_unit_id
    IN (
    
    SELECT id
    FROM gps_units
    WHERE relation_id =16
    AND id
    IN ( 76, 81, 133 )
    )
    If I change to a bigger time interval, let's say 2 months, or a year, the query gets very slow since every message is viewed for the min() and max(). The problem lies in the subquery:
    Code:
    SELECT gps_unit_id, MIN( created ) AS min_created, MAX( created ) AS max_created
    FROM fms1
    WHERE created
    BETWEEN '2012-04-01'
    AND '2012-04-30'
    AND gps_unit_id
    IN ( 76, 81, 133 )
    GROUP BY gps_unit_id
    The explain command says:


    id select_type table type possible_keys key key_len ref rows Extra
    1 PRIMARY <derived2> ALL NULL NULL NULL NULL 3 Using where
    1 PRIMARY earliest ref created_gps_unit_id_idx,gps_unit_id created_gps_unit_id_idx 12 minmax.min_created,minmax.gps_unit_id 1
    1 PRIMARY latest ref created_gps_unit_id_idx,gps_unit_id created_gps_unit_id_idx 12 minmax.max_created,freetrack.earliest.gps_unit_id 1 Using where
    3 DEPENDENT SUBQUERY gps_units unique_subquery PRIMARY,relation_id PRIMARY 4 func 1 Using where
    2 DERIVED fms1 range created_gps_unit_id_idx,gps_unit_id created_gps_unit_id_idx 12 NULL 224517 Using where; Using index; Using temporary; Using filesort

    the last reveals that 224517 rows are evaluated for the subquery.
    besides the query not being 'scalable' this seems like a lott of work for something rather simple.

    We can assume that messages with a higher id also are also later in time. Thus you can select the first message for a given unit next to minimum or previous to the maximum date.

    I just can't think of a way to put that in query...
    What good are one-liners if they don't
    fit.

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,322
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    please do a SHOW CREATE TABLE for the fms1 table, so that we can see which indexes it has

    please do an EXPLAIN for that subquery on its own
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Zealot Hieronymus's Avatar
    Join Date
    Nov 2004
    Location
    Nederland, Eindhoven
    Posts
    145
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Code:
    EXPLAIN SELECT gps_unit_id, MIN( created ) AS min_created, MAX( created ) AS max_created
    FROM fms1
    WHERE created
    BETWEEN '2012-04-01'
    AND '2012-04-30'
    AND gps_unit_id
    IN ( 76, 81, 133 )
    GROUP BY gps_unit_id
    Code:
    id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra
    1 	SIMPLE 	fms1 	range 	created_gps_unit_id_idx,gps_unit_id 	created_gps_unit_id_idx 	12 	NULL 	224517 	Using where; Using index; Using temporary; Using filesort
    Code:
    SHOW CREATE TABLE fms1
    Code:
    CREATE TABLE `fms1` (
     `id` int(10) unsigned NOT NULL auto_increment,
     `gps_unit_id` int(10) unsigned NOT NULL,
     `odometer` double default NULL,
     `total_fuel` float default NULL,
     `engine_hours` float default NULL,
     `actual_speed` float default NULL,
     `actual_engine_speed` float default NULL,
     `actual_engine_torque` int(11) default NULL,
     `kickdown_switch` tinyint(1) default NULL,
     `accelerator_position` float default NULL,
     `brake_switch` tinyint(1) default NULL,
     `clutch_switch` tinyint(1) default NULL,
     `cruise_active` tinyint(1) default NULL,
     `pto_active` tinyint(1) default NULL,
     `fuel_level` float default NULL,
     `engine_temperature` int(11) default NULL,
     `turbo_pressure` float default NULL,
     `axle_weight_0` float default NULL,
     `axle_weight_1` float default NULL,
     `axle_weight_2` float default NULL,
     `axle_weight_3` float default NULL,
     `service_distance` int(11) default NULL,
     `created` datetime NOT NULL,
     PRIMARY KEY  (`id`),
     KEY `created_gps_unit_id_idx` (`created`,`gps_unit_id`),
     KEY `gps_unit_id` (`gps_unit_id`,`id`)
    ) ENGINE=MyISAM AUTO_INCREMENT=2673414 DEFAULT CHARSET=latin1
    What good are one-liners if they don't
    fit.

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,322
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    in your EXPLAIN, "Using index;" is very good, but "Using temporary; Using filesort;" is very bad

    i'm going to guess that if you had an index on (gps_unit_id,created), you would see a massive improvement in that query

    by the way, your condition of --
    Code:
    WHERE created BETWEEN '2012-04-01' AND '2012-04-30'
    overlooks an entire day's worth of data in april
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Zealot Hieronymus's Avatar
    Join Date
    Nov 2004
    Location
    Nederland, Eindhoven
    Posts
    145
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hey Thanks,

    that's a real speed boost! we're back in tens of milliseconds.

    I do have a rather weird problem. For some units it returns double results, even some slightly different ones. If i run the subqueries, I get just 3 results for the 3 selected id's:
    Code:
    SELECT gps_unit_id, MIN( created ) AS min_created, MAX( created ) AS max_created
    FROM fms1
    WHERE created
    BETWEEN '2012-05-01'
    AND '2012-05-30'
    AND gps_unit_id
    IN ( 90, 91, 92 )
    GROUP BY gps_unit_id
    // returns
    // min_created and max_created for 90, 91 and 92
    Code:
    SELECT id
    FROM gps_units
    WHERE relation_id =20
    AND id
    IN ( 90, 91, 92 )
    LIMIT 0 , 30
    // returns 90, 91, 92

    but, the whole query:
    Code:
    SELECT latest.gps_unit_id, latest.odometer - earliest.odometer AS diff_odometer, latest.total_fuel - earliest.total_fuel AS diff_total_fuel, latest.engine_hours - earliest.engine_hours AS diff_engine_hours, TIMEDIFF( latest.created, earliest.created ) AS diff_created
    FROM (
    
    SELECT gps_unit_id, MIN( created ) AS min_created, MAX( created ) AS max_created
    FROM fms1
    WHERE created
    BETWEEN '2012-05-01'
    AND '2012-05-30'
    AND gps_unit_id
    IN ( 90, 91, 92 )
    GROUP BY gps_unit_id
    ) AS minmax
    INNER JOIN fms1 AS earliest ON earliest.gps_unit_id = minmax.gps_unit_id
    AND earliest.created = minmax.min_created
    INNER JOIN fms1 AS latest ON latest.gps_unit_id = minmax.gps_unit_id
    AND latest.created = minmax.max_created
    WHERE minmax.gps_unit_id
    IN (
    
    SELECT id
    FROM gps_units
    WHERE relation_id =20
    AND id
    IN ( 90, 91, 92 )
    )
    returns:

    Code:
    Rows: 7
    gps_unit_id 	diff_odometer 	diff_total_fuel 	diff_engine_hours 	diff_created
    90 	7825.73000000001 	3607 	193.2001953125 	685:21:57
    91 	5513.67499999999 	2805 	154.7001953125 	681:05:40
    92 	9569.245 	4043.5 	214.75 	682:44:35
    92 	9569.19 	4043 	214.75 	682:44:35
    92 	9569.175 	4043 	214.75 	682:44:35
    92 	9569.175 	4043 	214.75 	682:44:35
    92 	9569.17 	4043 	214.75 	682:44:35
    The explain (notice the difference in rows evaluated, just 1 instead of more then 200000):

    Code:
    Rows: 5
    id 	select_type 	table 	type 	possible_keys 	key 	key_len 	ref 	rows 	Extra
    1 	PRIMARY 	<derived2> 	ALL 	NULL 	NULL 	NULL 	NULL 	3 	Using where
    1 	PRIMARY 	earliest 	ref 	created_gps_unit_id_idx,gps_unit_id,gps_unit_id_cr... 	gps_unit_id_created_idx 	12 	minmax.gps_unit_id,minmax.min_created 	1 	 
    1 	PRIMARY 	latest 	ref 	created_gps_unit_id_idx,gps_unit_id,gps_unit_id_cr... 	gps_unit_id_created_idx 	12 	freetrack.earliest.gps_unit_id,minmax.max_created 	1 	Using where
    3 	DEPENDENT SUBQUERY 	gps_units 	unique_subquery 	PRIMARY,relation_id 	PRIMARY 	4 	func 	1 	Using where
    2 	DERIVED 	fms1 	range 	created_gps_unit_id_idx,gps_unit_id,gps_unit_id_cr... 	gps_unit_id_created_idx 	12 	NULL 	1 	Using where; Using index for group-by
    this also happend before I added the index, but I just noticed it since I was playing around with date between period and different units. Any ideas why?
    What good are one-liners if they don't
    fit.

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,322
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Hieronymus View Post
    Any ideas why?
    it's gotta be your data
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    SitePoint Zealot Hieronymus's Avatar
    Join Date
    Nov 2004
    Location
    Nederland, Eindhoven
    Posts
    145
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    yeah, you're right. I added "latest.created, earliest.created" to the fieldlist and offcourse noticed these are the same for the double rows. Searching on these times in de fms1 table I got 5 results for one created time.

    Is it possible to fix this in the query?By getting the lowest id for the startdate and the highest id for the stopdate?

    These messages are transmitted to my server if a unit has a GPRS connection and has the engine running. They are send every 30seconds. If there is no GPRS connection, the messages are stored until connection is made. Then they are transmitted in buld. They have no timestamp, so you don't really know to what actual time they belong.
    What good are one-liners if they don't
    fit.

  8. #8
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,322
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Hieronymus View Post
    They have no timestamp, so you don't really know to what actual time they belong.
    um...

    so using the id is a viable solution to not knowing when they were made?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  9. #9
    SitePoint Zealot Hieronymus's Avatar
    Join Date
    Nov 2004
    Location
    Nederland, Eindhoven
    Posts
    145
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    yes, they are for sure added in chronological order and if multiple messages are added to the database in one time-instance (that is, the same second) then they all made before that time!
    So if there are multiple results for the stop time, we need to select the one with the highest ID
    and if there are multiple resutls for the start time, we need to select the one with the highest ID, since that would be the one closest to the actual time, all the other messages were generated before that and most likely do not belong to the selected period. Offcourse I know this is not true for all cases, but I guess it's the best choice and the easiest way to implement.

    Any ideas how to incorperate this into the query? Many thanks.
    What good are one-liners if they don't
    fit.

  10. #10
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,322
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Hieronymus View Post
    Any ideas how to incorperate this into the query?
    sorry, no
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"


Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •