SitePoint Sponsor

User Tag List

Results 1 to 10 of 10
  1. #1
    SitePoint Zealot
    Join Date
    Sep 2004
    Location
    london
    Posts
    160
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    trimming the fat from a query

    Hi there, I am trying to trim down this query to something a little more readable.

    The query itself creates a view from a real table called 'network', only returning the latest value for each unique nic_name for a particular key (latest being defined by the datetimestamp value)



    Code:
    CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `network_view` AS select 
    `t2`.`key` AS `key`,
    `t2`.`nic_name` AS `nic_name`,
    `t2`.`nic_vp` AS `nic_vp`,
    `t2`.`nic_ipmp` AS `nic_ipmp`,
    `t2`.`nic_ip` AS `nic_ip`,
    `t2`.`nic_mac` AS `nic_mac`,
    `t2`.`nic_mask` AS `nic_mask`,
    `t2`.`nic_bcast` AS `nic_bcast`,
    `t2`.`nic_zone` AS `nic_zone`,
    `t2`.`nic_link` AS `nic_link`,
    `t2`.`nic_duplex` AS `nic_duplex`,
    `t2`.`nic_speed` AS `nic_speed`,
    `t2`.`nic_switch` AS `nic_switch`,
    `t2`.`nic_port` AS `nic_port`,
    `t2`.`nic_vlan` AS `nic_vlan`,
    `t2`.`nic_status` AS `nic_status`,
    `t2`.`datetimestamp` AS `datetimestamp` from `network` `t2` where ((`t2`.`key`,`t2`.`nic_name`,`t2`.`datetimestamp`) in (select
    `network`.`key` AS `key`,
    `network`.`nic_name` AS `nic_name`,max(`network`.`datetimestamp`) AS `MAX(datetimestamp)` 
    from `network` where ((`network`.`key` = `t2`.`key`) and (`network`.`nic_name` = `t2`.`nic_name`)) 
    group by `network`.`key`,`network`.`nic_name`) and (`t2`.`nic_status` <> 'removed')) 
    order by `t2`.`key`,`t2`.`nic_name`

    I have tried this but its doesnt seem to be working

    Code:
    CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `original` AS select t2.key
    	, t2.nic_name
    	, t2.nic_vp
    	, t2.nic_ipmp
    	, t2.nic_ip
    	, t2.nic_mac
    	, t2.nic_mask
    	, t2.nic_bcast
    	, t2.nic_zone
    	, t2.nic_link
    	, t2.nic_duplex
    	, t2.nic_speed
    	, t2.nic_switch
    	, t2.nic_port
    	, t2.nic_vlan
    	, t2.nic_status		
    	, t2.datetimestamp FROM network t2 
    	   WHERE ((t2.key,t2.nic_name,t2.datetimestamp) IN (SELECT network.key
    	   , network.nic_name
    	   , MAX(network.datetimestamp)
    	   FROM network WHERE ((network.key = t2.key)
    	   AND (network.nic_name = t2.nic_name))
    	   GROUP BY network.key,network.nic_name)
    	   AND(t2.nic_status <> 'removed'))
    	ORDER BY t2.key,t2.nic_name;
    Is there something I am missing when trying to simplify this query ?


    any help would be greatly appreciated

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    try this --
    Code:
    CREATE 
      ALGORITHM = UNDEFINED 
      DEFINER = `root`@`localhost` 
      SQL SECURITY DEFINER 
      VIEW `network_view` 
    AS 
    SELECT t2.key
         , t2.nic_name
         , t2.nic_vp
         , t2.nic_ipmp
         , t2.nic_ip
         , t2.nic_mac
         , t2.nic_mask
         , t2.nic_bcast
         , t2.nic_zone
         , t2.nic_link
         , t2.nic_duplex
         , t2.nic_speed
         , t2.nic_switch
         , t2.nic_port
         , t2.nic_vlan
         , t2.nic_status
         , t2.datetimestamp
      FROM network AS t2 
    INNER
      JOIN ( SELECT key
                  , nic_name
                  , MAX(datetimestamp) AS max_datetimestamp
               FROM network 
             GROUP 
                 BY key
                  , nic_name ) AS m
        ON m.key               = t2.key
       AND m.nic_name          = t2.nic_name
       AND m.max_datetimestamp = t2.datetimestamp
     WHERE t2.nic_status <> 'removed' 
    ORDER 
        BY t2.key
         , t2.nic_name
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Zealot
    Join Date
    Sep 2004
    Location
    london
    Posts
    160
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    ah, thanks ... thats a completely different way of doing it, id just got my head around the original one :-). I think I understand it. un fortunately im getting an error when running it

    Code:
    ERROR 1349: View's SELECT contains a subquery in the FROM clause
    Im not sure if this is just a mysql issue?


    Additionally, if i wanted to add another JOIN in there so that the results also returned a field called 'hostname' which is retrieved from a table called 'physical' (ON t2.key = physical.key), would I add that in before the other JOIN are after it?

    thanks again for helping




    Code:
    CREATE 
      ALGORITHM = UNDEFINED 
      DEFINER = `root`@`localhost` 
      SQL SECURITY DEFINER 
      VIEW `network_view` 
    AS 
    SELECT t2.key
         , t2.nic_name
         , t2.nic_vp
         , t2.nic_ipmp
         , t2.nic_ip
         , t2.nic_mac
         , t2.nic_mask
         , t2.nic_bcast
         , t2.nic_zone
         , t2.nic_link
         , t2.nic_duplex
         , t2.nic_speed
         , t2.nic_switch
         , t2.nic_port
         , t2.nic_vlan
         , t2.nic_status
         , t2.datetimestamp
      FROM network AS t2 
    INNER
      JOIN ( SELECT key
                  , nic_name
                  , MAX(datetimestamp) AS max_datetimestamp
               FROM network 
             GROUP 
                 BY key
                  , nic_name ) AS m
        ON m.key               = t2.key
       AND m.nic_name          = t2.nic_name
       AND m.max_datetimestamp = t2.datetimestamp
     WHERE t2.nic_status <> 'removed' 
    ORDER 
        BY t2.key
         , t2.nic_name

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    well, apparently you cannot create a view with a subquery in the FROM clause

    i knew this once, but forgot it

    the workaround is to create a view for the subquery and then join to that

    as for your new question, since the additional join is also an INNER JOIN, it doesn't matter if you put if before or after

    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Zealot
    Join Date
    Sep 2004
    Location
    london
    Posts
    160
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    yep, I believe that mysql limitation with the sub-queries in the FROM clause was the primary reason the query (in my original post) was written the way it was (i.e. to bypass this limitation).

    as such I guess my original problem still stands, can anyone see any reason why my re-write / simplification doesn't work ?

    any help would be greatly appreciated

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by hcclnoodles View Post
    can anyone see any reason why my re-write / simplification doesn't work ?
    this is what you have --
    Code:
    SELECT ...
      FROM network t2 
     WHERE (
           ( t2.key,t2.nic_name,t2.datetimestamp ) 
           IN ( SELECT network.key
                     , network.nic_name
                     , MAX(network.datetimestamp)
                  FROM network 
                 WHERE network.key = t2.key
                   AND network.nic_name = t2.nic_name
                GROUP 
                    BY network.key
                     , network.nic_name )
       AND t2.nic_status <> 'removed'
           )
    and this is the way i would write it --
    Code:
    SELECT ...
      FROM network t2 
     WHERE t2.datetimestamp = 
           ( SELECT MAX(network.datetimestamp)
               FROM network 
              WHERE network.key = t2.key
                AND network.nic_name = t2.nic_name )
       AND t2.nic_status <> 'removed'
    although you might also have to put <>'removed' into the subquery

    and of course a correlated subquery won't perform all that well...
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    SitePoint Zealot
    Join Date
    Sep 2004
    Location
    london
    Posts
    160
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hi thanks for your help. Rather than a rewriting the query as such, I really just wanted to get some help understanding what I was doing wrong with the way I was simplifying my original query. Although the original query may be a bit cludgy it works pretty well and I just wanted to make it a little more readable to those that need to understand it.

    not that i don't appreciate your help of course, which as ever is greatly appreciated :-)

  8. #8
    SitePoint Zealot
    Join Date
    Sep 2004
    Location
    london
    Posts
    160
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    OK , I take it back about not needing to re-engineer this. Ive trimmed down a little bit of the faff, and added in an additional JOIN. however, when I issue a `SELECT *` from this view it takes nearly 45 minutes to return (and this is on a seriously powerful box). Is there a way I can massively optimise this ?

    If i do a specific SELECT against the view (as opposed to a SELECT *) it comes back in about 1-2 seconds (which is still rubbish ). Unfortunately, SELECT * is something that certain people will need to do. 45 mins is quite a long time to wait

    Am i right in saying that it is the correlated sub-queries, that are causing this to take so long

    Code:
    CREATE ALGORITHM=UNDEFINED
     DEFINER=`root`@`localhost` 
    SQL SECURITY DEFINER 
    VIEW `MYDB`.`network_test` AS
    SELECT 
     t2.key AS key,
     t2.nic_name,
     t2.nic_vlan ,
     t2.nic_mask ,
     t2.nic_bcast ,
     t2.nic_zone ,
     t2.nic_status ,
     t2.datetimestamp ,
     all_servers.hostname
     FROM MYDB.network t2 
       JOIN MYDB.all_servers
       ON t2.key = all_servers.key
     WHERE ((t2.key,
     t2.nic_name,
     t2.datetimestamp) IN (SELECT
     MYDB.network.key AS key,
     MYDB.network.nic_name AS nic_name,
    max(MYDB.network.datetimestamp)  
     FROM MYDB.network 
     WHERE ((MYDB.network.key = t2.key) 
     AND (MYDB.network.nic_name = t2.nic_name)) 
     GROUP BY MYDB.network.key,MYDB.network.nic_name) AND (t2.nic_status <> "removed")) 
     ORDER BY t2.key,t2.nic_name;
    any advice would be great

  9. #9
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    analysis of performance problems requires
    1. complete understanding of all tables including their indexes
    2. EXPLAINs on the poorly performning queries
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  10. #10
    SitePoint Zealot
    Join Date
    Sep 2004
    Location
    london
    Posts
    160
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    OK, so I have trimmed down the query and this is the one I am basing the following table, index and explain information on

    Code:
    CREATE ALGORITHM=UNDEFINED 
    DEFINER=`root`@`localhost` 
    SQL SECURITY DEFINER VIEW `MYDB`.`network_test` AS
    SELECT 
     t2.key AS key,
     t2.nic_name,
     t2.nic_vlan ,
     t2.nic_status ,
     t2.datetimestamp 
     FROM MYDB.network t2 
     WHERE ((t2.key,
     t2.nic_name,
     t2.datetimestamp) IN (SELECT
     MYDB.network.key AS key,
     MYDB.network.nic_name AS nic_name,
    max(MYDB.network.datetimestamp)  
     FROM MYDB.network 
     WHERE ((MYDB.network.key = t2.key) 
     AND (MYDB.network.nic_name = t2.nic_name)) 
     GROUP BY MYDB.network.key,MYDB.network.nic_name) AND (t2.nic_status <> "removed")) 
     ORDER BY t2.key,t2.nic_name;
    The (newly trimmed) 'network' table looks like this

    Code:
    NETWORK TABLE
    
    |* key *|* nic_name *|* nic_vlan *|*nic_status *|* datetimestamp *|
    | 1112 | e1000g0 | 23 | online | 2010-05-16 04:27:38 |
    | 1112 | e1000g1 | 21 | online | 2010-05-16 04:28:19 |
    | 1112 | e1000g0 | 23 | online | 2010-05-15 01:26:39 |
    | 1675 | nge0 | 20 | online | 2010-05-16 02:27:38 |
    | 1675 | nge0 | 20 | online | 2010-05-14 01:26:68 |
    
    + Contains around 6,000 records (of which 2,500 are historical and 3,500 are current)

    As you can see from above, The table contains records for each 'key' (a 'key' represents a physical computer). Where each unique nic_name (network interface) on a particular 'key' is defined with its current (defined by latest timestamp) and historical records (older timestamps) .

    The NETWORK table has the following indexes (there is no unique/primary key on this table as it contains history)
    Code:
    	Keyname	Type	Unique	Packed	Field	Cardinality	Collation	Null	Comment
    	primac	BTREE	No	No	primac	1095	A		
     	nic_ip	BTREE	No	No	nic_ip	2669	A		
    	nic_name	BTREE	No	No	nic_name	200	A		
     	datetimestamp	BTREE	No	No	datetimestamp	42705	A


    The query to create the "network_test" VIEW (the one posted above) is designed to retrieve the 'most recent' record for each 'unique nic_name' on a particular 'key'. So the resulting output should (and does) look like this

    Code:
    |* key *|* nic_name *|* nic_vlan *|*nic_status *|* datetimestamp *|
    | 1112 | e1000g0 | 23 | online | 2010-05-16 04:27:38 |
    | 1112 | e1000g1 | 21 | online | 2010-05-16 04:28:19 |
    | 1675 | nge0 | 20 | online | 2010-05-16 02:27:38 |
    If i issue an EXPLAIN on this I get the following

    Code:
    mysql> explain select * from network_test;
    +----+--------------------+---------+------+-----------------+--------+---------+----------------+-------+----------------------------------------------+
    | id | select_type        | table   | type | possible_keys   | key    | key_len | ref            | rows  | Extra                                        |
    +----+--------------------+---------+------+-----------------+--------+---------+----------------+-------+----------------------------------------------+
    |  1 | PRIMARY            | t2      | ALL  | NULL            | NULL   | NULL    | NULL           | 85411 | Using where; Using filesort                  | 
    |  3 | DEPENDENT SUBQUERY | network | ref  | key,nic_name | key | 32      | MYDB.t2.key |    78 | Using where; Using temporary; Using filesort | 
    +----+--------------------+---------+------+-----------------+--------+---------+----------------+-------+----------------------------------------------+
    2 rows in set (0.00 sec)
    Im assumin the NULL in the 'possible keys' at the t2 stage is a big problem, I assume there isnt a way of indexing a temp table during its own creation ?

    The "SELECT * FROM network_test" query takes just under 10 mins to run
    Code:
    3285 rows in set (9 min 40.72 sec)


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •