SitePoint Sponsor

User Tag List

Results 1 to 4 of 4
  1. #1
    SitePoint Evangelist hessodreamy's Avatar
    Join Date
    Apr 2005
    Location
    uk
    Posts
    524
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)

    Suggestions for indexes or query structure on this one please

    I'm trying to join 2 fairly large tables. One of which I've put a compound index on one of them, to try and speed things up. However I can't seem to get the query to utilise the compound index.

    Could someone suggest what changes I could make to the indexes or the query to make it run any faster (currently about 0.75s. I'd like it faster).
    Code:
    CREATE TABLE `product_group` (
      `prodGroupID` bigint(20) unsigned NOT NULL auto_increment,
      `productID` bigint(20) unsigned default NULL,
      `groupType` varchar(10) default NULL,
      `groupID` bigint(20) unsigned default NULL,
      PRIMARY KEY  (`prodGroupID`),
      KEY `productID` (`productID`),
      KEY `groupID` (`groupID`),
      KEY `groupType` (`groupType`),
      KEY `type_id` (`groupType`,`groupID`)
    )
    
    CREATE TABLE `sale_date_by_group` (
      `unique_key` varchar(100) NOT NULL default '',
      `groupType` varchar(10) default NULL,
      `groupID` int(11) default NULL,
      `date` date default NULL,
      `salesChannelID` double(5,1) default NULL,
      `units` int(11) default NULL,
      `turnover` double(10,2) default NULL,
      `cost` double(10,2) default NULL,
      PRIMARY KEY  (`unique_key`),
      KEY `date` (`date`),
      KEY `salesChannelID` (`salesChannelID`),
      KEY `combined_index` (`groupID`,`groupType`,`date`,`salesChannelID`)
    )
    
    SELECT pg.productID, SUM(s.units)units FROM 
    sale_date_by_group s, product_group pg 
    WHERE 
    pg.groupType='store' AND 
    pg.groupID=42 AND 
    pg.productID=s.groupID AND 
    s.groupType='product' AND 
    s.date>CURDATE() - INTERVAL 30 DAY
    GROUP BY pg.productID
    order by units desc limit 5
    
    The EXPLAIN outputs:
    id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
    1	SIMPLE	s	range	date,combined_index	date	4	NULL	463359	Using where; Using temporary; Using filesort
    1	SIMPLE	pg	ref	productID,groupID,groupType,type_id	productID	9	hsdmain.s.groupID	1	Using where
    you can see from the EXPLAIN results that the combined index on sale_date_by_group isn't being utilised.

    Any pointers on improving the indexing or the query itself to make it faster?

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,251
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    combined_index would probably be utilized if your WHERE clause provided a value for groupid, but it doesn't, so that index cannot be used

    helps?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Evangelist hessodreamy's Avatar
    Join Date
    Apr 2005
    Location
    uk
    Posts
    524
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    hmm yeah I did notice that the index is used if the groupID references a constant, but not another table. But really I want to use it in the join. Should I structure my query differently?

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,251
    Mentioned
    59 Post(s)
    Tagged
    3 Thread(s)
    if you were to rewrite your query using explicit JOIN syntax -- which you should be doing anyway, not just for this one index situation -- then it would look like this...
    Code:
    SELECT pg.productID
         , SUM(s.units) AS units 
      FROM sale_date_by_group AS s
    INNER
      JOIN product_group AS pg 
        ON pg.productID = s.groupID 
       AND pg.groupType = 'store' 
       AND pg.groupID = 42 
     WHERE s.groupType = 'product' 
       AND s.date > CURDATE() - INTERVAL 30 DAY
    GROUP 
        BY pg.productID
    ORDER 
        BY units DESC LIMIT 5
    written this way, the query suggests that you are looking for only certain sales (product groups) for only certain dates (within the last 30 days) and then summing those sales by productID but only for store group 42

    another way to write the query would be...
    Code:
    SELECT pg.productID
         , SUM(s.units) AS units 
      FROM product_group AS pg 
    INNER
      JOIN sale_date_by_group AS s
        ON s.groupID = pg.productID
       AND s.groupType = 'product' 
       AND s.date > CURDATE() - INTERVAL 30 DAY
     WHERE pg.groupType = 'store' 
       AND pg.groupID = 42 
    GROUP 
        BY pg.productID
    ORDER 
        BY units DESC LIMIT 5
    here, the query suggests that you are looking for product store group 42, then matching sales product groups for only certain dates (within the last 30 days) and then summing those sales by productID

    obviously, the optimizer is going to choose a path, regardless of which way you write this query, that minimizes the number of rows it has to look at and throw away

    did that last sentence make sense?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •