SitePoint Sponsor

User Tag List

Results 1 to 6 of 6
  1. #1
    SitePoint Zealot
    Join Date
    Jan 2002
    Posts
    111
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Optimizing Tag Select/Order By

    I am trying to optimize a query for selecting all the feeds related to a particular tag(tagid) then ordering them by Score. My tag table is around 100 million rows and my feed table is around 3 million rows.

    This is what i have:
    Code:
    select feeds.FeedID, feeds.FeedName, feeds.Score 
    from feeds        
    inner join tags on feeds.FeedID = tags.FeedID
    where tags.TagDirID = '123456' 
    order by feeds.Score desc 
    limit 500
    However, when i do an explain on it i am getting this:
    Code:
    id     select_type     table     type     possible_keys     key     key_len     ref     rows     Extra
    1     SIMPLE     tags     ref     FeedID,TagDirID     TagDirID     5     const     228120     Using where; Using temporary; Using filesort
    1     SIMPLE     feeds     eq_ref     PRIMARY     PRIMARY     4     FeedID     1
    When I remove "order by Score" i lose the filesort and the query is very quick(<1 second), with it the query is around 45 seconds. I have an index on feeds.Score but it doesn't appear to be being used when I try to sort on it. Any ideas how to speed this up? I read some stuff about leading tables and indexes but i couldn't figure out how to fix it.

    Here's what the table structure is like:
    Code:
    CREATE TABLE `feeds` (
      `FeedID` int(8) NOT NULL AUTO_INCREMENT,
      `FeedName` varchar(100) DEFAULT NULL,
      `Score` decimal(3,1) DEFAULT NULL,
    
      PRIMARY KEY (`FeedID`),
      KEY `Score` (`Score`)
    )
    
    CREATE TABLE `tags` (
      `TagID` int(10) NOT NULL AUTO_INCREMENT,
      `TagDirID` int(10) DEFAULT NULL,
      `FeedID` int(8) DEFAULT NULL,
    
      PRIMARY KEY (`TagID`),
      KEY `FeedID` (`FeedID`),
      KEY `Status` (`Status`),
      KEY `TagDirID` (`TagDirID`)
    )

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,017
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    what exactly is TagDirID? what's it for?
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Zealot
    Join Date
    Jan 2002
    Posts
    111
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    It's the foreign key for the actual tag and it's properties in a different table which I call the tag_directory. The table tags is the table that contains all the tags associated with each feed. Each feed can have the same tag applied more than once and each tag can be assigned to multiple feeds.

    I could(and do) do a join on the tag_directory and just search for all feeds tagged with "car" but I thought this way would simplify things a bit.

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,017
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    okay, i think i understand

    by searching for 12345 (the tag id) rather than 'car' (the tag) you've eliminated one table from the query

    but you will still have the problem of searching by one index and sorting by another

    your options are...

    1) leave it as is, and accept "italait" (it takes as long as it takes)

    2) don't sort
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Zealot
    Join Date
    Jan 2002
    Posts
    111
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    What do you mean by having the problem of searching by one index and sorting by another?

    Along a different track, are there any settings in my.cnf that I can change regarding memory allocations/buffers that will improve sort times?

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,017
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by brsmu80 View Post
    What do you mean by having the problem of searching by one index and sorting by another?
    your 45-second slow query time is because the search results (by tag_id) need to be saved in a temporary file and then this temporary file is sorted into a different sequence (score)

    if you remove the ORDER BY, you see the impressive speed improvement caused by not having to create the temporary file

    consider now a query which has something like WHERE datecreated >= '2010-01-01' along with ORDER BY datecreated

    this is an example of using an index for searching, and also for sorting -- this one wouldn't require a temporary file
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •