SitePoint Sponsor

User Tag List

Results 1 to 8 of 8
  1. #1
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Faster Count on large table?

    Greetings,

    I have a table with around 300,000 rows. I am trying to count the number of items listed in a primary category (cat1level1) or second category (cat2level1), for example:

    PHP Code:
    SELECT COUNT(*) as num FROM items USE INDEX (cat1level1,cat2level1WHERE (cat1level1 '3' OR cat2level1 '3'
    1st and 2nd categories are indexed columns. Category "3" is very popular and has around 200,000 items within it (two thirds of all the items in the table). The count query itself takes around 2-4 seconds to complete when counting items from category "3". Whereas in less-popular categories with only 1000 items, it only takes a more reasonable 0.03 seconds to count.

    Does anyone know why this is happening and is it normal? Is there a solution to speed up this count in case this popular category grows to contain a million items?

    Thanks
    Kind regards

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,274
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    have you done an EXPLAIN on the query?

    i would think that with 2/3 of all rows in the table, you're always gonna get a table scan for category 3

    also, the 2nd column of that index will never be used by your OR condition
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I've done the explain query and got this:

    id select_type table type possible_keys key key_len ref rows Extra
    1 SIMPLE items index_merge cat1level1,cat2level1 cat1level1,cat2level1 2,2 NULL 2 Using sort_union(cat1level1,cat2level1); Using whe...

    Also, the two columns are two separate indexes, in hopes that both would be used in an "OR" statement.

  4. #4
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,274
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by peppy View Post
    Also, the two columns are two separate indexes, in hopes that both would be used in an "OR" statement.
    no, mysql will use only one index on any table in a query

    it kinda looks like it's using an index, though

    disclaimer: some EXPLAINs are over my head
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    What would be the best way of handling this then for the fastest count? Would you recommend having 2 separate queries, and then add the two counts?

    For instance:
    SELECT COUNT(*) as num FROM items USE INDEX (cat1level1) WHERE (cat1level1 = '3')
    SELECT COUNT(*) as num FROM items USE INDEX (cat2level1) WHERE (cat2level1 = '3')

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,274
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    if i'm doing it, i would never tell the optimizer which index to use, except in those circumstances where it decided to use an index that wasn't anywhere near efficient

    i am of the strong opinion that the mysql optimizer engineers are a lot smarter than i am

    yes, i would use two queries here
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    SitePoint Addict
    Join Date
    Jul 2007
    Posts
    233
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I did a little more reading and I think newer versions MySQL are capable of using more than 1 index on a table in a single query.

    Something is off with the optimizer. When I remove the 'USE INDEX' on the query on a category that contains 200,000 items, it does cut the time down a bit. However, if I query a count of the items in a less-used category with only 1000 items, it takes 4 seconds... whereas if I included the 'USE INDEX' in the query, it would only take 0.01 second.

  8. #8
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,274
    Mentioned
    60 Post(s)
    Tagged
    3 Thread(s)
    i'm not a DBA but isn't there something you can do like "update stats" so that the optimizer knows the true counts, and realizes that 1000 items out of 200K requires an index
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •