SitePoint Sponsor

User Tag List

Results 1 to 4 of 4
  1. #1
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    930
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)

    Can't explain the weirdness of SELECT COUNT(*) speed

    I've been playing with efficiency of counting rows in an InnoDb table and my personal benchmarks are somewhat weird. Here is the table I used for testing:
    Code:
    CREATE TABLE test (
      id mediumint(8) unsigned NOT NULL auto_increment,
      field varchar(200) NOT NULL,
      field2 varchar(255) NOT NULL,
      PRIMARY KEY  (id),
    ) ENGINE=InnoDB  DEFAULT CHARSET=utf8;
    I filled the table with 162434 rows of random sha1 strings in both varchar columns. Then I ran
    Code:
    select count(*) from test
    It takes 410ms. So far so good, I've read that count(*) on innodb tables is not fast.

    Then I use a trick I've read about somewhere: I add one more column TINYINT UNSIGNED named 'one' which has value '1' in all rows and index it. Then summing these values is supposed to be faster than count(*). My table looks like this:
    Code:
    CREATE TABLE test (
      id mediumint(8) unsigned NOT NULL auto_increment,
      field varchar(200) NOT NULL,
      field2 varchar(255) NOT NULL,
      `one` smallint(3) unsigned NOT NULL default '1',
      PRIMARY KEY  (id),
      KEY `one` (`one`)
    ) ENGINE=InnoDB  DEFAULT CHARSET=utf8;
    And I run:
    Code:
    select sum(one) from test
    time: 220ms. Great! Almost half the time. Good trick. But then I run my original count(*) again to just verify my previous result:
    Code:
    select count(*) from test
    time: 80ms. (!) That's an even better trick I discovered but I'm wondering why? Why just adding a tinyint indexed column to the table improved count(*) speed so much? An interesting thing is that the speed improvement is only if this column is tinyint or smallint - when I changed it to mediumint or int the time was the same as my original test - 410ms. Can someone explain what's causing such differences in this case? I would be willing to use such methods to speed up queries but it looks weird and I don't know if I can rely on this. I can guess that an index may help prevent a table scan but:

    1. there was already primary key index so why my first query didn't use it? Why did mysql use the index on column 'one' and not the PK?
    2. What's causing the difference between tinyint & smallint vs mediumint & int?

    Mysql version 5.0.45 on win2000.

  2. #2
    SitePoint Zealot
    Join Date
    Oct 2007
    Posts
    148
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    most likely the size of the int used to count..

  3. #3
    An average geek earl-grey's Avatar
    Join Date
    Mar 2005
    Location
    Ukraine
    Posts
    1,403
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Lemon Juice View Post
    1. there was already primary key index so why my first query didn't use it? Why did mysql use the index on column 'one' and not the PK?
    An arbitrary index is used instead of a table scan, as index file is usually smaller than the data file. And when there are two indexes which can be used, the smaller is usually also the faster to go through.

    Quote Originally Posted by Lemon Juice View Post
    2. What's causing the difference between tinyint & smallint vs mediumint & int?
    The smaller index tree is, the faster MySQL can go through it.

  4. #4
    reads the ********* Crier silver trophybronze trophy longneck's Avatar
    Join Date
    Feb 2004
    Location
    Tampa, FL (US)
    Posts
    9,854
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    one thing you are just going to have to live with is that innodb is a much more complex beast than myisam and therefore it is much more difficult to model its performance.

    in general, count(*) on an innodb table is slow if: 1) the table has an auto_increment primary key and now rows have been inserted since the server was started because inno doesn't store the next auto_increment value on disk and needs to compute it once upon first use, 2) if there are many transactions open on the table because inno has to reconcile all of the transactions to find the "apparent" row count, 3) plus a couple of other unlikely edge cases.

    but any one of those myriad cases can have their performance significantly change based on the current state of inno's page cache.

    the only way to get a good sense of the true effect of a behavior on an innodb database is to simulate actual usage and use average numbers, not point-in-time numbers.
    Check out our new Industry News forum!
    Keep up-to-date with the latest SP news in the Community Crier

    I edit the SitePoint Podcast


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •