SitePoint Sponsor

User Tag List

Results 1 to 8 of 8
  1. #1
    SitePoint Zealot number7's Avatar
    Join Date
    Jul 2004
    Location
    Virtual City, Neural Networks str. 7
    Posts
    139
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Exclamation How to speed UP MySQL database with 2000000 records

    Hi buddies,

    I have a task to create a database for website visitors statistics. I tested on my own 2.2Ghz, 512 RAM server MySQL database with 2 million (2 000 000) records. It takes about 1 second for select on Indexed column.
    Code:
    SELECT COUNT(*) FROM STATS;
    
    result:
    Query Ok. 2000000 records (0.87 sec.)
    But there may be easy more than 2 million records on such database and it will take more and more time for selects. I have an idea to use archive tables.

    Have you any more ideas how could I speed up selects ?
    Have you any Ideas how could I decrease record count ?

    Note: One record = web page visit (known as "hit").

    BIG Thank's

  2. #2
    SitePoint Wizard wdmny's Avatar
    Join Date
    Jul 2000
    Location
    Here
    Posts
    1,010
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    You should be able to speed up that query by changing it to: COUNT(id) rather than COUNT(*). Replace all your "SELECT *" queries with a more explicit call like "SELECT id, title, postdate".

    Also, I would create a table to store past totals, so that your queries are only handling a smaller, recent batch of records. For each week/month, add that period's totals to the table. In your current query, use a date range to only pull fresh records. Next, do a SUM of your totals table and add that to your fresh data.

    Off Topic:

    1000th Post!

  3. #3
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,220
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by Wes DeMoney
    You should be able to speed up that query by changing it to: COUNT(id) rather than COUNT(*).
    are you sure?

    count(colname) counts only non-nulls

    count(*) counts rows

    of course, if id is a primary key, it has to be not null, so count(id) will generate the same answer, but i don't see it being faster

    wish i had a 2-million row to test this on, but i don't

    and hey, congrats on the thou
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  4. #4
    SitePoint Wizard wdmny's Avatar
    Join Date
    Jul 2000
    Location
    Here
    Posts
    1,010
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937
    are you sure?

    count(colname) counts only non-nulls

    count(*) counts rows

    of course, if id is a primary key, it has to be not null, so count(id) will generate the same answer, but i don't see it being faster

    wish i had a 2-million row to test this on, but i don't

    and hey, congrats on the thou
    Well, I have always read that Count(colname) was better and faster, and that it was generally bad to use the astericks SELECT. Also, like you mentioned, the ID column would not contain nulls, so that would not be an issue.

    Maybe number7 can run a comparison of SELECT(*) vs SELECT(id) and tell us which is quicker.

  5. #5
    SitePoint Zealot number7's Avatar
    Join Date
    Jul 2004
    Location
    Virtual City, Neural Networks str. 7
    Posts
    139
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Sure, and here are the results:

    Code:
    SELECT COUNT(*) FROM `st_hits`;
    2050000 records, 0.00 sec.
    
    SELECT COUNT(ID) FROM `st_hits`;
    2050000 records, 0.00 sec.
    ID is indexed, and with no indexed column:

    Code:
    SELECT COUNT(*) FROM `st_hits`;
    2050000 records, 0.00 sec.
    
    SELECT COUNT(SHREF) FROM `st_hits`;
    2050000 records, 1.78 sec.
    Seems to be practically no difference ?

  6. #6
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,220
    Mentioned
    58 Post(s)
    Tagged
    3 Thread(s)
    well, i think what you're seeing is the fact that the count of the entries in an index is retrieved with one call to the index rather than actually counting anything

    count(*) can also get its value from table stats, i think

    whereas count(colname), if colname has no index, has to count the non-nulls (1.78 seconds is actually pretty slow as far as a mysql retrieval is concerned)
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  7. #7
    SitePoint Enthusiast
    Join Date
    Feb 2004
    Location
    France
    Posts
    58
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The speed of COUNT(*) depends of the table type. MyISAM stores the row count in the table itself, so retrieving it should always be instantaneous. Innodb has to check the table (which can be slow or fast if you have some index).

    This being said COUNT(*) is not a very good way if finding out the speed of a database - most queries do something a bit more useful 2M rows shouldn't be a problem with MySQL anyway.

  8. #8
    SitePoint Zealot number7's Avatar
    Join Date
    Jul 2004
    Location
    Virtual City, Neural Networks str. 7
    Posts
    139
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thank you buddies for your help. Anyway I'm starting design of my database and when I't will be complete I suppose there will be provided some interesting test results.

    Feel free to post your topics if you have any more ideas about MySQL DB queries speed up for large scale statistical data.

    Thank's


    number7


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •