  1. #1
    SitePoint Member
    Join Date
    Mar 2011
    Posts
    2
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Problem with query taking too long

    I have an authentication system for my programs which logs every authentication attempt.
    Over the months, the logs table has grown to 4,064,450 records.
    The problem is that the query I run to get the latest 13 records takes about 10 seconds:

    "SELECT * FROM logs_table ORDER BY `time` DESC LIMIT 0, 13"

    `time` = the time of the attempt.

    Is there a way to optimize the query?

    Thank you.

  2. #2
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    16,445
    Mentioned
    160 Post(s)
    Tagged
    1 Thread(s)
    I'll say that going through 4+ million records to only use 13 is very inefficient!

    Do you need to retain them all? If not, I would periodically DELETE them, maybe after saving a backup just in case you want them later.

    Otherwise I would move the older ones to some sort of "archive" table that doesn't need to be searched so often.
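
    Something along these lines would do it (just a sketch; the `logs_archive` table and the 90-day cutoff are only examples, adjust to taste):

    Code:
    -- Example only: assumes logs_archive has the same structure as logs_table
    INSERT INTO `logs_archive`
    SELECT * FROM `logs_table`
    WHERE `time` < NOW() - INTERVAL 90 DAY;

    -- then remove the rows that were just copied out of the live table
    DELETE FROM `logs_table`
    WHERE `time` < NOW() - INTERVAL 90 DAY;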

    IMHO it isn't the query that needs optimizing but rather the database architecture.

  3. #3
    SitePoint Member
    Join Date
    Mar 2011
    Posts
    2
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    That's the problem: in a lot of cases we do need to search the logs to get information about a user.

    Could you please give me more info about optimizing the database architecture?

    Thanks.

  4. #4
    Non-Member
    Join Date
    Apr 2004
    Location
    Miami, FL, USA
    Posts
    449
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Something like this should speed up your query significantly (with tweaking, it should run about 100x faster):

    Code:
    SELECT * 
    FROM (
        SELECT * 
        FROM logs_table
        WHERE `time` > NOW() - INTERVAL 3 DAY
    ) AS `logs`
    ORDER BY `time` DESC
    LIMIT 0, 13
    How many entries per day? If 3 days returns too many logs, reduce it... or adjust higher if you want more padding.

    Also, if you don't have an index on the "time" column, try:

    Code:
    ALTER TABLE `logs_table` ADD INDEX(`time`);
    Cheers!

  5. #5
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    16,445
    Mentioned
    160 Post(s)
    Tagged
    1 Thread(s)
    If you need to keep and use the data, that does limit your options.

    Do you really need every field returned - the "*" ? Even if you do, AFAIK explicitly listing the fields allows the database to use indexes and should run faster.

    For the frequent query, maybe you could make a "recent" table that points to rows in the "logs" table. Then save the heavy grunt work for the occasional search.
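
    Just to sketch the idea (the `recent_logs` table name and the `id` primary key on the log table are assumptions about the schema):

    Code:
    -- Small helper table that only ever holds the ids of the newest log rows
    CREATE TABLE `recent_logs` (
        `log_id` INT NOT NULL,
        PRIMARY KEY (`log_id`)
    );

    -- The frequent query then only touches the small table plus a keyed lookup
    -- (the application inserts each new id into both tables and prunes
    --  recent_logs now and then)
    SELECT l.*
    FROM `recent_logs` r
    INNER JOIN `logs_table` l ON l.`id` = r.`log_id`
    ORDER BY l.`time` DESC
    LIMIT 0, 13;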

  6. #6
    Non-Member
    Join Date
    Apr 2004
    Location
    Miami, FL, USA
    Posts
    449
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Mittineague View Post
    AFAIK explicitly listing the fields allows the database to use indexes and should run faster.
    That's not the culprit here... he's combining an ORDER BY and a LIMIT... that will necessarily compare the `time` field on every result... even when indexed, it's doing a multi-million-record hash prior to the LIMIT.

    To increase performance, he has to do a "pre-sort pseudo-limit" by date. This will allow him to cut out a significant percentage of his results on an indexed column prior to the ORDER BY/LIMIT.

    It'll significantly increase his performance (assuming the `time` column is indexed).
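
    Either way, it's worth checking what the optimizer actually does; a plain EXPLAIN on the query will show whether the `time` index is used and roughly how many rows get examined:

    Code:
    EXPLAIN SELECT * FROM logs_table ORDER BY `time` DESC LIMIT 0, 13;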

  7. #7
    From space with love silver trophy
    SpacePhoenix's Avatar
    Join Date
    May 2007
    Location
    Poole, UK
    Posts
    4,904
    Mentioned
    93 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Cool7 View Post
    I have an authentication system for my programs which logs every authentication attempt.
    Over the months, the logs table has grown to 4,064,450 records.
    The problem is that the query I run to get the latest 13 records takes about 10 seconds:

    "SELECT * FROM logs_table ORDER BY `time` DESC LIMIT 0, 13"

    `time` = the time of the attempt.

    Is there a way to optimize the query?

    Thank you.
    What server-side language are you using? Try dropping the LIMIT clause and instead, in the server-side code, run the loop that fetches each row of the result set 13 times (or however many rows are required), then break out of the loop.

  8. #8
    Non-Member
    Join Date
    Apr 2004
    Location
    Miami, FL, USA
    Posts
    449
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by SpacePhoenix View Post
    What server-side language are you using? Try dropping the LIMIT clause and instead, in the server-side code, run the loop that fetches each row of the result set 13 times (or however many rows are required), then break out of the loop.
    Yikes!!! That will retrieve 4 MILLION records!!! I don't even KNOW how a server will handle that... even with an iterative model, there has to be some point in the DB connection that will process all 4 million records!

  9. #9
    Non-Member
    Join Date
    Apr 2004
    Location
    Miami, FL, USA
    Posts
    449
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Ok, I decided to run a test on this, because I'm interested to see the results...

    I've built the following table:

    Code MySQL:
    CREATE TABLE `logs_table` (
        `id` INT NOT NULL AUTO_INCREMENT,
        `user_id` INT NOT NULL,
        `action_id` INT NOT NULL,
        `time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (`id`),
        INDEX (`user_id`),
        INDEX (`action_id`),
        INDEX (`time`)
    );

    I'm currently populating it with data... 4,000,000 records spread out randomly over a 12 month period. I've got 1.5 million records inserted so far!!!
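
    For anyone wanting to reproduce this, here is a rough sketch of one way to generate that kind of test data in pure SQL (the `digits` helper table is just an example seed):

    Code MySQL:
    -- Sketch only: a 10-row `digits` table cross-joined with itself seven times
    -- yields 10 million candidate rows; each row gets a random user, a random
    -- action, and a timestamp somewhere in the last 12 months.
    CREATE TABLE `digits` (`d` TINYINT NOT NULL PRIMARY KEY);
    INSERT INTO `digits` VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

    INSERT INTO `logs_table` (`user_id`, `action_id`, `time`)
    SELECT
        FLOOR(1 + RAND() * 100000),
        FLOOR(1 + RAND() * 50),
        NOW() - INTERVAL FLOOR(RAND() * 365 * 24 * 60 * 60) SECOND
    FROM digits a, digits b, digits c, digits d, digits e, digits f, digits g
    LIMIT 4000000;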

    I'm going to test some different queries on it and see what kind of results I get!

    PS - This is on a 512 MB Slicehost VPS - not the best machine in the world, so should give some good results!

  10. #10
    Non-Member
    Join Date
    Apr 2004
    Location
    Miami, FL, USA
    Posts
    449
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Ok, I actually inserted 5,000,000 records.... and here are my results:

    Code:
    SELECT * FROM logs_table ORDER BY `time` DESC LIMIT 0, 13
    0.04 seconds!!!

    With no index on `time`, it's 9.8 seconds (makes me think `time` is not indexed!)

    Code:
    SELECT * 
    FROM (
        SELECT * 
        FROM logs_table
        WHERE `time` > NOW() - INTERVAL 3 DAY
    ) AS `logs`
    ORDER BY `time` DESC
    LIMIT 0, 13
    4 seconds!!! The inner query is returning 100,000+ records, and the derived table throws away the benefit of the index... So I reduced that timeframe and got:

    Code:
    SELECT * FROM (
        SELECT * 
        FROM logs_table 
        WHERE `time` > NOW() - INTERVAL 10 MINUTE
    ) AS `logs` 
    ORDER BY `time` DESC 
    LIMIT 0, 13;
    0.01 seconds

    Still, it seems your problem is no index on the `time` column... go back to my first post and run the ALTER TABLE query.
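
    If you're not sure whether `time` is already indexed, you can check before running the ALTER:

    Code:
    SHOW INDEX FROM `logs_table`;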

