
  1. #1
    SitePoint Member (Phoenix AZ)

    Query taking forever!

    I run a connection testing site, and when my users run tests, various aspects of each test are logged. This is a very new feature I have added to my site; I am not an expert in SQL, but I am getting much better every day. My new DB has already grown to 1.5 million entries, but with the indexes I have, most routine queries run well under 0.3 sec, and some more complex ones (or multiple queries in the same script) run at about 0.5-1.1 sec, all thanks to our buddy INDEXING. BUT the following query is still taking too much time.

    This query is taking 5-6 seconds to complete. I have a very fast dedicated server (dual Xeon 2.8 GHz), and User_Host is indexed.


    Code:
    SELECT `User_Host`,
           ROUND(AVG(`Test_Score`)) AS Score,
           COUNT(`User_Host`) AS Count
    FROM `Master`
    WHERE `Date` >= '$date_0'
      AND `Test_Type` = 'download'
    GROUP BY `User_Host`
    HAVING Count >= 1000
    ORDER BY Score DESC
    LIMIT 10
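    (For diagnosing a query like this, putting EXPLAIN in front of the SELECT shows which index MySQL chooses and roughly how many rows it examines; that is usually the first thing to check.)

    Code:
    EXPLAIN SELECT `User_Host`, ROUND(AVG(`Test_Score`)) AS Score, COUNT(`User_Host`) AS Count
    FROM `Master`
    WHERE `Date` >= '$date_0' AND `Test_Type` = 'download'
    GROUP BY `User_Host`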
    The end result is a list of ISPs ordered by speed (based upon tests performed at my site).

    example (crudely pasted):
    Top Ten Providers (ordered by Download Speed)

    Rank  Provider         Average Download Speed  ~30 Day Test Count
    1     optonline.net    4505 Kbps (550 KB/s)     1699
    2     cox.net          3253 Kbps (397 KB/s)    10480
    3     insightBB.com    3108 Kbps (379 KB/s)     1666
    4     mindspring.com   3034 Kbps (370 KB/s)     1760
    5     attbi.com        2667 Kbps (326 KB/s)     3703
    6     comcast.net      2621 Kbps (320 KB/s)    17880
    7     adelphia.net     2550 Kbps (311 KB/s)    10208
    8     rr.com           2477 Kbps (302 KB/s)     8633
    9     rogers.com       2290 Kbps (280 KB/s)     1885
    10    charter.com      2265 Kbps (276 KB/s)     2810

    Where am I going wrong in my syntax? I have read and read the MySQL documentation and am now pulling out my hair.


    ALSO, what the hell am I going to do about the size of my database? The row size is only 150 bytes, but with 1.5 million entries, growing by many hundreds of thousands per month, isn't it going to get out of hand and become slow? Is just flushing old data the best thing to do in my case?

    Sorry about the long post; I just have so many unanswered questions.

  2. #2
    SitePoint Zealot johno (Bratislava, Slovakia)
    I don't know what else you are doing with this data, but for this problem consider using this table structure. (OK, I know that data normalization is a good thing, but this time it's a performance issue.)

    host_id, score_sum, count, date

    Now, the first column is self-explanatory. The second is the sum of scores for a given host_id and date. The "count" column is the number of scores for the given host and date. Date is also self-explanatory.
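    For example (the table and column names are just my picks; the primary key gives you exactly one row per host per day):

    Code:
    CREATE TABLE daily_host_score (
        host_id    VARCHAR(20)  NOT NULL,            -- e.g. 'optonline.net'
        score_sum  INT UNSIGNED NOT NULL DEFAULT 0,  -- running total of Test_Score
        `count`    INT UNSIGNED NOT NULL DEFAULT 0,  -- number of tests that day
        `date`     DATE         NOT NULL,
        PRIMARY KEY (host_id, `date`)                -- one row per host per day
    );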

    OK, so say you have this structure. Then, whenever you would insert a new row in your old system, you do this instead:

    Check whether a row already exists for the current day and host_id. If not, create it. If a row already exists, just update it:
    UPDATE table SET score_sum = score_sum + $new_score, `count` = `count` + 1 WHERE host_id = '$host' AND `date` = CURDATE()
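    If you are on MySQL 4.1 or later, the "check, then insert or update" can also be done in a single statement; this relies on the (host_id, date) primary key from the sketch above:

    Code:
    INSERT INTO daily_host_score (host_id, score_sum, `count`, `date`)
    VALUES ('$host', $new_score, 1, CURDATE())
    ON DUPLICATE KEY UPDATE
        score_sum = score_sum + $new_score,
        `count`   = `count` + 1;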

    OK, so you now have a row for every specific host and day in this format. To retrieve the data, you just need this SELECT:
    SELECT host_id, SUM(score_sum)/SUM(`count`) AS average_score, SUM(`count`) FROM table WHERE `date` >= '$date' GROUP BY host_id ORDER BY ...
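    Filled in to match the original query (the HAVING threshold and the top-ten ordering are my assumption about what goes in that ORDER BY), it would look like:

    Code:
    SELECT host_id,
           ROUND(SUM(score_sum) / SUM(`count`)) AS average_score,
           SUM(`count`) AS test_count
    FROM daily_host_score
    WHERE `date` >= '$date'
    GROUP BY host_id
    HAVING test_count >= 1000
    ORDER BY average_score DESC
    LIMIT 10;

    Note that SUM(score_sum)/SUM(`count`) gives the same result as AVG(Test_Score) over the raw rows, because it weights each day by its test count.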
    Annotations support for PHP5
    TC/OPT™ Group Leader

  3. #3
    SitePoint Addict myrdhrin (Montreal)
    Add an index on `Test_Type`, `Date`, in that order... and since not all database engines can optimize the order of conditions in a query, write your WHERE clause like this:
    Code:
    `Test_Type` = 'download' AND `Date` >= '$date_0'
    The reason is: your most specific criterion is not the host but the test_type (and date). The host is only there for grouping, so an index on it makes the grouping faster but does not reduce the number of rows you have to scan.
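    In MySQL that index would be created with something like this (the index name is arbitrary):
    Code:
    ALTER TABLE `Master` ADD INDEX idx_type_date (`Test_Type`, `Date`);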

    Now, if that still does not do the trick, create a cache (don't get rid of the raw data, especially since I assume you log a lot more than just the size and host).

    What I mean is this:

    - Create a new table whose structure is the result you want (user_host, score, count), and put an index on the score column (after all, you are sorting DESC on that column).

    - Using cron (or any program that can run in the background at scheduled times), rebuild that table with the result of the SQL you posted above, and rebuild it every hour or more frequently (every 10 mins, 5 mins... it depends on how fresh this data needs to be). The rebuild could still take around 5 seconds (or less), but the page's response time will be very quick. Make sure you inform the end user that the data is not the most recent and that it will be refreshed within the next X minutes. A concrete sketch follows below.
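    Here is a sketch of what I mean (all the names are examples, and the cron line assumes the mysql command-line client):

    Code:
    -- The cache table: one row per provider, indexed on the sort column
    CREATE TABLE top_hosts_cache (
        user_host VARCHAR(20)  NOT NULL,
        score     INT UNSIGNED NOT NULL,
        `count`   INT UNSIGNED NOT NULL,
        INDEX (score)
    );

    -- rebuild.sql: wipe the cache and repopulate it from the slow query
    TRUNCATE TABLE top_hosts_cache;
    INSERT INTO top_hosts_cache (user_host, score, `count`)
    SELECT `User_Host`, ROUND(AVG(`Test_Score`)), COUNT(`User_Host`)
    FROM `Master`
    WHERE `Date` >= DATE_SUB(NOW(), INTERVAL 30 DAY)  -- the ~30 day window
      AND `Test_Type` = 'download'
    GROUP BY `User_Host`
    HAVING COUNT(`User_Host`) >= 1000;

    -- crontab entry to refresh every 10 minutes:
    -- */10 * * * * mysql -u youruser -p'yourpass' yourdb < rebuild.sql

    The page itself then only ever runs SELECT * FROM top_hosts_cache ORDER BY score DESC LIMIT 10, which returns instantly.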
    Jean-Marc (aka Myrdhrin)
    M2i3 - blog - Protect your privacy with Zliki

  4. #4
    SitePoint Member (Phoenix AZ)
    Okay, first of all, here is my structure for this table:

    Test_ID ------- varchar(9)
    Test_Score --- smallint(5)
    Test_Size ---- varchar(10)
    COMP_ID ------ varchar(16)
    User_Name ---- varchar(20)
    Affiliate ---- varchar(15)
    Test_Type ---- enum('Download', 'Upload')
    Date --------- timestamp(8)
    User_Host ---- varchar(20)

    Here are the columns in order:

    Test_ID -- Test_Score -- Test_Size -- COMP_ID -- User_Name -- Affiliate -- Test_Type -- Date -- User_Host

    and an actual row with sample data:
    9AUA5RUJO -- 801 -- 2952-KB -- 62540784511 -- username -- Upload -- 20050101 -- optonline.net

    I wasn't thinking about indexing Test_Type because there are only two choices (upload/download), but now that I think about it, the cardinality would only be two and that would cut my search almost in half. Correct me if I'm wrong.

    But I would have thought that since User_Host is indexed, this would narrow down the search. However, the cardinality of User_Host is 27,000, and I guess it has to average over all of those, and that is what is taking the time. Can you go more in depth about setting this up as a task with cron (how the hell do I make a temp table with this data)? It really only needs to be updated every 24 hours or so.

    Thank you.

