SitePoint Sponsor

User Tag List

Results 1 to 11 of 11
  1. #1
    SitePoint Guru
    Join Date
    Mar 2002
    Posts
    608
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Looping through millions of records taking forever

    Hello,

    I have a page that builds rows for a table based on mysql data. However, there are millions of records in the table I am querying and I notice that it takes 20 or so seconds to build 7 table rows.

    Summary:

    Each table row queries millions of website visits to determine that days total for web traffic (I use a basic select count for overall day traffic AND select count(distinct field) for unique.

    This happens 7 times in the loop

    Should I create a function instead that builds the rows/queries and just put the function in the loop? I am using LInux/Mysql , not shared, and I confirmed the table structure earlier is fine for querying.

    Any suggestions for performance based on your experience? I can show code, but my question is mostly general as what I learn would also possibly help later on

    Thank you

  2. #2
    Follow Me On Twitter: @djg gold trophysilver trophybronze trophy Dan Grossman's Avatar
    Join Date
    Aug 2000
    Location
    Philadephia, PA
    Posts
    20,580
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Performance this poor means that either you're performing multiple queries where you only need one, your queries are poorly written, or you're not making appropriate use of indexes on your tables.

    You can select all of your websites, the day's total traffic, and the day's unique traffic counts all in a single query.

    If you're already using a single query to do this, put the word EXPLAIN in front of the query and run it (through the mysql command-line or your favorite interface). MySQL will tell you what it's doing to build your result set, so you can spot potential keys for indexing or portions of the query that should be rewritten.

  3. #3
    SitePoint Guru
    Join Date
    Mar 2002
    Posts
    608
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hi Dan,

    The reason I don't use one query is because I am looping through 7 days of traffic. 7 queries for each specific day (timestamped)

    I'm having a mental block as to how to avoid this. I keep thinking I had to do a separate query per day, where I loop 7 times for 7 days

    Each loop is something like this:
    select count(ip) from traffic table where day = '$timestamp'


    That's my dilema. Getting those results in a loop to show with *one* query so I can build 7 rows in a table for all 7 days (as in, a summary of the last 7 days)

  4. #4
    Follow Me On Twitter: @djg gold trophysilver trophybronze trophy Dan Grossman's Avatar
    Join Date
    Aug 2000
    Location
    Philadephia, PA
    Posts
    20,580
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Code:
    SELECT
        ID,
        website,
        (SELECT COUNT(IP) FROM traffic WHERE TO_DAYS(NOW()) - TO_DAYS(day) < 7) AS hitCount,
        (SELECT COUNT(DISTINCT(IP)) FROM traffic WHERE TO_DAYS(NOW()) - TO_DAYS(day) < 7) AS uniqueCount
    FROM
        traffic
    Requires MySQL 4.1 or higher. That would give you all your websites along with the counts you want in a single query, and it'll be much faster than performing 3 separate queries.

    Off Topic:

    You should probably use DATE_SUB instead of TO_DAYS but I'm not comfortable enough with that function to write an untested example.

  5. #5
    SitePoint Guru
    Join Date
    Mar 2002
    Posts
    608
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Interesting. I will test this. I didn't realize I could loop through 7 rows with a unique date like this. This opened my eyes to TO_DAYS

    Thanks. I'll let you know if it burps ,but I trust it should work.

  6. #6
    SitePoint Guru
    Join Date
    Mar 2002
    Posts
    608
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Seem to always get query errors. Do I not use a timestamp or single quote inside the TO_DATE?

    Error:
    #1064 - You have an error in your SQL syntax. Check the manual that corresponds to your MySQL server version for the right syntax to use near 'SELECT COUNT(IP) FROM alltraffic WHERE TO_DAYS('1146801600') -

    I'll keeep testing, of course...just fyi

  7. #7
    Follow Me On Twitter: @djg gold trophysilver trophybronze trophy Dan Grossman's Avatar
    Join Date
    Aug 2000
    Location
    Philadephia, PA
    Posts
    20,580
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Do you have MySQL 4.1 or newer?

  8. #8
    SitePoint Guru
    Join Date
    Mar 2002
    Posts
    608
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Damnit, I thought I did. I was sure I did, but it's 4.0+. I could use the concept of subtracting dates, just can't use that function I guess. Sorry...

    I'm going to use my vars for weekago/today and see if I can get that to work

  9. #9
    Follow Me On Twitter: @djg gold trophysilver trophybronze trophy Dan Grossman's Avatar
    Join Date
    Aug 2000
    Location
    Philadephia, PA
    Posts
    20,580
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    It's not the TO_DAYS() you can't use, it's all subqueries. You can probably rewrite that query using only JOINs, but you're missing out on a lot by not upgrading.

    4.1 went into production release October, 2004. Current version is 5.1.

    As a side note... you still could've eliminated 6 queries per loop by computing two timestamps representing a 7 day period and selecting rows where the date is between those values.

  10. #10
    SitePoint Guru
    Join Date
    Mar 2002
    Posts
    608
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Six queries per loop? I want to make sure I explained that each loop only queries
    disctinct and non disctinct. 7 looped rows, 2 queries per row (distinct/non distinct where date = that day). Problem was I did between for every DAY I loop through. Yeah...

    regardless, I'll look to upgrade...

    Thanks

  11. #11
    SitePoint Guru
    Join Date
    Mar 2002
    Posts
    608
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    We upgraded to 4.1+
    I put in the query and I get
    "Client ran out of memory"


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •