  1. #1
    SitePoint Wizard wheeler's Avatar
    Join Date
    Mar 2006
    Location
    Gold Coast, Australia
    Posts
    1,369
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    DATETIME indexing woes

    I am having an ever-worsening problem with 2 DATETIME columns (`starts`, `ends`) which belong to an auction database table. Both columns have an INDEX on them, because they are almost always the primary criteria for SELECT queries.

    The problem is that when the index is updated (which is very often), all subsequent queries are locked until the index update is complete. Under high load situations, this creates a massive backlog for MySQL and often takes several minutes to regain control. The bigger the auction table gets, the worse it becomes.

    Some typical queries are:
    - select where starts < NOW() and ends > NOW() (occurs all the time)
    - update set starts = 'new start time', ends = 'new end time' where ends < NOW() and closed = '0'

    Regarding the second query, I have to run this row by row, as the start and end times vary per row depending on that row's duration value.
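
    The row-by-row approach described above would look something like this (a sketch; the hard-coded id and the surrounding loop are assumptions, not code from the thread):

    ```sql
    -- Sketch of the per-row relist: for each expired, unsold auction,
    -- extend the window using that row's own duration and update it alone.
    -- Each statement triggers its own index maintenance on starts/ends.
    UPDATE auctions
    SET    starts = NOW(),
           ends   = DATE_ADD(ends, INTERVAL duration DAY)
    WHERE  id = 12345;   -- id supplied by the surrounding PHP loop
    ```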

    Perhaps part of the problem is that almost every start / end time is unique, and there are currently 90,000+ unique values.

    I am not sure what to do. I assume that the MySQL index is the most efficient sorting method, so trying to replicate the index numerically would be a waste of time. Perhaps I could extract the two columns into a second table, but again I don't see how that would resolve the situation. I also have replication running and considered sending all SELECTs to the slave, but even then there would be some potential slowness due to the backup schedule.

    What can I do with these columns to create better efficiency?
    Studiotime - Time Management for Web Developers
    to-do's, messages, invoicing, reporting - 30 day free trial!
    Thomas Multimedia Web Development

  2. #2
    SitePoint Wizard wheeler's Avatar
    Would MySQL perform better with a single multi-row UPDATE statement (one pass over the index), or with row-by-row updates?

  3. #3
    SitePoint Wizard wheeler's Avatar
    I can answer my own question - yes, it is slightly faster running a multi-row statement:

    UPDATE auctions SET starts = '$now', ends = DATE_ADD(ends, INTERVAL `duration` DAY) WHERE ends < '$now' AND closed = '0'

    but to my surprise it is barely faster. I thought the above would result in only one index update, whereas the row-by-row PHP loop would result in an index update for every affected row, which would be slow. But either way you look at it, the index changes by the same amount.

    The biggest impact was, unsurprisingly, removing the index entirely (about 3x faster UPDATE); however, I need the index for the SELECT statements, which perform about 20% better with it.
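
    For MyISAM there is a possible middle ground between "index on" and "index off" (a sketch, not something benchmarked in this thread): suspend non-unique index maintenance for the duration of the bulk update, then rebuild the indexes in one pass. Note that while keys are disabled, SELECTs cannot use those indexes.

    ```sql
    -- Sketch (MyISAM only): stop maintaining non-unique indexes,
    -- run the bulk relist, then rebuild the indexes afterwards.
    ALTER TABLE auctions DISABLE KEYS;

    UPDATE auctions
    SET    starts = NOW(),
           ends   = DATE_ADD(ends, INTERVAL duration DAY)
    WHERE  ends < NOW()
      AND  closed = '0';

    ALTER TABLE auctions ENABLE KEYS;
    ```

    Whether the one-off rebuild plus update beats maintaining the index row by row depends on the fraction of rows touched; it would need benchmarking against the 70-80 second figure quoted below.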

    It seems no-one is game to tackle this question?

  4. #4
    SQL Consultant
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,015
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by wheeler View Post
    It seems no-one is game to tackle this question?
    actually, there probably are several people who could tackle it, if they understood it

    you must realize that statements you make have a direct bearing on people's willingness to engage, to spend time on your issue

    for example...
    Regarding the second query, I have to run this row by row as the starts and end times are variable depending on the row duration value.
    this certainly seems to put the kibosh on alternative update strategies, wouldn't you say?
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  5. #5
    SitePoint Wizard wheeler's Avatar
    Quote Originally Posted by r937 View Post
    this certainly seems to put the kibosh on alternative update strategies, wouldn't you say?
    at first I thought so, but then I came up with the second query which affects all the rows at once.

    Sorry if I am not making sense, I get like that sometimes when I look at a problem for too long.

    Here is an alternate explanation from the beginning.

    ----------------------------

    I have a table of about 120,000 rows that contains auction data.

    ------

    CREATE TABLE `auctions` (
    `id` int(11) NOT NULL auto_increment,
    `seller` int(11) NOT NULL default '0',
    `category` int(11) NOT NULL default '0',
    `title` varchar(75) NOT NULL default '',
    `description` text NOT NULL,
    `duration` int(3) NOT NULL default '0',
    `closed` enum('0','1') NOT NULL default '0',
    `starts` datetime NOT NULL default '0000-00-00 00:00:00',
    `ends` datetime NOT NULL default '0000-00-00 00:00:00',
    PRIMARY KEY (`id`),
    KEY `seller` (`seller`),
    KEY `starts` (`starts`),
    KEY `ends` (`ends`),
    KEY `category` (`category`),
    FULLTEXT KEY `title` (`title`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1;

    ------

    I believe all the keys are necessary, as the table is constantly queried on the starts/ends fields, and also by seller id and category id.
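
    One way to check what those keys are actually doing for the main query (a sketch; the selected columns are assumptions, and the output depends on the data) is EXPLAIN. Note that MySQL will generally pick only one of the single-column `starts`/`ends` indexes for a query that filters on both:

    ```sql
    -- Sketch: see which index the live-auction SELECT actually uses.
    EXPLAIN
    SELECT id, title
    FROM   auctions
    WHERE  starts < NOW()
      AND  ends   > NOW();
    ```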

    There is naturally a large variation in the `ends` values, as auctions have finished at various times 24/7 over a couple of years. Unless sold, an item will generally relist.

    This basically is set with the following ($now is a PHP variable):

    UPDATE auctions SET starts = '$now', ends = DATE_ADD(ends, INTERVAL `duration` DAY) WHERE ends < '$now' AND closed = '0'

    I have tried updating this within a PHP loop row by row, as well as with the multi-row statement above. To my surprise, the multi-row statement provides only a marginal performance improvement.

    On my test machine, updating the entire table takes 70-80 seconds, which for exaggerated testing purposes affects about 49,000 rows.

    Now here is the problem - when this is running, all other queries become LOCKED, which I believe is the result of the index being updated. Of course, in a production environment this produces unacceptable lag.

    I thought that if I ran the multi-row statement, the index would update only once and very quickly, whereas with the row-by-row method it would have to update on every row. But another way to look at it is that the index changes by the same amount regardless of the method, so perhaps it's not surprising that it takes roughly the same amount of time.
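
    If a single statement must hold the table lock for the whole 70-80 seconds, one way to shorten each lock window (a sketch; the batch size and the outer loop are assumptions, not from the thread) is to relist in primary-key chunks, letting queued SELECTs run between batches:

    ```sql
    -- Sketch: chunk the relist by primary key so each UPDATE holds the
    -- MyISAM table lock only briefly; an outer PHP loop advances the range.
    UPDATE auctions
    SET    starts = NOW(),
           ends   = DATE_ADD(ends, INTERVAL duration DAY)
    WHERE  ends < NOW()
      AND  closed = '0'
      AND  id BETWEEN 1 AND 5000;   -- next pass: 5001-10000, and so on
    ```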

    So it seems I am between a rock and a hard place - my SELECTs are too slow without the starts/ends index (about 20% slower), and my UPDATEs are too slow WITH the index (about 3-4 times slower).

    What am I doing wrong here?

