Indexes and date field on massive table

So I’ll be creating a clustered index on a massive table (cust_id, action_date). My primary goal will be to keep a query such as this running at its best:

concat(month(action_date), year(action_date) = 112

Should I maybe think about creating a column with MMYY in it and index that?

I think MMYY column mayl help for search, but clustered index should be unique and it is not looks like MMYY is unique…
you can set database partition by MMYY or action_date and that help significantly

The query above will be very slow because using any kind of functions will prevent the index from being used. Better search for date ranges (generated by the client code) like this:

action_date BETWEEN('2012-01-01', '2012-01-31')

Also, the index (cust_id, action_date) will not work here because action_date must be the first column of the index to work if you search by it. So you may change it to (action_date, cust_id).

Sorry if the example was misleading, ill ALWAYS be doing a join on cust_id and running aggregate functions per cust_id, I will want the order to be cust_id, then action_date

no, no, no :slight_smile:

declare an index on action_date

then return a month’s worth of data with an open upper end date range

WHERE action_date >= '2012-01-01'
  AND action_date  < '2012-02-01'

this is best practice, because it works on both date and datetime columns

btw, lemon juice, BETWEEN is not a function

Sorry, my typo, I often make this BETWEEN mistake when I write my own SQL! :slight_smile:

Is comparing date to datetime documented somewhere, how it works? I used to do this a few years ago and one day MySQL version changed and comparing dates to datetime also changed so I decided it’s best not to rely on it and in such cases I always do this:

WHERE action_date BETWEEN '2012-01-01 00:00:00' AND '2012-01-31 23:59:59'

“comparing dates to datetimes also changed” – i would like to see proof of this

as far as i know, and i have tested it often, what i posted works perfectly for both dates and datetimes

also, your approach using BETWEEN has two inconvenient drawbacks – first, you will always have to do a leap year calculation (for the last day of the month), and second, you might miss some rows between 23:59:59 and midnight (assuming mysql supports subseconds)


This was happening very long time ago so I don’t think I can prove anything - the difference might have been between 5.0 and 4.1 or even 4.0, one was on a Unix hosting the other on Windows. Things have changed so for any practical purposes I think what you say is true nowadays.

They are not so inconvenient :slight_smile:

MySQL will allow me to do this:

WHERE action_date BETWEEN '2012-02-01 00:00:00' AND '2012-02-31 23:59:59'

MySQL 5.6.4 seems to support microseconds - if I have the very unlikely need for them I might do this:

WHERE action_date BETWEEN '2012-01-01 00:00:00' AND '2012-01-31 23:59:59.999999'

But I’m giving you credit for better elegance of your way of comparing :slight_smile:


just because mysql allows it, doesn’t mean it’s something you should do

seriously? you would actually do that? the 31st of february?

please don’t say yes, because… well, just because

Well, you provoked me to it so I found out a working solution, however non-standard :smiley: I’ve never used 31st of Feb. because I’ve never had the problem with calculating leap years :slight_smile: