Alternatives to auto_increment
[Warning: I'm long-winded. Actual question is in bold, a tad below.]
Thanks to past rants found in a myriad of threads at Sitepointforums, I came to develop a new way of looking at the 'auto_increment' column attribute as a way to generate unique keys. I'd now like to try generating my own primary keys.
Aside from the many reasons found in threads like this one, I need to make sure that the application I'm currently developing will be portable and compatible with a handful of RDBMSs. 'auto_increment' is a MySQL-specific feature, and while it's quite a good one, I'm a control freak. I want the database system to obey me. I therefore want unique IDs that mean something to me. I also want to make sure that the DB and my app will work under the pressure of an insert-intensive environment. While I'm not claiming that my site will get millions of visitors a day with each of them writing 100 messages an hour, I prefer coding for strength and with an open mind, since as my app matures I see a good future for it (and me... and my Swiss banker... :) )
Having looked around a bit, I have experimented with monotonic key generation modeled on the way PHPLIB does it with its 'nextid' function. For those unfamiliar with this method, a common table named 'db_sequence' is created with 2 columns (seq_name, nextid). For each table that needs an autonumber, a row is added to 'db_sequence' with the table name (seq_name) and a blank start-up id (nextid). Now, every time a record is added to a table that requires an autonumber, we first fetch the nextid for that seq_name in the db_sequence table, then update db_sequence with nextid+1, and finally use the nextid for whatever row insertion we wish to do elsewhere. Nice, sweet, elegant and intuitive... but it requires table locking.
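To make the idea concrete, here's a minimal sketch of that PHPLIB-style 'nextid' pattern. I'm using Python with SQLite purely for illustration (table and column names follow the description above); in a real multi-user RDBMS the SELECT/UPDATE pair must run inside a transaction or under a table lock, which is exactly the bottleneck I'm worried about.

```python
# Sketch of the PHPLIB-style sequence table. SQLite stands in for the
# real RDBMS; the 'messages' sequence name is just an example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db_sequence (seq_name TEXT PRIMARY KEY, nextid INTEGER)")
conn.execute("INSERT INTO db_sequence VALUES ('messages', 1)")

def nextid(conn, seq_name):
    """Fetch the next id for seq_name and bump the counter in one transaction."""
    with conn:  # implicit BEGIN ... COMMIT; serializes concurrent callers
        row = conn.execute(
            "SELECT nextid FROM db_sequence WHERE seq_name = ?", (seq_name,)
        ).fetchone()
        conn.execute(
            "UPDATE db_sequence SET nextid = nextid + 1 WHERE seq_name = ?",
            (seq_name,),
        )
        return row[0]

print(nextid(conn, "messages"))  # 1
print(nextid(conn, "messages"))  # 2
```

Every single insert pays for one round-trip to db_sequence, and concurrent writers queue up on that one row.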
But to tell the truth, I've been seduced by this Sybase paper on Surrogate Primary Key Generation, mostly aroused (...intellectually) by the Recycled Series Approach. In short, and from what I understand, the Recycled Series Approach allows for better concurrent use of the db_sequence table under stress and keeps the keys small. As the doc says:
If a process dies, another picks up the slack instead of opening a new series. And there's no need for table locking, since each process is responsible for its own turf and will never do two jobs at once (no collisions).
In the recycled series approach, each concurrent process has its own conceptual next-key table.
For example, three concurrent processes can acquire next key values as follows: One process can work on the sequential series from 1 to 1,000,000, the second can work on the series from 1,000,001 to 2,000,000, and the third from 2,000,001 to 3,000,000.
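Here's a rough sketch of how I picture that block-allocation idea working: each process reserves a whole series of ids in one short transaction, then hands them out from memory with no further table access. The class name, block size handling, and 'next_block' column are my own invention, not from the Sybase paper.

```python
# Sketch of the recycled-series idea: reserve a block of 1,000,000 ids
# per process, then allocate locally. SQLite stands in for the real RDBMS.
import sqlite3

BLOCK = 1_000_000

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db_sequence (seq_name TEXT PRIMARY KEY, next_block INTEGER)")
conn.execute("INSERT INTO db_sequence VALUES ('messages', 1)")

class BlockAllocator:
    def __init__(self, conn, seq_name):
        self.conn, self.seq_name = conn, seq_name
        self.next = self.limit = 0  # forces a block grab on first use

    def _grab_block(self):
        # One short transaction per million keys, instead of one per insert.
        with self.conn:
            row = self.conn.execute(
                "SELECT next_block FROM db_sequence WHERE seq_name = ?",
                (self.seq_name,)).fetchone()
            self.conn.execute(
                "UPDATE db_sequence SET next_block = next_block + ? WHERE seq_name = ?",
                (BLOCK, self.seq_name))
        self.next, self.limit = row[0], row[0] + BLOCK

    def nextid(self):
        if self.next >= self.limit:
            self._grab_block()
        self.next += 1
        return self.next - 1

p1 = BlockAllocator(conn, "messages")  # first grab gets 1 .. 1,000,000
p2 = BlockAllocator(conn, "messages")  # second grab gets 1,000,001 .. 2,000,000
print(p1.nextid(), p2.nextid())        # 1 1000001
```

The trade-off, as far as I can tell, is gaps: a process that dies mid-block strands the rest of its series unless another process "recycles" it, which is the part of the paper I'd have to implement carefully.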
...and then there's the Composite Key Approach, which basically means combining a server identifier with the precise date and some other random number. The PHP function uniqid does roughly that, I believe. This way we can do away with the extra queries associated with using and maintaining a sequence table. The downside seems to be larger keys, and a slight possibility of creating identical keys on a multi-processor machine... or when someone readjusts the computer clock. :eek:
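Something along these lines is what I have in mind for the composite approach. The exact format (host prefix + microsecond clock + random tail) is my own illustration, not PHP's actual uniqid internals, though uniqid similarly builds on the microsecond clock:

```python
# Sketch of a composite key: host id + current time + random component.
# HOST_ID is an assumed per-server prefix the admin would configure.
import time, random

HOST_ID = "a1"

def composite_key():
    # 'host-microseconds-random': unique across servers as long as each
    # host has a distinct prefix and the clock never moves backwards.
    return f"{HOST_ID}-{int(time.time() * 1_000_000):x}-{random.getrandbits(32):08x}"

print(composite_key())
```

No sequence table and no locking, but the keys are long strings rather than compact integers, and uniqueness is probabilistic rather than guaranteed.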
Now I'm perplexed as I am not sure which way to go. :confused:
Question: I'm looking for a rock-solid way to generate unique ids to use in tables that require primary keys. Any thoughts?
Requirements for proper key generation would be the following:
- Must withstand and perform well under low-to-very-high levels of stress (in a load-balanced environment or whatever);
- Should always create unique keys;
- Keeps overhead and impact low;
- Set and forget, because once in a production environment it would be nightmarish to migrate all the generated ids to a new system.
Has anyone experimented with such systems and would like to comment on their experience? Does anyone know how banks and telcos approach this problem? Or should I simply give up and keep on using auto_increment? :)