I agree, often there is a better approach. What about invoice numbering? This has been the only case where I needed to select a max before inserting because invoice numbering must be strictly sequential under all circumstances and any potential gaps in an AI id should not affect the sequence. I simply locked the entire table for the whole select and insert operation - for this scenario this worked fine since usually creating an invoice does not happen multiple times per second so the occasional locking is not problematic.
But I posted a tested and working query, not a theoretical one, so it certainly works in mysql 5.6. I don’t know about the manual, maybe it’s simply wrong? Or maybe aliasing the table in the subquery means it is no longer the same table by mysql’s standards? It doesn’t work without an alias, though.
The scenario you mention is correct, most countries have strict laws regarding invoice number order. But it is a very simple issue to solve. A code sequence like this, would work in most cases as long as it is not wrapped in a long transaction. I.e. You need to do this at the end, after committing the updated order transaction, then do this insert, then after continue with any other transaction.
Your SQL is practically the same as what I suggested above except for a bit different syntax without a subquery and not needing to alias the table. I’m only wondering whether it is guaranteed to be concurrency safe. In practice this will work very fast if invoice_id is indexed so there should be very little chance of a collision. But does the query lock the table when it starts doing the select? If the table is not locked for reading immediately then there is still a chance that two concurrent queries will select the same max invoice_id and try to insert it to the table.
It is not as simple as it may seem. I was curious about how it will work in a concurrent environment and tested the INSERT INTO ... SELECT MAX() ... query you suggested in a loop from two separate threads. And it failed - I was getting deadlock errors.
Then I tried using two separate queries - SELECT MAX() and then INSERT in a single transaction. Obviously, it failed too as I was getting duplicate max values (I used a unique key on the sequence column so I was getting duplicate key errors).
Then I tried adding FOR UPDATE to the SELECT MAX() query and it failed, too! I thought this would lock the table for reads and make the queries safe but it didn’t. I don’t know why, I can’t make out anything useful from reading mysql docs. Maybe FOR UPDATE does not work with aggregate functions? Anyway, not nice.
The only thing that worked was locking the table for writing using LOCK TABLES. In that scenario both methods worked: single and dual query. But when two (php) thread were doing the inserts in a tight loop the table was constantly locked and I was not able to read anything from it from a yet different thread - mysql was not able to find even the smallest gap between the locks to get to the data with a SELECT statement.
As a last experiment I tried the same with Postgres. The INSERT INTO ... SELECT MAX() ... query worked out of the box in two concurrent threads and I didn’t spot any problems. No need to use transactions or lock the table. Moreover, while the two threads with inserts were running the table was not locked and I could browse the data from a third thread without any problem. Well, now I begin to see the advantages of Postgres
UPDATE: My initial findings in Postgres were wrong because I forgot to set up the unique index on the sequence column. It turns out the INSERT INTO ... SELECT MAX() ... query on its own is not concurrency safe in Postgres, either. However, the solution is to use a transaction and lock the table in one of the modes: EXCLUSIVE, SHARE ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE or ACCESS EXCLUSIVE - then there are no conflicts any more. And the table is not locked for reading from other threads so still the advantage over mysql remains. And a syntactic sugar bonus is that Postgres doesn’t need awkward table aliasing when locking tables.
This depends on the isolation level. But, yes you could get two selects returning the same number. Though this is normally only an issue when used wrong.
[quote=“Lemon_Juice, post:13, topic:221775”]
It is not as simple as it may seem. I was curious about how it will work in a concurrent environment and tested the INSERT INTO … SELECT MAX() … query you suggested in a loop from two separate threads.[/quote]
I hope you did not use the query exactly as I posted it? Since MAX will give the highest number currently in use in the table. That was a oversight in the reply above, it should be something like this: (Also note, that this type of query will require a invoice id already in the table for it to work)
When this is wrapped in a single transaction, that is committed right away it is so an say concurrency safe. The MAX part of the statement is not executed before the row is inserted into the database (through the use of a temporary table), when this happen meaning you can have multiple of these queued up without causing an issue when its done right. (This is pretty straight forward, the issue is of course having the code handle this since you need to segment the order setup process.)
The problem comes if this query is part of a longer commit, where other queries depend on the result of this query. If you have this, it is not safe anymore, and prone for deadlocks. The reason is as you mention, several processes could have pulled the same value and this is not noticed before the commit.
A deadlock can happen due to several reasons, if you used just MAX() as indicated above, this is why since each attempt try to insert the same. It could also be if your deadlock setting is too low, and your loops did not commit quick enough, triggering it.
I ran a few tests and both manual running multiple transactions using the code above at the same time worked without any issues, same with running loops inserting 100 records each from four concurrent threads (delayed start so they run at the same time).
If you ran the query above, I would be really interesting to know what MySQL version, what isolation level and the base settings for InnoDB.
I tried setting all possible isolation levels and it didn’t make any difference.
But then we need to be more precise - can you specify what you mean by wrong and what usage is right? My test is so simple and bare-bones that I can’t think of anything that could be wrong so I’m especially curious about how to do it right.
Of course not, I thought this was obvious. I also used COALESCE to make it work when there are no rows in the table. So to make it all clear this is the table:
CREATE TABLE `tt` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`invoice_id` INT(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE INDEX `invoice_id` (`invoice_id`)
and the query that is run in a loop:
INSERT INTO tt (invoice_id)
SELECT COALESCE(MAX(invoice_id)+1, 1) FROM tt;
I run this from two separate php scripts at the same time and it takes a few seconds before one of them terminates with SQLSTATE: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction.
it should be, this is only a single statement so it should create a transaction for itself, I tried wrapping it in a transaction but it didn’t change anything - I assume wrapping a single statement in a transaction makes no difference.
it certainly isn’t because deadlocks happen.
Maybe theoretically this is what should be happening but something must be not right if deadlocks happen. Again, can you clarify what you mean by done right?
MySQL 5.6, now I tried 5.7 and there is no difference. I tried all isolation levels that the manual mentions. As to InnoDB settings - this is a default Windows installation without any tinkering (the developer profile was chosen in the installer), any particular settings that you are interested in?
Just to be clear - when you ran your loop I hope you didn’t wrap the loop in a single transaction? That might actually work but would defeat the purpose of the test because this would not simulate concurrent usage in the real world by different users (database sessions) - obviously, different users cannot share the same transaction. In each loop iteration I insert a single row - just like a real user would normally create a single invoice at a time.
I’m really curious how you can make this query safe for concurrent use
Yes, when I wrote the comment, it was hard to be more specific as I had no idea how you tested this. Look below.
That is correct, as long as auto commit is enabled, and the database layer used does not put any restrictions on you.
Correct, that is why I said “so an say”. Perhaps not the correct English term for saying this, but what I mean by that is that using this method is as close to concurrent safe you can get without screwing up database performance.
The problem with concurrency in MySQL (and other databases) is that it comes at a cost (speed). As you noticed yourself, actively using locking on the table to force concurrency is not always a good idea.
There is solutions to handle the cost of speed, but this come at a price (money) that makes is unavailable for many clients. In addition, in most cases it is possible to get away without true concurrency.
Note. Please understand that some industries like banking, cannot handle it as described below
First, for the isolation level I recommend REPEATABLE READ or SERIALIZABLE, though unless you have a specific need for Serializable (and know you need it), stay with the default Repeatable Read.
For the actual table we use for the invoice ids, it is important it rely on the actual order id as the primary key (foreign key), and the invoice id is set as unique. Do not use a auto increment on the primary key, as this create another possible place for the deadlock to happen, and also require either another redundant table or column in order table to refer to the invoice id.
CREATE TABLE invoice ( order_id INT UNSIGNED NOT NULL, invoice_id INT UNSIGNED NOT NULL,
PRIMARY KEY (order_id),
UNIQUE INDEX invoice_id_UNIQUE (invoice_id ASC),
FOREIGN KEY (order_id)
REFERENCES order (order_id)
ON DELETE RESTRICT
ON UPDATE CASCADE)
ENGINE = InnoDB;
The frequency of how often you create a invoice id and also current transactions in progress is what is important when we consider deadlocks.
This is why the creation of the invoice id has to be separated from the rest of the process of updating the order data. Basically reducing the number of possible transactions/inserts active at the same time.
Of course the amount of invoice ids created during X time period also play a role here, if you need to create more than an invoice every second, I recommend that you look for another solution.
This should be setup as a prepared statement, and the execution part, should be wrapped in a try clause. If it fails (deadlock) try again, for security set it up so it tries 2-3 times though normally it will succeed in the next try.
We have run a similar solution as mentioned here for several clients over the years, the one with most “activity” create around 10k invoices per day, which is a low number and not an issue to handle even during peak times.
As an example, I setup the code you posted and ran it from four different scripts, each creating 10k invoices each (and had the try clause only retry once) within 1.4 minutes all four scripts finished running and I had no deadlocks. Approx. 500 invoice ids created per second. (A side note here, I do not test vs localhost, but a production size db server we use for dev on the local network, so dont expect to get same speed result vs localhost due to IO issues).
[quote=“TheRedDevil, post:16, topic:221775”]
As an example, I setup the code you posted and ran it from four different scripts, each creating 10k invoices each (and had the try clause only retry once) within 1.4 minutes all four scripts finished running and I had no deadlocks.[/quote]
If you used a try clause to retry queries then this is not a valid test because we are not interested (I think) in applying workarounds here, only in checking if the insert query is concurrency safe. As a side note - I prefer to avoid retrying if possible because it adds complexity. It’s easy to retry a single query but if I have a longer piece of code with many queries in a transaction - or in multiple transactions one after another - then in case of an error I have to repeat the whole transaction, which can be a pain depending on how the code is structured. That’s why I prefer to choose execution plan that is guaranteed not to deadlock even in a concurrent environment.
I think it must be easier to come across a deadlock on localhost because of higher query contention due to shorter and faster IO path. But that is not that important here.
I didn’t mean to look for specific methods for creating sequential invoice id’s and I’m sure what you are proposing is a valid approach. There are a number of ways to make sure the creation of the id’s does not break in a concurrent environment and in the case of invoices we usually have a lot of possibilities because invoices aren’t normally created with the speed of hundreds per second.
I only wanted to point out that the query INSERT ... SELECT MAX()... proposed in this thread is not safe for concurrent usage when used on its own and we need to take additional measures if we want to make the code 100% safe (like your method of invoice creation).
Concurrency is a problem for all databases but as I can see there are differences and in this particular case postgres is clearly better. First - the INSERT ... SELECT MAX()... query does not deadlock at all - there is still the danger of two queries selecting the same max value but at least we don’t need to deal with deadlocks. And second - in postgres I can lock the table to make the query fully concurrency safe and the locks - even issued concurrently at high frequency - are much cheaper than in mysql since they do not block other threads completely from accessing the table like mysql does - I don’t know why these databases behave differently but maybe mysql keeps the lock for a little longer than necessary to guarantee its effectiveness.
Yes, in practice we rarely need true concurrency. However, it is good to be aware of what is not concurrency safe so we are not surprised by rare errors and so that we can adjust the code accordingly.
In this case its my bad, I thought we were looking for a work around to solve the issue with creation of invoice ids, while keeping database speed up.
Though the entire case is mute in this scenario as all that is needed is a forced lock while the new invoice id is created.
Not sure I follow you here, if you have two queries trying to insert the same thing into the table I assume one will fail, and that is in reality the same issue MySQL throw the deadlock for in our case.
I didn’t start this thread so this wasn’t my intention to look here for solutions that the OP is not specifically interested in - I just mentioned creating invoice id’s as one of the valid reasons to have sequential id’s at all and wanted to point out the problem with concurrency and test/find out if it is real. But what you posted is certainly related to this topic so it may be useful for some people.
Well, but I think the deadlock is not an issue of two queries trying to insert the same thing. Inserting the same thing may trigger the unique violation error if there is a unique index but not necessarily deadlock - so we are dealing here with two different types of errors that can occur.
Just to be clear, I’m talking about this query:
INSERT INTO tt (invoice_id)
SELECT COALESCE(MAX(invoice_id)+1, 1) FROM tt;
MySQL will throw deadlocks even if there is no unique index and inserting the same thing should be perfectly fine in theory but still the deadlocks occur. Why? I don’t know the internals of MySQL well enough to answer that question. Does it have to do with the auto-increment id? Or maybe with the select statement? Something else? All I know is that Postgres does not throw deadlocks in the same scenario - that’s why I consider its behaviour superior. It may error out on the unique constraint violation but there are no problems if no unique index exists.