I will use English here because I am going to copy this to MSDN later on.
So.
4 out of 5 projects I have been involved in during the last couple of years used some kind of retry loop in the event of a database deadlock. Which got me thinking: "Why do we have to repeat this over and over again in the client code?"
It seems we have a classic case of successful problem externalization. It looks like database architects decided not to deal with this [without any doubt difficult] issue, making it the client's problem.
Both of the most popular database engines (Oracle, SQL Server) return a deadlock exception as soon as the deadlock is detected, which in the typical scenario is sooner than the timeout [duh! otherwise it would be a timeout exception, not a deadlock one]. The default time frames for SQL Server are 5 seconds for deadlock detection and 30 seconds for the timeout, which means the server could try to rerun a deadlocked transaction 6 times before giving up. This is the simplest thing to do. Not very smart, but it would already help.
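For illustration, here is roughly the loop those projects keep rewriting, sketched in T-SQL (TRY/CATCH requires SQL Server 2005+; TabX, val and the retry count are made-up placeholders, not anything from a real schema):

DECLARE @retries int, @maxRetries int
SET @retries = 0
SET @maxRetries = 5
WHILE @retries <= @maxRetries
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION
        UPDATE TabX SET val = val + 1 WHERE id = 1  -- the actual work goes here
        COMMIT TRANSACTION
        BREAK  -- success, leave the loop
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION
        IF ERROR_NUMBER() <> 1205 OR @retries = @maxRetries  -- 1205 = deadlock victim
        BEGIN
            RAISERROR('transaction failed', 16, 1)  -- not a deadlock, or out of retries
            BREAK
        END
        SET @retries = @retries + 1
        WAITFOR DELAY '00:00:00.100'  -- brief pause before trying again
    END CATCH
END

Every project ends up with some variation of this; the point is that the server could do it for us.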
But to an outsider (me) it seems the server could do more. The server detects deadlocks by walking the list of locks held by the suspicious transaction[s]. Instead of rolling back all the way, it could roll back to the point just before the first deadlocked resource was locked, wait a little (meanwhile the other transaction will go through; this can even be tracked), and retry. I am sure this is all much more complicated internally, but I don't see anything impossible here.
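The machinery for partial rollback already exists in the form of savepoints; here is a purely illustrative sketch of the idea (TabX and TabY are hypothetical tables, and of course the server, not the client, would have to place the savepoint and drive the retry):

BEGIN TRANSACTION
UPDATE TabY SET val = val WHERE id = 2      -- earlier work; these locks would be kept
SAVE TRANSACTION before_contention          -- the point just before the contended lock
UPDATE TabX SET val = val + 1 WHERE id = 1  -- suppose this statement deadlocks
-- today a deadlock dooms the whole transaction; the proposal is that the
-- server could instead do the equivalent of:
ROLLBACK TRANSACTION before_contention      -- undo only the contended part
WAITFOR DELAY '00:00:01'                    -- let the other transaction finish
UPDATE TabX SET val = val + 1 WHERE id = 1  -- and rerun just that part
COMMIT TRANSACTION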
Anyway, even a simple retry of the whole transaction would help. I am sure Katmai is already too late in the cycle to add a new feature, but it should at least be discussable for the vNext + 1 version. What do you guys think?
(no subject)
Date: 2007-08-12 08:19 pm (UTC)
The problem of deadlocks is a problem of system design. If all transactions take resources in the same order - ALL of them access T1 first and T2 second - then there will be no deadlocks.
I have had a couple of enterprise systems running this way for a few years: no deadlocks so far.
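A sketch of that convention (T1 and T2 here standing in as hypothetical tables):

-- every transaction in the system follows the same order: T1, then T2
BEGIN TRANSACTION
UPDATE T1 SET val = val + 1 WHERE id = 1  -- always lock T1 first
UPDATE T2 SET val = val + 1 WHERE id = 1  -- then T2; waiters form a line, a lock cycle is impossible
COMMIT TRANSACTION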
(no subject)
Date: 2007-08-12 08:37 pm (UTC)
It is arguable whether it is easier to build deadlock-free or deadlock-tolerant solutions. But that's not the point here. Both are efforts to solve an inherently server-side problem that was externalized because it seemed too difficult at the time.
(no subject)
Date: 2007-08-12 09:08 pm (UTC)
I type Russian slow :)
(no subject)
Date: 2007-08-13 06:22 pm (UTC)
On the other hand, if the original transaction is READ COMMITTED, it is fine to see new committed data after the rollback; it [different data in different parts of the same transaction] could happen even without the rollback, since we didn't request that level of consistency in the first place.
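A quick illustration of that last point (a sketch; TabA is a stand-in table, session 2 is a concurrent connection): even without any rollback, two reads inside one READ COMMITTED transaction can disagree.

-- session 1, READ COMMITTED (the SQL Server default):
BEGIN TRANSACTION
SELECT COUNT(*) FROM TabA  -- returns N
-- ...meanwhile session 2 commits: INSERT INTO TabA (id) VALUES (999)
SELECT COUNT(*) FROM TabA  -- returns N + 1, no rollback involved
COMMIT TRANSACTION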
(no subject)
Date: 2007-08-13 08:02 pm (UTC)
If your business logic allows nonrepeatable and phantom reads, then automatic retry of a transaction in case of deadlock is possible, but I think of that more as the exception.
I think the SQL teams didn't build automatic deadlock handling in order to avoid different behavior when different isolation levels are used.
(no subject)
Date: 2007-08-15 08:01 am (UTC)
Neither transaction has committed anything yet at the moment of the deadlock, so there can be no question of one transaction depending on the other.
"If your business logic allows Nonrepeatable and Phantom reads then automatic retry of transaction in case of deadlock is possible, but I think of it more as exception."
Both nonrepeatable and phantom reads are artifacts that arise at the moment one transaction commits data that another transaction is working with, so this is completely beside the point as well.
How about you write something more concrete - which isolation level you have in mind and, as an example, which transactions are such that a partial rollback would create problems that are otherwise absent. Purely to dispel my doubts that you understand what you are talking about.
(no subject)
Date: 2007-08-15 08:13 am (UTC)
In the case of auto retry it is quite simple: any logic that needs consistency should be inside the transaction, otherwise consistency cannot be guaranteed. So if we rerun the whole transaction, there is nothing to worry about.
Again, it would be very useful to see some concrete examples.
(no subject)
Date: 2007-08-16 02:16 pm (UTC)
The transaction isolation level for both transactions is READ COMMITTED.
Transaction 1 (Tr1):
BEGIN TRANSACTION
DECLARE @var1 int, @var2 int
1. SELECT @var1 = COUNT(*) FROM TabA
…
2. UPDATE TabA SET ... WHERE id = 123
…
3. SELECT @var2 = COUNT(*) FROM TabA
…
4. UPDATE TabB SET ... WHERE id = 456
COMMIT TRANSACTION
Transaction 2 (Tr2):
BEGIN TRANSACTION
1. UPDATE TabB SET ... WHERE id = 456
…
2. INSERT INTO TabA (id) VALUES (124)
COMMIT TRANSACTION
Let's examine the execution.
Tr1 starts.
Tr1 step 1: a shared lock is held for the duration of the statement and released shortly after it finishes; the value of @var1 is N.
Tr1 step 2.
Tr2 starts.
Tr2 step 1.
Tr1 step 3: the value of @var2 is N.
Tr1 step 4: lock wait (resource held by Tr2 step 1).
Tr2 step 2: lock wait (resource held by Tr1 step 2, because the records with id 123 and 124 are on the same page); deadlock.
At this moment the values of @var1 and @var2 are the same.
Let's now assume the server can partially roll back a transaction to just before the contended lock, wait some time for the other transaction to commit, and then roll forward.
Tr1 is rolled back to just before Tr1 step 2.
Tr2 commits.
Tr1 is rolled forward:
Tr1 step 2.
Tr1 step 3: the value of @var2 is N + 1.
Tr1 step 4.
At this moment the values of @var1 and @var2 are different.
Tr1 commits.
My point is: if there is no deadlock (and no partial rollback), the value of @var2 is always the same as the value read just before the UPDATE statement (Tr1 step 2), guaranteed by the exclusive lock on TabA acquired at step 2 and held until the end of the transaction.
(no subject)
Date: 2007-08-16 11:47 pm (UTC)
If you, on the other hand, ensure that @var1 won't change by using "select @var1 ... with (xlock, holdlock)", then the partial rollback will work just fine.
I believe it is not possible to find a case where a partial rollback introduces an inconsistency that was not there before, but I will enjoy playing this game (you provide the example, I counter it) as long as you do.
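Concretely, the hinted read would look something like this (a sketch built on the TabA example above; XLOCK + HOLDLOCK keeps exclusive range locks until the end of the transaction, so Tr2's INSERT is blocked and the count cannot drift):

BEGIN TRANSACTION
DECLARE @var1 int
SELECT @var1 = COUNT(*) FROM TabA WITH (XLOCK, HOLDLOCK)  -- locks held to end of transaction
-- ...the rest of Tr1 as in the example above...
COMMIT TRANSACTION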
(no subject)
Date: 2007-08-13 08:05 am (UTC)
Or is it not allowed to mention this? ;)
(no subject)
Date: 2007-08-13 06:29 pm (UTC)
I am local search, you know.