Tuesday, December 5, 2023

Why is Snapshot Isolation not enough?

 

Let's talk about why Snapshot Isolation is not enough for a general purpose concurrency control.

 

Snapshot Isolation has been adapted by several major database vendors however, it is rarely used outside of Database Management Systems… Why is that?

According to the wikipedia, Snapshot Isolation (SI) is a guarantee that all reads made in a transaction will see a consistent snapshot of the database (in practice it reads the last committed values that existed at the time it started), and the transaction itself will successfully commit only if no updates it has made conflict with any concurrent updates made since that snapshot.

 

First of all, the original definition of SI didn't provide a guarantee that the snapshot is done for a time instant during the transaction, it could be a snapshot of the data taken seconds, minutes or days before the transaction starts. But in practice, this is silly so let's go with the wikipedia definition which is a lot more reasonable.

 

 

SI can't implement simple data structures

Suppose we have a doubly-linked list with four nodes, of the form:

 

Two threads are executing transactions, with one attempting to remove node B and the other thread attempting to remove node D.

Each thread has its own (logical) snapshot of the data structure.

The first thread will set A->next to C and C->prev to A.

The second thread will set B->next to D and D->prev to B.

 

Notice that there is no lock conflict between these two operations because they are not modifying the same records. The first transaction modifies fields in records A and C, while the second transaction modifies fields in records B and D.

Under Snapshot Isolation both transactions will commit, which means that the doubly linked list will be left in an inconsistent state:

A->next is C

B->next is D

B->prev is A

C->next is D

C->prev is A

D->prev is B

 

This means that if a traversal is done from left-to-right, it will see the following nodes:

A->C->D

but if it is done from right to left it will see:

A<-B<-D

Both are wrong because C and B were logically removed from the list, but together they are complete garbage.

 

This shows that even a simple data structure, like a doubly-linked list, will lose its invariants when operations modify it under Snapshot Isolation.

I used to think that Snapshot Isolation was bad because the brains of most developers were not capable of handling the repercussions of this consistency model, but it turns out, the fault is not due to our punny human brains being unable to cope with forms of relaxed consistency, it's really about Snapshot Isolation being completely useless for general purpose code, such as data structures.

Don't misunderstand, it's usually ok to use SI for raw data (a counter example is the black-pebble-white-pebble problem by Jim Gray), but as soon as you have some invariants or pointer de-reference in your transaction, it's just going to give bad results, and it has nothing to do with our mental ability to handle relaxed consistency.

 

 

SI can't implement Transactions

When it comes to concurrency, I work a lot with the concept of transactions. I like to use it because a transaction is an intuitive way to express the intent of all-or-nothing. At its core, a transaction is a way for the developer to say:

I want this group of operations to be all-or-nothing.

In other words, the operations inside the transaction must appear as if they are all executed together, to each other and to other transactions. It's actually a bit more than this, but for the purpose we're aiming at now, it will suffice  ;)

 

The above is my own definition, but let me argue for it.

If we are grouping operations together and don't care about whether some are seen and others are not, then it's just a batch-of-operations, not a transaction. For example, suppose the developer expresses this intent:

beginTxn();

  execute_operation_A();

  execute_operation_B();

endTxn();

and

beginTxn();

  assert (operation_A_is_done() == operation_B_is_done());

endTxn();

On a batch-of-operations the assert could trigger, but on a transaction the assert will never trigger.

 

The problem with SI is that everything that is read in the transaction is done on a snapshot of the data and therefore it is all-or-nothing, but what is written is done at a much later time.

At first sight, this may seem innocuous enough, but it's problematic even on simple situations.

Take the following example, where x and y and two variables which start at zero and two transactions are being executed concurrently in two separate threads:

// Thread 1

beginTxn();

  if (x == 0 && y == 0) x = 1;

endTxn();

 

// Thread 2

beginTxn();

  if (x == 0 && y == 0) y = 1;

endTxn();

 

Notice that under a serializable history, only the x or the y will be changed from zero to one, but never both.

However, under SI, both transactions will be reading a snapshot of the data with x and y (where both are zero) and both transactions will be writing to these variables independently, which means both transactions commit.

 

In other words, the writes of the transaction and the reads of the transaction are each all-or-nothing, but they are not all-or-nothing with each other. It's as if the transaction is split into two separate transactions, where the read operations occur sometime in the past while the write operations occur now.

I don’t know about you, but this doesn't seem to me like what a transaction is supposed to do, which lead us to the next section.

 

 

SI can't express simple conditionals

Even a simple conditional can't be expressed in SI, if the conditional implies a write.

Take for example:

  if (c == 0) d = 1;

We're reading a variable c and writing to a variable d. It's pretty straightforward. However, under SI we are reading the variable c sometime in the past (we don't know exactly when) and writing to d now.

What if c is no longer at zero? SI may say it is still zero and set d to 1.

In other words, what SI allows us to express is not whether c is currently at zero, but whether c was zero at some point in the past. Put another way:

You want to express this:

  if (c == 0) d = 1;

but SI gives you this:

  if (c_was_zero_at_some_point_in_the_past()) d = 1;

 

See the problem now?

 

 

Who cares about SI?

Well, if we could make something consistent, anything really, out of Snapshot Isolation, then it would be really cool because SI allows for the implementation of concurrency controls which never restart. These are sometimes called zero-restart, or abort-free, or conflict-free concurrency controls and, if such a thing was possible with Serializability and DAP (Disjoint Access Parallelism), it would be one of the holy grails of concurrency.

Unfortunately, it is not possible to have DAP zero-restart concurrency controls with serializable isolation (nor higher forms of isolation). 

I believe the strongest isolation we can provide with the zero-restart property is Snapshot Isolation, however, as we've seen through this entire post, SI is useless for many applications, or at least it's useless the applications that I find interesting.

I don't know of anyone that has proven that DAP zero-restart concurrency controls are impossible with serializability, but I'm 100% sure of this impossibility result.

 

Without going too deep into details, having zero-restarts means we need to take a global lock (or some other global serialization mechanism) so that each operation/transaction executes one at a time, to ensure there are no conflicts, because if a conflict occurs, it will trigger a restart. Granted, not all conflicts need to be solved with a restart, but some of them need to (AB-BA are the simplest example), and if we want a general purpose concurrency control, which works in any situation, then the only way to guarantee zero-restarts is to take that global lock at the beginning of the operation. And, as everyone knows, taking a global lock will serialize all operations, which means no parallelism, which means no scalability.

 

Ultimately, Snapshot Isolation is interesting, but not useful for general purpose concurrency controls.