Wednesday, October 30, 2019

How about RLU? Is RLU a generic concurrency control?

On the previous post we went over why RCU is not a generic concurrency control. This time we'll talk about RLU (it's an L, not a C):
http://alchem.usc.edu/courses-ee599/downloads/rlu-sosp15.pdf

RLU is a synchronization technique which contains inside it an efficient userspace RCU.
It supports multi-atomic updates, making them visible in one-shot to the readers, without having to make an replica of the entire data, like on RCU.

The idea of RLU is that during a write operation, each modified object is first copied into a per-thread log, then it makes the modifications to the object in the log. When all modifications have been made, it increments the global clock, thus making all modifications atomically visible to readers. I'm over-simplifying, but it's ok for this post.

RLU is certainly more generic and efficient than RCU but it does not allow for generic code to be used. In fact, RLU does write-read synchronization but it does not handle write-write synchronization and the the paper itself mentions this explicitly:
"The basic idea of RLU described above provides read-write synchronization for object accesses. It does not however ensure write-write synchronization, which must be managedby the programmer if needed (as with RCU)".


To better understand why RLU is not a generic approach to concurrency, let's consider the canonical example of two threads 1 and 2 with cross dependencies. They are attempting to execute the following operations, where x and y start both at false:
Thread 1:
  if (!x) y = true;

Thread 2:
  if (!y) x = true;

 
As I'm sure you're aware, any serializable execution of these two operations will either result in x and y being {true,false} or {false,true} but never {true,true}. One of the operations has to "happen before" the other. If T1 executes before T2 then the end result will be {false,true} and if T2 executes before T1 then the end result will be {true,false}.
If you run this code without modification on RLU, it can happen that T1 reads x and sees false while T2 reads y and sees false, then T1 acquires the lock on y while T2 acquires the lock on x and each write to their log that the variables will be true and each advance the global clock and commit. In case you're wondering, this scenario is not linearizable.






In the database world, this kind of problem is called a write skew anomaly (see section 2.4 of the MV-RLU paper for a great explanation in the context of RLU). In the context of concurrent data structures, we call these bugs!
Handling cases like this is what an STM or a Universal Construction are supposed to do because an STM/UC must handle any sequential piece code (with some exceptions).
Granted, if you're really careful how you write the code, you can use RLU as long as you avoid writing sequential code that has this kind of cross dependencies. The problem is, in practice, this is very hard to do and even when you do it, you're never really sure it's actually done correctly (did I miss some cross dependency?). If you write user-code that has one of these problems, it will be hard to find it and likely hard to fix it.

There is an easy solution for RLU and that is to use a global lock, but then it looses some of its appeal because it no longer scales for writes.
Keep in mind though that RLU scales for read-only transactions!

STMs like TinySTM or TL2 have time-based algorithms that keep a read-set of all loads done within the transaction and do validation of the read-set at commit time exactly to handle these kind of cross dependencies.
If you're wondering how frequent are cross dependencies in real code, then even an ordered linked list is prone to these kind of issues (depending on how it is implemented), which helps to show that it's very hard to get away from this problem in practice.

If you ever read a paper where they say that they used RLU to make an STM, be very very suspicious: an STM must be able to handle cross dependencies and provide linearizable (globally serializable) operations regardless of what the underlying sequential user-code. If it works only for some small subset of user programs, then it's not an STM... sure it's a very valuable tool, but it's not an STM.

No comments:

Post a Comment