Monday, October 28, 2019

Is RCU a generic concurrency control?

In this post we're going to talk about RCU and we're going to start by what it means.
When people talk about RCU they usually mean two different things combined: a synchronization mechanism, which I'll call RCU-sync, and a make-copy-of-object-and-modify-copy, which I'll call Copy-On-Write (COW).

Let's start with Copy-On-Write.
This is a technique used in concurrency and even on durability (see previous post) where instead of modifying an object in-place, you make a copy of the object, modify the new object and then toggle a pointer to point from the old object to the new object. The name RCU comes from Read-Copy-Update as a reference to this technique.
When used in (shared memory) concurrency this creates the problem of what to do with the old object: is it safe to delete it immediately? What if there are other threads currently accessing it?
Solving this problem of object lifetime tracking is what the synchronization mechanism of RCU (RCU-sync) does.

There are many different algorithms that implement an RCU-sync. Some examples of RCU-sync algorithms can be seen in Paul McKenney's excellent book or in our paper about fast userspace RCUs. The thing about an RCU-sync is that it's a concurrency construct, like a mutual exclusion lock or a reader-writer lock. For example, a mutual exclusion lock implements at least the following API:
  lock()
  unlock()

while a reader-writer lock implements at least:
  read_lock()
  read_unlock()
  write_lock()
  write_unlock()

and an RCU-sync implements at least:
  rcu_read_lock()
  rcu_read_unlock()
  rcu_synchronize()

The difference between RCU and locks is that locks provide a generic way of doing concurrency, while RCU does not.

The rcu_read_lock() and rcu_read_unlock() methods must guarantee that code within a critical section defined by these two functions must not escape outside the critical section. An RCU-sync must also guarantee that a call to rcu_synchronize() will wait for all threads currently executing a block of code in a rcu_read_lock/unlock() block to complete (call rcu_read_unlock) before rcu_synchronize() returns to the caller. This guarantee is called quiescence and it's extremely useful in shared memory concurrency. Note that the actual semantics are much more formal that what I've described here, but for the purpose of this post, it's enough to get an idea of how an RCU-sync behaves.

As mentioned above, RCU-sync provides quiescence, not mutual exclusion.
A thread calling rcu_synchronize() will know that by the time the call to rcu_synchronize() returns, everything done before this call is now visible to other threads (as long as the other threads call rcu_read_lock/unlock to access the associated data). However, there is nothing preventing other threads from modyfing the same data at the same time (data race behavior). This means that even when using RCU-synch we still need some extra mechanism to provide mutual exclusion or prevent races in some other way.

At its essence, RCU-sync is mechanism for a single writer to synchronize with multiple readers. It doesn't solve the problem of multiple writers and it also doesn't solve the problem of the readers having a consistent view of the data.
Let me explain a bit more about what I mean with "consistent view". Suppose we have two variables, "a" and "b" initially at zero and a single writer thread and multiple reader threads. The writer code could be:
writer() {
    a = 1;
    rcu_synchronize();
    b = 1;
    rcu_synchronize();
}

and the reader code:
reader() {
    rcu_read_lock();
    printf("a=%d\n", a);
    printf("b=%d\n", b);
    rcu_read_unlock();
}


A reader thread could print "a=1,b=0", even if "a" and "b" are std::atomic with memory_order_consume, or even memory_order_seq_cst.
RCU-sync does not provide atomicity for multiple variables. In order to achieve this, we need to aggregate all those variables within a single object and make a snapshopt of this object using Copy-On-Write. But then, the concurrency technique really is COW, not RCU-sync. With such an approach, the RCU-sync is used only for memory reclamation of the old objects and if we're in Java then we don't even need that because the GC will do it for us.
And that is why whenever they use copy-on-write on java.util.concurrent they call it copy-on-write and not RCU  ;)


Let me repeat it again: RCU-sync is extremely useful for memory reclamation and as a component of more complex concurrency techniques, but it is not generic. RCU-sync does not provide atomicity, nor does it prevent write-write races, not even read-write races. All it does it give quiescence, which personally I think it's awesome! It's all a matter of expectations.

Just to go back to COW briefly, COW can be used to solve write-write races and read-write races by creating a new object on every modification of every sub-object. Basically, if you want full consistency across your data, you need a god object that contains all your data and any modification to the data (or just a single variable) will require a full copy of the entire data.
So yes, COW can be generic, it's just that nobody does it like this because it would be too slow.

Both COW and RCU-sync are valuable tools that researchers and concurrency library implementers can use to make sophisticated concurrent data structures or other synchronization mechanisms, but calling them generic concurrency controls is going too far.
In other words, there certainly are database engines and STMs (software transactional memory) which use COW or RCU-sync internally, but you'll never see a DBMS or STM whose concurrency control is RCU. It just doesn't make sense to have a DBMS where any mutative operation causes a copy of the entire DB to be made. Because, that would be the only way to provide generic transactions on that DBMS or STM.
In the DBMS context, a concurrency control is something like MVCC (multi-version concurrency control) or 2PL (two-phase locking) or OCC (optimistic concurrency control).

RCU is not a generic concurrency control.

No comments:

Post a Comment