Sunday, September 1, 2013

Distributed Cache-Line Counter Scalable RW-Lock

In this post, we're going to talk about a variant of the Scalable RW-Lock that uses a Distributed Cache-Line Counter (DCLC) to measure the number of active Readers.

For those of you who don't know what the Scalable RW-Lock is, it's a Reader-Writer lock that Andreia and I discovered, and which was simultaneously discovered by a team at Oracle Labs, who published it under the name "NUMA-Aware Reader-Writer Locks". And if you're wondering what a DCLC is, then you can take a look at this post:
http://concurrencyfreaks.blogspot.com/2013/08/concurrency-pattern-distributed-cache.html

Compared to the "classic" mechanism of the ScalableRWLock, this technique doesn't need finalizers, a ConcurrentLinkedQueue, or ThreadLocals... well, for the reentrant version of the lock we need a ThreadLocal, but we don't need to register a finalize() for it.

The idea is to use the DCLC to count the number of active Readers. The Writer that wins the election among the competing Writers sets its state to WRITING and then scans the DCLC's array, waiting for every entry to reach zero. Moreover, the Writer doesn't need to restart from the first position of the array if one of the intermediate positions is non-zero: it just waits until that position reaches zero and then advances to the next one.
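To make the mechanism more concrete, here is a minimal sketch of what such a lock could look like in Java. The names (DCLCRWLockSketch, NUM_COUNTERS, PADDING, writerState, bucket()) are ours for illustration only, and the padding, hash function, and spinning strategy are assumptions; the actual DCLCRWLock differs in its details.

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

public class DCLCRWLockSketch {
    private static final int NOT_WRITING = 0;
    private static final int WRITING = 1;
    // Keep each Reader counter on its own cache line: with 4-byte ints and
    // 64-byte cache lines, spacing the counters 16 ints apart avoids false sharing.
    private static final int PADDING = 16;
    private static final int NUM_COUNTERS = 32;

    private final AtomicIntegerArray readers = new AtomicIntegerArray(NUM_COUNTERS * PADDING);
    private final AtomicInteger writerState = new AtomicInteger(NOT_WRITING);

    // Hash the current thread's id to one of the counters (collisions are possible).
    private int bucket() {
        final long id = Thread.currentThread().getId();
        return (int) ((id * 0x9E3779B9L >>> 16) % NUM_COUNTERS) * PADDING;
    }

    public void sharedLock() {
        final int idx = bucket();
        while (true) {
            readers.getAndIncrement(idx);              // announce this Reader in the DCLC
            if (writerState.get() == NOT_WRITING) {
                return;                                // no Writer active: lock acquired
            }
            readers.getAndDecrement(idx);              // a Writer is in: back off
            while (writerState.get() == WRITING) {
                Thread.yield();
            }
        }
    }

    public void sharedUnlock() {
        readers.getAndDecrement(bucket());
    }

    public void exclusiveLock() {
        // Election among competing Writers: the winner sets the state to WRITING.
        while (!writerState.compareAndSet(NOT_WRITING, WRITING)) {
            Thread.yield();
        }
        // Scan the DCLC array once; wait on each non-zero entry and advance,
        // without ever restarting from the first position.
        for (int i = 0; i < NUM_COUNTERS * PADDING; i += PADDING) {
            while (readers.get(i) != 0) {
                Thread.yield();
            }
        }
    }

    public void exclusiveUnlock() {
        writerState.set(NOT_WRITING);
    }
}

Notice that a Reader only increments and later decrements its own hashed entry, so Readers that hash to different entries never write to the same cache line, while the Writer pays the price of scanning the whole array.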

We named this new reader-writer lock variant DCLCRWLock. One of its advantages is that it is very easy to implement in any language, because it needs neither ThreadLocals nor a ConcurrentLinkedQueue. Another advantage is that Reader threads don't need to "register" the first time they try to acquire the lock, which makes it ideal for applications where each thread seldom acquires the lock.


Disadvantages:
- Not reentrant, although it can be made reentrant with a ThreadLocal (see the sketch after this list)
- As with all hash-function-based algorithms, if the IDs of the threads trying to acquire the lock in shared mode hash to the same bucket, there will be a lot of contention on a single entry of the DCLC array and the lock won't scale. Statistically, the number of collisions should be small and, therefore, the contention low, which implies good scalability with the number of Readers.
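To illustrate the first point, here is one way the read-side could be made reentrant, layering a per-thread hold counter on top of the sketch above; the wrapper name and its use of DCLCRWLockSketch are assumptions of ours, not the actual reentrant implementation.

public class ReentrantReadSketch {
    private final DCLCRWLockSketch lock = new DCLCRWLockSketch();
    // Per-thread nesting count for shared acquisitions; the entry simply dies
    // with the thread, so no finalize() has to be registered.
    private final ThreadLocal<int[]> holdCount = new ThreadLocal<int[]>() {
        @Override protected int[] initialValue() { return new int[1]; }
    };

    public void sharedLock() {
        final int[] count = holdCount.get();
        if (count[0]++ == 0) lock.sharedLock();    // only the outermost acquire touches the DCLC
    }

    public void sharedUnlock() {
        final int[] count = holdCount.get();
        if (--count[0] == 0) lock.sharedUnlock();  // only the outermost release touches the DCLC
    }
}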

Advantages:
- No need for ThreadLocals
- No need for a ConcurrentLinkedQueue
- Works nicely for applications with a lot of thread destruction/creation (or threadpools)
- Easy to implement in several languages, as long as they have a Sequentially-Consistent Data-Race-Free Memory Model and an atomics API.
- Unlike the other variants of the Scalable RW-Lock, this one doesn't need a GC.



Performance:
Here are some performance plots for the Java version compared with the ScalableStamped, Stamped, and ScalableRWLock locks.
As you can see, its results are similar to the ScalableRWLock's, and in some cases seem even better (but that could be variation in the tests, because we did just one run), although in some isolated cases the performance drops significantly. My guess is that this is due to collisions in the hash function, as was to be expected.


Source Code:
Here is the link to the source code; it will also be part of the next release of the Concurrency Freaks library on SourceForge.
DCLCRWLock.java