Sunday, September 22, 2013

Combining the StampedLock and LongAdder to make a new RW-Lock

In the previous post we showed that LongAdder is Sequentially Consistent (SC) as long as only the functions increment() and sum() are used, but not SC if decrement() is also used. In this post we will see what we can do with that knowledge, namely, how to build a reader-writer lock using LongAdder and StampedLock.

One of the main components of a Reader-Writer Lock is the mechanism that "counts" Readers. The simplest of all is a single synchronized variable (a volatile in Java, or an atomic<> in C++) that is incremented as a Reader acquires the read-lock and decremented as it releases it.
When this counter reaches zero, it means that all Readers have completed, and a Writer may now enter, after it has somehow "blocked" new Readers. There are many variants of this mechanism, but they all use an SC counter. The problem with using a single variable as a counter is that it creates a lot of contention on a single cache line when multiple Readers increment/decrement it, so a much better solution is to use some kind of "distributed counter"... the main requirement being that it must be sequentially consistent.
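
To make the contention point concrete, here is a minimal sketch of the naive single-counter approach (a hypothetical class, for illustration only, not part of the library):

    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.concurrent.atomic.AtomicInteger;

    public class NaiveCounterRWLock {
        // Every Reader hits these two variables, so all Readers contend
        // on the same cache line(s).
        private final AtomicInteger readers = new AtomicInteger(0);
        private final AtomicBoolean writerPresent = new AtomicBoolean(false);

        public void sharedLock() {
            while (true) {
                readers.incrementAndGet();          // the contended hot spot
                if (!writerPresent.get()) return;   // no Writer: read-lock acquired
                readers.decrementAndGet();          // roll back so the Writer can go in
                while (writerPresent.get()) Thread.yield();
            }
        }

        public void sharedUnlock() {
            readers.decrementAndGet();              // the same hot spot again
        }

        public void exclusiveLock() {
            // Block new Readers, then wait for the current ones to leave
            while (!writerPresent.compareAndSet(false, true)) Thread.yield();
            while (readers.get() != 0) Thread.yield();
        }

        public void exclusiveUnlock() {
            writerPresent.set(false);
        }
    }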

The simplest approach is to use something like the DistributedCacheLineCounter (DCLC) that we talked about in a previous post:
http://concurrencyfreaks.blogspot.co.uk/2013/08/concurrency-pattern-distributed-cache.html

There are two problems with this approach though:
First, the number of collisions follows a statistical distribution (think hash tables). The chance of a collision is not high, but it will happen, and a collision between two threads is enough to kill the performance for those two threads; if more than two threads collide, it is even worse.
Second, it consumes a non-negligible amount of memory. To get decent performance under high contention, you need an array with at least as many entries as there are cores, and even with just 32 cores, multiplying by the padding needed to keep each entry on a different cache line gives 2KB per lock instance. That may not be much, but it depends on your application: if your app has a thousand lock instances, it grows to 2MB, and what if you have a thousand cores, or millions of locks?... I've never seen millions of locks in a code base, but tens of thousands of lock instances seems commonplace to me. The sketch below shows where the 2KB figure comes from.
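
As a rough sketch of that layout (hypothetical code, not the actual DCLC class from the earlier post):

    public class PaddedCounterArray {
        static final int CACHE_LINE_BYTES = 64;
        static final int NUM_ENTRIES = 32;  // e.g. one entry per core on a 32-core machine
        // Each entry is one long padded out to a full cache line
        // (64 / 8 = 8 longs of spacing), so entry i lives at counters[i * 8]
        // and no two entries share a line: 32 entries * 64 bytes = 2KB per lock.
        final long[] counters = new long[NUM_ENTRIES * (CACHE_LINE_BYTES / Long.BYTES)];
    }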

LongAdder takes care of the disadvantages mentioned above, not totally, but it is better in terms of memory savings, so it is worth thinking about it for a bit to see if we can come up with a way to use LongAdder in a lock instead of the DCLC. The problem is that LongAdder is not SC... unless we only use sum() and increment(), so the question is:
Can we build a reader-writer lock where the mechanism that counts Readers only ever does increment() and sum()?

The answer is yes!
The trick is to use two counters, which we named readersIngress and readersEgress.
readersIngress is incremented whenever a Reader is attempting to "acquire" the shared-lock, and readersEgress is incremented whenever a Reader is "releasing" the shared-lock. The number of active Readers is zero if readersEgress.sum() equals readersIngress.sum(). Notice that the order in which the two counters are read is very important: readersEgress must be read before readersIngress, otherwise, by the time readersEgress is read, a new Reader may have acquired the read-lock and incremented readersIngress, making the two sums match even though a Reader is still active.

We named this new lock LongAdderRWLock, and you can get the source code from the Concurrency Freaks Library here:
https://sourceforge.net/projects/ccfreaks/files/java/src/com/concurrencyfreaks/locks/LongAdderRWLock.java/download
In addition, we made a variant named LongAdderStampedRWLock that combines the LongAdder with a fallback to the StampedLock in case there are a lot of Writers:
https://sourceforge.net/projects/ccfreaks/files/java/src/com/concurrencyfreaks/locks/LongAdderStampedRWLock.java/download
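
For reference, the state of these locks is small: two LongAdder counters plus a StampedLock. A minimal sketch of the declarations (assuming the java.util.concurrent classes of JDK 8):

    import java.util.concurrent.atomic.LongAdder;
    import java.util.concurrent.locks.StampedLock;

    public class LongAdderStampedRWLock {
        // Incremented by a Reader when acquiring the read-lock
        private final LongAdder readersIngress = new LongAdder();
        // Incremented by a Reader when releasing the read-lock
        private final LongAdder readersEgress = new LongAdder();
        // Serializes Writers (and is the fallback when Writers dominate)
        private final StampedLock stampedLock = new StampedLock();
        // ... sharedLock()/sharedUnlock()/exclusiveLock()/exclusiveUnlock() shown below ...
    }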


Performance Plots

We ran our usual microbenchmark on a 32-core Opteron and compared these locks against the StampedLock, ScalableRWLock, and ScalableStampedRWLock. Here are the results for different workloads of Writers vs. Readers:
[Plots omitted: microbenchmark throughput for each Writers-vs-Readers workload; the standalone StampedLock appears as the cyan line at the bottom of every plot except "50% Writes".]
In terms of performance it doesn't lag behind the "Scalable" versions, and it uses a lot less memory in scenarios with a large number of threads, because the memory consumption of the "Scalable" locks is O(number of Reader threads), while for the "LongAdder" locks it is at most O(number of cores).

Code Simplicity

The thing I really like about the "LongAdder" locks is that the code is very simple, because the complexity is hidden away in the LongAdder counter itself, which is part of JDK 8 and has therefore been thoroughly tested.
Don't believe me when I say it is simple? Then here is the code to acquire/release the read-only lock of the LongAdderStampedRWLock:


    public void sharedLock() {
        while (true) {
            readersIngress.increment();
            if (!stampedLock.isWriteLocked()) {
                // Acquired lock in read-only mode
                return;
            } else {
                // Rollback logical counter to avoid blocking a Writer
                readersEgress.increment();
                // If there is a Writer, wait until it is gone
                while (stampedLock.isWriteLocked()) {
                    Thread.yield();
                }
            }
        }
    }

    public void sharedUnlock() {
        readersEgress.increment();
    }

and here is the code to acquire/release the write lock:


    public void exclusiveLock() {
        // Try to acquire the stampedLock in write-mode
        stampedLock.writeLock();
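        // Note: the stamp returned by writeLock() is discarded, because
        // exclusiveUnlock() releases through the asWriteLock() view,
        // which does not need a stamp.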
        while (true) {
            // Order is _very_ important here: egress must be read before ingress
            final long egressSum = readersEgress.sum();
            final long ingressSum = readersIngress.sum();
            if (egressSum == ingressSum) return;
            Thread.yield();
        }
    }
    public void exclusiveUnlock() {
        if (!stampedLock.isWriteLocked()) {
            throw new IllegalMonitorStateException();
        }
        stampedLock.asWriteLock().unlock();
    }
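
A quick usage sketch (a hypothetical snippet, following the usual try/finally idiom):

    final LongAdderStampedRWLock rwlock = new LongAdderStampedRWLock();
    // Reader:
    rwlock.sharedLock();
    try {
        // ... read the shared data ...
    } finally {
        rwlock.sharedUnlock();
    }
    // Writer:
    rwlock.exclusiveLock();
    try {
        // ... modify the shared data ...
    } finally {
        rwlock.exclusiveUnlock();
    }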


It doesn't get any simpler than that!


Conclusions

We've shown a new technique for Reader-Writer locks that combines the StampedLock and the LongAdder to make a lock with a low memory footprint and high performance. In case you missed it in the plots above, its performance can be much better than that of the standalone StampedLock, which shows up as the cyan line at the bottom of every plot except "50% Writes".

Of the two variants, the LongAdderRWLock seems to be the best overall, but because it is based on C-RW-WP, it has the disadvantage of a slight Writer-Preference. If you look at the code, you'll see that both of them use a StampedLock, but the LongAdderRWLock uses it only to serialize the Writers, while the LongAdderStampedRWLock uses it as a fallback mechanism when there are a lot of Writers, which means the latter inherits most of the fairness properties provided by the StampedLock.

We're still running tests, and Andreia is working on some interesting variants, so let's see what comes out of it.
All of this will be part of the upcoming release of the Concurrency Freaks Library 0.5.







6 comments:

  1. Hello Pedro,

    I have updated my RWLock to version 1.1; I have added another
    version of my RWLock, please read the following description:

    Description:

    A fast, scalable, and lightweight multiple-Readers-exclusive-Writer lock, called LW_RWLock, that is portable and works across processes and threads; and a fast and scalable multiple-Readers-exclusive-Writer lock, called RWLock, that is portable, does not use a spin-wait but instead an event object and my SemaMonitor, consumes less CPU than the lightweight version, and now processes the Writers in FIFO order, which is also important; it works across threads.

    You can take a look at it on the following website:

    http://pages.videotron.com/aminer/


    Thank you,
    Amine Moulay Ramdane.


  2. Hello Pedro,

    I have used my SemaMonitor inside my RWLock and benchmarked it
    against the Windows Semaphore, and I have found that it is faster
    than the Windows Semaphore. You will find my SemaCondvar and
    my SemaMonitor on the following website:

    http://pages.videotron.com/aminer/


    Thank you,
    Amine Moulay Ramdane.


    1. Hello Pedro,

      Here is my new algorithm for a distributed priority and
      non-priority LOCK; this algorithm reduces the cache-coherence
      traffic a lot and is good, so follow me please...

      First you allocate an array with the same number of elements as
      the number of cores, and the elements in the array must be 64-byte
      cache-aligned and must be a record containing one element that is
      the flag for the first CAS, which will be used as a critical section.

      You first initialize the distributed priority LOCK by setting the
      flag for the second CAS, which will also be used as a critical
      section, to 0, and you set the flag of the first CAS to 1 (to permit
      the first thread to enter the lock()).

      The lock() function will look like this:

      Every thread that enters the lock() must acquire its processor
      number by using the GetProcessorNumber() function, but this function
      will be optimized to amortize the number of calls to it, and that's
      easy to do.

      You enter the lock() function by pushing the thread's processor
      number onto a queue or priority queue that will be optimized; this
      queue will use a CAS on the push() and no CAS on the pop(), since we
      don't need one on the pop(): only one thread will pop() from the
      unlock() function. After that you enter a repeat loop where you test,
      with the first CAS and with the second CAS, the flag of the
      corresponding element of the array, using the thread's processor
      number as an index. The flag for the second CAS will be set to 1, so
      the first thread will enter the lock(), and the other threads will
      keep testing in a repeat loop for the first CAS and the second CAS;
      their flags will have been set to zero, so they will wait.

      After that the first thread will arrive at the unlock() function,
      pop() the processor number from the optimized priority or
      non-priority queue, and set the flag for the first CAS to 1 for the
      corresponding processor core; this will allow a thread to enter the
      lock. If there are no elements in the queue, the thread will set the
      flag of the second CAS to 1, and this will allow a thread to enter
      the lock.

      As you will have noticed, my algorithm is also efficient because,
      when there are threads waiting, the cache-coherence traffic will be
      reduced a lot, since we are using local variables in each element of
      the array, aligned to 64 bytes.

      So if you have noticed, in fact I am using 2 CASes, so my algorithm
      is good.

      This was my algorithm for a distributed priority and non-priority
      LOCK that is efficient, and I will code it as soon as possible.

      Thank you,
      Amine Moulay Ramdane.

    2. Hello,

      Be smart please: to reduce the cache-coherence traffic a lot, you
      have to choose a queue that reduces the cache-coherence traffic,
      and for that you have to code the queue as a linked list (similar
      to the MCS queue), which reduces the cache-coherence traffic a lot;
      but you have to avoid lock-free queues, because they will increase
      the cache-coherence traffic. This is how you have to code my
      distributed LOCK, and it will be efficient and good.

      Thank you,
      Amine Moulay Ramdane.

    3. Hello,

      I have come upon an interesting subject about lock-free queues.
      If you take a look at those lock-free queues, like in transactional
      memory, when the data has changed the lock-free queue retries and
      loops again, but this (lock-free) mechanism generates a lot of
      cache-coherence traffic, so I would not advise this lock-free
      mechanism. So how can we do it? You can take a look at the MCS
      queue lock; I think they are doing it like a wait-free linked list
      that doesn't spin and that reduces the cache-coherence traffic a
      lot. Other than that, if your memory manager is not optimal and
      uses a lock-free mechanism that generates a lot of cache-coherence
      traffic, you have to use a freelist to lower the cache-coherence
      traffic.

      So be smart please and think about that.

      Thank you,
      Amine Moulay Ramdane.

    4. Hello,

      I wrote this about my distributed priority and non-priority LOCK:

      "So if you have noticed, in fact I am using 2 CASes, so my
      algorithm is good."

      You will say that I will in fact use 3 atomic operations, 2 CASes
      and one inside the wait-free queue; but be smart please: when there
      are threads waiting and spinning, one of the CASes will be very
      cheap because its variable will be inside the L2 cache, while the
      other CAS will be expensive because the threads must load its
      variable from the remote caches. So it is as if we had a single
      CAS, since one of the two is very cheap, which makes a total of
      2 CASes if we count the CAS for the wait-free queue.

      This was my algorithm for a distributed priority and non-priority
      LOCK that is efficient, and I will code it as soon as possible.

      Thank you,
      Amine Moulay Ramdane.

