Wednesday, February 27, 2013

Optimizing Reentrant calls

During one of our late night brain-storms, Andreia suggested using regular arrays instead of AtomicInteger for the counter of the number of reentrant calls of a ScalableRWLock.
The idea is that each thread only cares about its own number of reentrant calls, therefore, when a Writer does a scan for the state of the Readers it doesn't care whether the Reader is in reentrant level 1 or level 100, it still has to wait. The Reader can save some synchronization calls if it updates another array (with cache-line padding) where each entry is updated only by the assigned thread because all the Writer needs is a 0/1 variable to see if it needs to wait for that particular Reader or not.

I've done a Java implementation, and compared it against the version that is in Concurrency Freaks Lib v0.1, and the plots can be seen below.

For a single thread test, without any writes, and the operation consists of reading an array of 16 int:


For a two-thread test, without any writes, and the operation consists of reading an array of 16 int:


Notice that the horizontal axis depicts the number of reentrant calls that were performed per operation. In practice, for most common applications, although the size of the stack can become quite large, the number of reentrant calls shouldn't go above 10 or 20. If it is, then IMHO there is something wrong with the design of that application  ;)

In the single-threaded case there is a clear advantage from the
java.util.concurrent.locks.ReentrantReadWriteLock, but for the case with two threads the situation reverses, and the Scalable algorithms are always better.
I think this advantage is only for Java because the ReentrantReadWriteLock uses the synchronized mechanism which is pretty sophisticated, and I think it is able to figure out that it doesn't need to do as much sychronization as when it has multiple threads. I'll have to repeat the tests on C or C++ to confirm that in those languages the performance of the classical rw-locks should always be inferior to the Scalable mechanism.

In conclusion, if you are using Java and have low contention and lots of reentrant locks, then using the ReentrantReadWriteLock will provide you with decent performance, but if you have high-contention, the Scalable RWLock algorithms will give you much better performance with or without a large level of reentrancy.



2 comments:

  1. Just for clarification on the graphs :
    ScalableReentrantRWLockS - has a static array and threadids are shared between locks.
    ScalableReentrantRWLockS (new) - the same has ScalableReentrantRWLockS but uses a separated array of ints that stores the number of reentrant locks.
    ScalableReentrantRWlock - uses the finalise() technique with a ConcurrentLinkedQueue to store the pointers of the ReadersEntry for writer scanning.

    Nice work Pedro. ;)

    ReplyDelete
  2. Another possible optimization for the ScalableReentrantRWlock is to have two variables on the ReadersEntry one that is an AtomicInteger that can take only 0 or 1 and a counter int that counts the number of reentrant locks, this minimises the invalidation of cache lines for the AtomicInteger and replaces an atomic set for a ++. Since the writer doesn't care about the number of reentrant read locks.

    ReplyDelete