During one of our late night brain-storms, Andreia suggested using regular arrays instead of AtomicInteger for the counter of the number of reentrant calls of a ScalableRWLock.
The idea is that each thread only cares about its own number of reentrant calls, therefore, when a Writer does a scan for the state of the Readers it doesn't care whether the Reader is in reentrant level 1 or level 100, it still has to wait. The Reader can save some synchronization calls if it updates another array (with cache-line padding) where each entry is updated only by the assigned thread because all the Writer needs is a 0/1 variable to see if it needs to wait for that particular Reader or not.
I've done a Java implementation, and compared it against the version that is in Concurrency Freaks Lib v0.1, and the plots can be seen below.
For a single thread test, without any writes, and the operation consists of reading an array of 16 int:
For a two-thread test, without any writes, and the operation consists of reading an array of 16 int:
Notice that the horizontal axis depicts the number of reentrant calls that were performed per operation. In practice, for most common applications, although the size of the stack can become quite large, the number of reentrant calls shouldn't go above 10 or 20. If it is, then IMHO there is something wrong with the design of that application ;)
In the single-threaded case there is a clear advantage from the
java.util.concurrent.locks.ReentrantReadWriteLock, but for the case with two threads the situation reverses, and the Scalable algorithms are always better.
I think this advantage is only for Java because the ReentrantReadWriteLock uses the synchronized mechanism which is pretty sophisticated, and I think it is able to figure out that it doesn't need to do as much sychronization as when it has multiple threads. I'll have to repeat the tests on C or C++ to confirm that in those languages the performance of the classical rw-locks should always be inferior to the Scalable mechanism.
In conclusion, if you are using Java and have low contention and lots of reentrant locks, then using the ReentrantReadWriteLock will provide you with decent performance, but if you have high-contention, the Scalable RWLock algorithms will give you much better performance with or without a large level of reentrancy.