Saturday, April 20, 2013

Performance plots of the Scalable-Stamped-Read-Write-Lock

Last month we had presented a Scalable-Read-Write-Lock which we made available as part of the Concurrency Freaks library 0.3 available on sourceforge under the name
This rwlock has great scalability proprieties but it is very "unfair" because it has a strong statistical Writer-Preference, and it doesn't perform well when compared with other read-write locks for a large number of Writes, like for example the StampedLock coming in on JDK 8.
In fact, the StampedLock provides a superior performance to the ScalableRWLock and many extra features (except for reentrant/recursive).

We felt that having such an unbalanced performance was a big setback for the ScalableRWLock, and we thought of it for a while and come up with an improved version that uses both the Scalable mechanism and a StampedLock, taking the best of both worlds.
We called it the ScalableStampedRWLock... a long name for a simple idea. The main change consists of replacing the CAS loop on the exclusiveLock() with a call to StampedLock.writeLock(), and on the sharedLock(), instead of yielding when there is a Writer, the Read will go directly to StampedLock.readLock().

A friend of ours at EPFL was kind enough to lend us its 80 core-server so that we could run our benchmark on it (big thanks to Miguel Branco!) and the results are quite good.


When we run a test where each threads does (on average) one Write for each Read operation, we can see that the ScalableRWLock performance decreases as the number of threads increases. There are  two reasons for this: first, the more (Reader) threads we have, the larger is the list that a Writer has to scan to decide whether or not it is safe to proceed; second, with a larger list there is a higher probability that one of the Readers in the list will cause the Writer to wait/yield.
Overall, the Scalable mechanism is very sensitive to Writes because of its Writer-Preference, and this puts it into a large disadvantage when the number of Writers is large.

For 20% Writes we can see a similar plot to 50% Writes, but now the difference between the ScalableStampedRWLock and the StampedLock is minimal.

For 10% Writes we can already see that sometimes there is an advantage in using the ScalableStampedRWLock.

For 2% Writes the advantage can go up to twice the performance (threads = 2) and it never goes below the performance of the StampedLock

For 0.5% and 0.1% Writes the advantage of the ScalableStampedRWLock (and ScalableRWLock) is clear.

In the extreme case of only very occasional Writes (0% Writes to be more precise) then the performance of both the ScalableStampedRWLock and the ScalableRWLock grows linearly with the number of threads, while the StampedLock remains nearly constant.

Notice that the horizontal axis is not a linear scale and that is why the green and blue lines seem to grow exponentially, when in fact the increase in performance is linear with the number of threads.

All together

We did two other plots where we compare for 4 threads and 64 threads what happens with a different number of read/write operation ratio (average):

The plot for 4 threads has a linear vertical scale and the 64 threads test has a vertical logarithmic scale otherwise we would not be able to see the performance of the leftmost test-cases.

It is easy to see that the green bar is always one of the highest (or the highest) in all situations, losing at most 40% to the highest (the StampedLock), thus showing that the ScalableStampedRWLock is the dominant strategy when it comes to performance.

Keep in mind that unlike the StampedLock, the ScalableStampedRWLock does not provide a "Stamp mechanism", and neither does it support an "Optimisitic Reading" tryOptmisticRead() call, but if your applications doesn't need those functionalities and performance is important to you, then we believe that the ScalableStampedRWLock is the way to go.