Today's post is just a revisit of some old stuff.
We've talked before about scalable reader-writer locks, more specifically about DCLC RW-Lock, which is one possible implementation of the C-RW-WP algorithm described in the paper "NUMA-Aware Reader-Writer Locks". We arrived at it shortly after the Oracle team got their paper accepted.
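For readers who haven't seen C-RW-WP before, here is a minimal C++11 sketch of the idea, assuming a layout in the spirit of DCLC RW-Lock. It is not the actual DCLCRWLock code, and the class and member names are hypothetical. Readers announce themselves on one of several cache-line-padded counters, chosen by hashing the thread id. If a writer is active, the reader backs off and waits (writer preference). A writer takes a single CAS flag, which stands in for the cohort lock used in the paper, and then waits for every counter to drain. The 64-byte line size and the 32 counters are assumptions chosen to match a 32-core x86 box.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>

// Hypothetical sketch of a C-RW-WP style lock with a distributed
// read-indicator. Not the original DCLCRWLock implementation.
class DistributedRWLockSketch {
    static constexpr std::size_t kCacheLine = 64;   // assumed x86 cache-line size
    static constexpr std::size_t kNumCounters = 32; // assumed: roughly one per core

    // Each counter sits on its own cache line, so readers that hash to
    // different counters do not false-share.
    struct alignas(kCacheLine) PaddedCounter {
        std::atomic<long> value{0};
    };

    PaddedCounter counters_[kNumCounters];
    alignas(kCacheLine) std::atomic<bool> writerActive_{false};

    // Same thread always maps to the same counter; collisions are allowed.
    static std::size_t slot() {
        return std::hash<std::thread::id>{}(std::this_thread::get_id()) % kNumCounters;
    }

public:
    void readLock() {
        const std::size_t i = slot();
        while (true) {
            // Arrive: announce this reader. All operations are seq_cst, so this
            // increment is ordered before the writer-flag load below.
            counters_[i].value.fetch_add(1);
            if (!writerActive_.load()) return;  // no writer: read access granted
            // Writer preference: depart, wait for the writer, then retry.
            counters_[i].value.fetch_sub(1);
            while (writerActive_.load()) std::this_thread::yield();
        }
    }

    void readUnlock() {
        counters_[slot()].value.fetch_sub(1);  // depart from the same counter
    }

    void writeLock() {
        // One writer at a time (a simple stand-in for the cohort lock).
        bool expected = false;
        while (!writerActive_.compare_exchange_weak(expected, true)) {
            expected = false;
            std::this_thread::yield();
        }
        // Wait until every reader that already arrived has departed.
        for (std::size_t i = 0; i < kNumCounters; ++i) {
            while (counters_[i].value.load() != 0) std::this_thread::yield();
        }
    }

    void writeUnlock() { writerActive_.store(false); }
};
```

The tradeoff is visible in the code. A reader touches only its own counter and the writer flag, which is what keeps read-mostly workloads scalable. A writer, by contrast, has to scan every counter.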
I had to redo some plots of the C++1x implementation of DCLC RW-Lock for an internal presentation at my day job, so I might as well post them here for everyone to see :)
There are four plots, for 50%, 10%, 1% and 0.1% writes, all run on a 32-core AMD Opteron (x86) machine.
The first thing to notice is that DCLCRWLock is always better than pthread_rwlock_t. As the fraction of write-modify operations grows, the total number of operations per millisecond drops. There are two reasons: the window for read-only operations gets shorter, and only one write-modify operation can run at a time. That's why on the first two plots (50% and 10% writes) the performance decreases as the number of threads increases.
The second thing to notice is that DCLCRWLock can deliver up to 320x the throughput of pthread_rwlock_t. Yep, a difference of two orders of magnitude, all thanks to reduced contention and false sharing among the reader threads.
To be fair, the memory usage of a DCLCRWLock instance is waaaaayyy higher than that of a simple pthread_rwlock_t, but as all good engineers know, there are no silver bullets, only tradeoffs and different tools for different purposes.
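To put a rough number on "waaaaayyy higher", here is a back-of-the-envelope comparison. It assumes a hypothetical layout of 32 reader counters, each padded to its own 64-byte cache line, plus one more line for the writer flag. The real DCLCRWLock layout may differ, and all names here are illustrative.

```cpp
#include <atomic>
#include <cstddef>
#include <pthread.h>

constexpr std::size_t kCacheLine = 64;   // assumed x86 cache-line size
constexpr std::size_t kNumCounters = 32; // assumed: one counter per core

// One reader counter per cache line (hypothetical layout).
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};

// Padded counters plus a writer flag on its own cache line.
struct SketchLayout {
    PaddedCounter counters[kNumCounters];
    alignas(kCacheLine) std::atomic<bool> writerActive{false};
};

std::size_t sketchBytes() { return sizeof(SketchLayout); }       // 32*64 + 64
std::size_t pthreadBytes() { return sizeof(pthread_rwlock_t); }  // platform-dependent
```

With these assumptions the sketch layout takes 2112 bytes. On Linux/glibc x86-64, `sizeof(pthread_rwlock_t)` is 56 bytes, so the padded layout is roughly 38x larger per lock instance.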
As always, the source code is available on GitHub: