Tuesday, February 18, 2014

Left-Right: Reader's Version variant

We talked about the Left-Right pattern before, and on this post we will cover the "Reader's Version" variant.

In this variant of the Left-Right algorithm, each Reader thread has its own version, which it increments and publishes, which we name readerState. This means this technique can not be used with counter-based readIndicators, only with readIndicators that have a separate state for each of the Readers.
This has implications to memory consumption, because the counter-based readIndicators (like AtomicLong, LongAdder and DistributedCacheLineCounter) and a memory usage of O(1) or O(Ncores), while the readIndicators with a separate state per Reader have a memory usage of O(NReaders) which is usually O(Nthreads). Having a memory consumption of O(Nthreads) is not an issue if you have 8 cores and 8 running threads, but if you have thousands of threads and only 8 cores, then it may become something to think about. Still, keep in mind that each Reader's state is composed of a single 64 bit counter, or at most a single cache line to prevent false-sharing. In Java, this can be done using @sun.misc.Contended:

    public static class State {
       // Possible states for this variable are:
       // -1:                Not reading, initial state
       // Negative number x: Not reading, last version was x
       // Positive number x: Currently reading with version x
        private volatile long readerState;
        public State() {
            this.ReaderState = NOT_READING;

To help distinguish inactive Readers from active ones, we need to use one of the bits of the readerState to encode the states READING / NOT_READING. The easiest way to achieve this is to use the sign of the long variable, where negative means NOT_READING and a positive version represents a READING state.
Before starting the read operation, a Reader will change its readerState from negative to positive and increment it by one. Once the read operation is finished, the Reader will change the sign of the readerState from positive to negative.

Here's an example of how the readerState variable evolves with multiple read operations:
- readerState starts at -1
- Reader1 begins and sets readerState to 2
- Reader1 executes its read operation
- Reader1 finishes by setting readerState to -2
- some time passes
- Reader1 begins a new operation and sets readerState to 3
- Reader1 executes its read operation
- Reader1 finishes by setting readerState to -3

As for the Writer, it has to scan the readerState of each of Readers, waiting for an increment or for the version to be set to a negative value, before it can proceed with the write operation.
Notice that each of the Readers has an independent "version number", with no relation with the others. It also means that when the Writer sees a positive number, it must wait until one of two things occurs:
- The readerState changes to a negative number: the Reader has finished its read operation and has set its readerState to a negative value;
- The readerState is incremented: the Reader has finished its read operation and has started a new one (or several) read operations, which means the Reader is guaranteed to have seen the new value of the leftRight variable and therefore, it is executing its operation on the opposite data structure to the one the Writer is wishes to modify.

One theoretical limitation of this approach is that, the variable for the state of each Reader is continuously incremented and could eventually overflow. If a 64 bit integer is used for readerState, it should take many thousands of years for a current modern CPU to be able to overflow it, thus avoiding this issue in practice.

An interesting optimization of this technique on non-x86 architectures is to use relaxed atomics, namely, the Reader doesn't need to do an acquire-barrier when reading the variable readerState because it is the only thread capable of modifying this variable. It does need to issue a release-barrier when modifying this variable because it needs to ensure that the modification is visible to the Writer. On x86 architectures the acquire-barrier has a negligible cost, so there is no point in doing this optimization.

Sample source code in Java can be seen here: