Saturday, June 21, 2014

Elected Pattern

Enough about locks for the time being. Today we're going to talk about a concurrency technique named the "Elected Pattern".
It is based on a very simple idea that is known in distributed systems as "Leader Election", where one node is assigned a special "role", like for example, the leader or server in a network of nodes.

When applied to concurrency, we use threads instead of nodes, and tasks instead of roles, but the idea is pretty much the same.
Up until Andreia showed me one of her data structures using it more than a year ago, I hadn't seen this being applied to a concurrent data structure, but since then, we found it is used on Java 8's LongAdder, so we're not the only ones using this technique. If you're interested in the details, then take a look at Striped64.longAccumulate() (which is called from LongAdder), and search for cellsBusy and casCellsBusy().

In the Elected pattern, a single thread is selected to perform a special task. Here is some sample code in Java:

final AtomicBoolean electedGuard = new AtomicBoolean(false);
   
public void functionA() {
    doSomethingThreadSafe();
       
    if (electedGuard.compareAndSet(false, true)) {
        specialTask();
        electedGuard.set(false);
    }       
}
and yes, I know we can use a try/finally block, but forget about that kind of stuff for a moment.If functionA() is called by multiple threads, all those threads will be calling doSomethingThreadSafe(), which hopefully, as the name indicates, is something that is actually thread-safe to call. One of those threads will be able to set electedGuard to true using the CAS operation and therefore, there will be a single thread calling specialTask().

I know what you're thinking:
You're a liar because you said you would stop blabbering about locks, and yet, here is one, staring us right in the face!
Indeed, the Election pattern uses a kind of lock, and the code above could even be replaced with a tryLock() of a mutex.

There is one very very important thing to be aware of though, and that is, that the functionA() is non-blocking!

Let me repeat it, because most people miss it completely: The Election pattern, by itself, is Wait-Free Population Oblivious.

Once again, I know what you're thinking:
Wait, what?!?
How can something have "a lock" and still be non-blocking?
That doesn't make any sense!


Progress Conditions are all about the (maximum) number of operations it takes to complete a task. For example, if you say that the function doSomethingThreadSafe() is lock-free, then what will the progress condition be for functionA() ?
The answer is, lock-free as well, under the assumption that specialTask() is only called from this particular code path, and that it completes when called in isolation.
When a thread calls functionA(), it will execute doSomethingThreadSafe() and then try to CAS the electedGuard from false to true. If it fails the CAS, then its job is done and it will return. If it succeeds in the CAS, then it will be the only thread to call specialTask(), at least until the call to specialTask() returns and it sets electedGuard back to false. Notice that there is only one thread at a time calling specialTask() so, by itself, it doesn't imply a blockage.

However, (and this is a big "however") even though functionA() will have the same progress condition as doSomethingThreadSafe(), it can happen that if a thread blocks/sleeps/dies after setting electedGuard to true, during specialTask(), then no other thread will be able to execute the specialTask() method for the remainder of the life time of the application.
This implies that the Elected pattern is not fault tolerant.

By this time, you may be thinking:
Whoooaaaa, wait a minute!
Are you saying that functionA() is lock-free but not fault tolerant?
I thought that lock-freeness implied some kind of resilience to faults!


For most lock-free and wait-free algorithms, the assumption that some kind of "fault tolerance" is provided, is correct, but this is not a universal guarantee. Some lock-free algorithms may not be tolerant to faults, with one example being this one, the Elected Pattern.


It is not obvious why this is wait-free, and a lot of people get this one wrong. I've seen researchers with years of experience in concurrency say that the Election pattern is "Blocking", which is not correct.
For those of you not convinced, my suggestion is to look at "The Art of Multiprocessor Programming", page 99, and read very carefully the first two lines of the third paragraph, where the definition of wait-free and lock-free are provided. You will notice that nowhere does it mention anything about faults, or fault-tolerance, or crashes. Here's what it says:
(...) A concurrent object implementation is wait-free if each method call finishes in a finite number of steps. A method is lock-free if it guarantees that infinitely often some method call finishes in a finite number of steps. (...)

In the example above, functionA() will complete as soon as doSomethingThreadSafe() completes, regardless of any thread that may have suspended, or blocked, or crashed while "holding the lock" on electedGuard. If doSomethingThreadSafe() is wait-free, then no matter what is going on with the electedGuard or inside specialTask(), the method functionA() will return after executing a finite number of steps, assuming that specialTask() doesn't have any infinite loop or similar bug.


I know most people despise locks (mutual exclusion or rw-locks), but the truth is, that they are  simple and powerful concurrency constructs. No one can deny that locks can be nightmarish to use when you have a lock-based application design, but if you use locks at just the right place, for example, as part of more sophisticated synchronization mechanism or data structure, then the results can be surprising.
Good examples of this are the Double Instance Locking pattern, which is basically just one mutex and two rw-locks "stitched together" to provide lock-free read accesses, and the other example is this one, the Elected Pattern, where with a simple mutex we can have wait-free properties.

I know this doesn't look like much, but on the next post we will show a lock-free linked list that uses the Elected Pattern to solve a complicated problem: the unlinking of removed nodes.

2 comments:

  1. Nice blog! Thanks.

    We have used a similar technic to do some resources initialization:

    final AtomicBoolean electedGuard = new AtomicBoolean(false);

    // this is a resources initialization function
    public void functionA() {
    doSomethingThreadSafe();

    if (electedGuard.compareAndSet(false, true)) {
    specialTask(); // initialize some critical resources
    // no "electedGuard.set(false);"! since we just wanted to initialize it once
    }
    }

    Instead of using a mutual lock, we found it much more elegant :)

    ReplyDelete
    Replies
    1. Yes, good point. If you're just going to call specialTask() once, then you can skip the electedGuard.set(false)

      Delete