Tuesday, May 27, 2014

CLH Mutex - An alternative to the MCS Lock


PS: It seems this algorithm is just the CLH lock with the previous node freed instead of reused, and nothing original at all... dope.
Many thanks to David for pointing it out  ;)

So here is an implementation of the CLH lock in C11, shown piece by piece below (the full source is linked at the end of the post).


Compared to the MCS lock, the CLH lock has no physical queue, but there is a kind of logical queue: each thread knows only two nodes, the node it created and the node previous to it.
http://www.cs.rice.edu/~vs3/comp422/lecture-notes/comp422-lec19-s08-v1.pdf 
(slide 45)

Just like the MCS lock, the CLH lock is starvation-free; it is easy to transform into a recursive mutex if needed; and each thread waits/spins on its own cache line, which will be written by the previous thread, namely, on the succ_must_wait variable of the previous node.
This implementation of the CLH lock is memory safe, in the sense that all created nodes are safely and correctly deleted at some point in the execution, i.e. it manages its own memory and does not need a GC.
The main difference from the MCS lock is that the MCS lock may have to do a Compare-And-Swap in addition to the atomic_store() in the unlock() function.

Let's look at the member variables:


typedef struct clh_mutex_node_ clh_mutex_node_t;

struct clh_mutex_node_
{
    _Atomic char succ_must_wait;
};

typedef struct
{
    clh_mutex_node_t * mynode;
    char padding[64];  // To avoid false sharing with the tail
    _Atomic (clh_mutex_node_t *) tail;
} clh_mutex_t;
Each node contains a single variable, succ_must_wait. When succ_must_wait is zero, the thread that created the next node may proceed and acquire the lock on the mutex. If succ_must_wait is 1, then that thread must spin/wait/yield. The only atomic variable in clh_mutex_t itself is tail. The mynode variable is used only to pass the node pointer from the lock() function to unlock(); it could be discarded if we returned the pointer by reference from lock() and then passed it to unlock(), but that would make the API less general.
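
The helper clh_mutex_create_node() is used below but not shown in the post. Here is a minimal sketch of what it might look like, assuming a simple malloc-based allocation (the actual helper in the linked source may differ; error handling omitted):

#include <stdatomic.h>
#include <stdlib.h>

// Hypothetical sketch of the allocation helper used by init() and lock():
// allocates a node and initializes its succ_must_wait field.
static clh_mutex_node_t * clh_mutex_create_node(char succ_must_wait)
{
    clh_mutex_node_t * node = (clh_mutex_node_t *)malloc(sizeof(clh_mutex_node_t));
    atomic_store(&node->succ_must_wait, succ_must_wait);
    return node;
}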

Now, let's look at the code:

void clh_mutex_init(clh_mutex_t * self)
{
    // We create the first sentinel node unlocked, with succ_must_wait=0
    clh_mutex_node_t * node = clh_mutex_create_node(0);
    self->mynode = node;
    atomic_store(&self->tail, node);
}
The sole purpose of the initialization function is to create a sentinel node with succ_must_wait unlocked (a value of zero).
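
To match the claim that every created node gets freed, the mutex also needs a destroy function that releases the last remaining node. Here is a minimal sketch of what such a function could look like (the linked source has its own version, which may differ), assuming no thread holds or is waiting on the mutex when it is called:

void clh_mutex_destroy(clh_mutex_t * self)
{
    // Assumes the mutex is quiescent: the only remaining node is the
    // last sentinel, pointed to by mynode (and by tail).
    free(self->mynode);
}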


void clh_mutex_lock(clh_mutex_t * self)
{
    // Create the new node locked by default, setting succ_must_wait=1
    clh_mutex_node_t *mynode = clh_mutex_create_node(1);
    clh_mutex_node_t *prev = atomic_exchange(&self->tail, mynode);

    // This thread's node is now in the queue, so wait until it is its turn
    char prev_islocked = atomic_load_explicit(&prev->succ_must_wait, memory_order_relaxed);
    if (prev_islocked) {
        while (prev_islocked) {
            sched_yield(); 
            prev_islocked = atomic_load(&prev->succ_must_wait);
        }
    }
    // This thread has acquired the lock on the mutex

    free(prev);
    self->mynode = mynode;
}
The lock() function starts by creating a new node, mynode, and swapping it into the tail with an atomic_exchange(), which returns the previous node, prev. If prev->succ_must_wait is 0, it means that this thread has acquired the lock; otherwise it must spin/yield/wait until succ_must_wait becomes zero.
Then, we have to store mynode so that the unlock() function knows which node belongs to the current lock holder. This could be done with a thread-local variable (a sketch of that variant is shown below), but we can also store it in a member field (with the same name), because between now and the call to unlock() no other thread will modify this field, unless some other thread calls unlock() without first calling lock(), which is a bug in the application and has undefined behavior.
We can now safely free the memory used by prev, with the guarantee that no other thread can still access it. The previous thread no longer holds a pointer to that node, because its last use of it was the atomic_store() in unlock() that allowed the current thread to acquire the lock, and no other thread ever had a reference to it.
Notice that each thread is responsible for free()ing the memory of the previous node, not the node that it created.
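
As mentioned above, the handoff of mynode from lock() to unlock() could instead be done through a thread-local variable. Here is a hypothetical sketch of that variant (not the posted code); note that a single thread-local like this only works if a thread holds at most one CLH mutex at a time:

// Hypothetical variant: hand mynode from lock() to unlock() via a thread-local.
static _Thread_local clh_mutex_node_t * tls_mynode = NULL;

void clh_mutex_lock_tls(clh_mutex_t * self)
{
    clh_mutex_node_t * mynode = clh_mutex_create_node(1);
    clh_mutex_node_t * prev = atomic_exchange(&self->tail, mynode);
    while (atomic_load(&prev->succ_must_wait)) {
        sched_yield();
    }
    free(prev);
    tls_mynode = mynode;   // instead of self->mynode = mynode
}

void clh_mutex_unlock_tls(clh_mutex_t * self)
{
    (void)self;   // the node is found through the thread-local, not the mutex
    atomic_store(&tls_mynode->succ_must_wait, 0);
}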



void clh_mutex_unlock(clh_mutex_t * self)
{
    atomic_store(&self->mynode->succ_must_wait, 0);
}
In the unlock() function, notice that the value stored in self->mynode was written by the same thread, so it doesn't need to be atomic or have acquire/release barriers.
All this does is set succ_must_wait to zero so that the next thread can acquire the lock.
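
Putting the pieces together, here is a minimal usage sketch with two threads incrementing a shared counter (the header name, thread count, and iteration count are illustrative, and clh_mutex_destroy() is the sketch shown earlier):

#include <pthread.h>
#include <stdio.h>
#include "clh_mutex.h"   // assumed header exposing the functions above

static clh_mutex_t g_mutex;
static long g_counter = 0;   // protected by g_mutex

static void * worker(void * arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        clh_mutex_lock(&g_mutex);
        g_counter++;                  // critical section
        clh_mutex_unlock(&g_mutex);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    clh_mutex_init(&g_mutex);
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", g_counter);   // expected: 200000
    clh_mutex_destroy(&g_mutex);   // hypothetical destroy, sketched earlier
    return 0;
}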

The PowerPoint linked at the end of the post contains some animations that may help in understanding how the algorithm works.



A nice property of the CLH mutex is that it uses the (theoretically) minimum number of synchronization operations to perform a mutual exclusion lock:
- In the lock() function it has a single synchronized operation, the atomic_exchange(), with one acquire and one release barrier.
- In the unlock() function it has a single synchronized operation, the atomic_store(), with a release barrier.
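
To make those barriers explicit, the same operations could be written with C11 explicit memory orderings. This is only a sketch; the posted code relies on the sequentially-consistent defaults, which are at least as strong:

// Sketch with explicit orderings (needs <stdatomic.h> and <sched.h>).
void clh_mutex_lock_explicit(clh_mutex_t * self)
{
    clh_mutex_node_t * mynode = clh_mutex_create_node(1);
    // acq_rel: release publishes mynode's initialization to the next thread,
    // acquire makes the previous thread's publication of prev visible here.
    clh_mutex_node_t * prev = atomic_exchange_explicit(&self->tail, mynode, memory_order_acq_rel);
    // The spin load uses acquire so that, once succ_must_wait becomes 0, the
    // previous holder's critical-section writes are visible to this thread.
    while (atomic_load_explicit(&prev->succ_must_wait, memory_order_acquire)) {
        sched_yield();
    }
    free(prev);
    self->mynode = mynode;
}

void clh_mutex_unlock_explicit(clh_mutex_t * self)
{
    // release: makes this thread's critical-section writes visible to the next thread.
    atomic_store_explicit(&self->mynode->succ_must_wait, 0, memory_order_release);
}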


C11 source code on GitHub, along with a PowerPoint containing some animations:
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/C11/locks/clh_mutex.h
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/C11/locks/clh_mutex.c
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/Presentations/CLH-Mutex.pptx

You can also check a Java implementation of the original CLH algorithm in "The Art of Multiprocessor Programming", section 7.5.2.

1 comment:

  1. This is basically the CLH lock except that you free the previous node instead of reusing it.
