PS: It seems this algorithm is just the CLH lock with the previous node removed, and nothing original at all... dope.
Many thanks to David for pointing it out ;)
So here is an implementation of the CLH lock in C11:
Compared to the MCS lock, the CLH lock has no physical queue, but there is a kind of logical queue: each thread knows only two nodes, the node it created and the node previous to it.
http://www.cs.rice.edu/~vs3/comp422/lecture-notes/comp422-lec19-s08-v1.pdf
(slide 45)
Just like the MCS lock, the CLH lock is also starvation-free; it is easy to transform into a recursive mutex if needed; and each thread waits/spins on its own cache line, which will be written by the previous thread, namely the succ_must_wait variable of the previous node.
This implementation of the CLH is memory safe, in the sense that all created nodes are safely and correctly deleted at some point in the execution, i.e. it manages its own memory and does not need a GC.
The main difference from the MCS lock is that MCS may have to do a Compare-And-Swap in addition to the atomic_store() in the unlock() function.
Let's look at the member variables:
typedef struct clh_mutex_node_ clh_mutex_node_t;

struct clh_mutex_node_
{
    _Atomic char succ_must_wait;
};

typedef struct
{
    clh_mutex_node_t * mynode;
    char padding[64];   // To avoid false sharing with the tail
    _Atomic (clh_mutex_node_t *) tail;
} clh_mutex_t;
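The functions below call clh_mutex_create_node(), a small helper that allocates and initializes a node. It is not shown in this post, so here is a minimal sketch of what it could look like (the actual helper lives in clh_mutex.c in the repository linked at the end):

```c
#include <stdatomic.h>
#include <stdlib.h>

typedef struct clh_mutex_node_ clh_mutex_node_t;

struct clh_mutex_node_
{
    _Atomic char succ_must_wait;
};

// Allocates a new node and initializes succ_must_wait to the given value.
// A relaxed store is enough here: the node is published to other threads
// only later, by the atomic_exchange() on the tail.
static clh_mutex_node_t * clh_mutex_create_node(char islocked)
{
    clh_mutex_node_t * node = (clh_mutex_node_t *)malloc(sizeof(clh_mutex_node_t));
    atomic_store_explicit(&node->succ_must_wait, islocked, memory_order_relaxed);
    return node;
}
```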
Now, let's look at the code:
void clh_mutex_init(clh_mutex_t * self)
{
    // We create the first sentinel node unlocked, with succ_must_wait=0
    clh_mutex_node_t * node = clh_mutex_create_node(0);
    self->mynode = node;
    atomic_store(&self->tail, node);
}
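Notice that init() allocates a sentinel node, which the mutex still owns while no thread holds the lock. To avoid leaking it, the repository also provides a clh_mutex_destroy(); here is a self-contained sketch of init and destroy together (the node struct and helper are repeated so the snippet compiles on its own):

```c
#include <stdatomic.h>
#include <stdlib.h>

typedef struct clh_mutex_node_ { _Atomic char succ_must_wait; } clh_mutex_node_t;

typedef struct {
    clh_mutex_node_t * mynode;
    char padding[64];   // To avoid false sharing with the tail
    _Atomic (clh_mutex_node_t *) tail;
} clh_mutex_t;

static clh_mutex_node_t * clh_mutex_create_node(char islocked)
{
    clh_mutex_node_t * node = (clh_mutex_node_t *)malloc(sizeof(clh_mutex_node_t));
    atomic_store_explicit(&node->succ_must_wait, islocked, memory_order_relaxed);
    return node;
}

void clh_mutex_init(clh_mutex_t * self)
{
    // We create the first sentinel node unlocked, with succ_must_wait=0
    clh_mutex_node_t * node = clh_mutex_create_node(0);
    self->mynode = node;
    atomic_store(&self->tail, node);
}

// When no thread holds the lock, the node pointed to by tail is the only
// node still allocated, so freeing it releases all the mutex's memory
void clh_mutex_destroy(clh_mutex_t * self)
{
    free(atomic_load(&self->tail));
}
```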
void clh_mutex_lock(clh_mutex_t * self)
{
    // Create the new node locked by default, setting succ_must_wait=1
    clh_mutex_node_t * mynode = clh_mutex_create_node(1);
    clh_mutex_node_t * prev = atomic_exchange(&self->tail, mynode);
    // This thread's node is now in the queue, so wait until it is its turn.
    // The load needs (at least) acquire semantics, so that the critical
    // section cannot be reordered before the wait loop.
    while (atomic_load(&prev->succ_must_wait)) {
        sched_yield();
    }
    // This thread has acquired the lock on the mutex
    free(prev);
    self->mynode = mynode;
}
Then, we have to store mynode so that the unlock() function knows which node belongs to the current thread. This could be done with a thread-local variable, but we can also store it in a member field (with the same name), because we know that between now and the call to unlock() no other thread will modify this variable. The exception would be some other thread calling unlock() without first calling lock(), but that is a bug in the application and has undefined behavior.
We can now safely free the memory used by prev with the guarantee that there will be no other thread that can access it. The previous thread no longer has a pointer to that node because the last time it used it was to do the atomic_store(), which allowed the current thread to acquire the lock, and no other thread has a reference to that node.
Notice that each thread is responsible for free()ing the memory of the previous node, not the node that it created.
void clh_mutex_unlock(clh_mutex_t * self)
{
    atomic_store(&self->mynode->succ_must_wait, 0);
}
All this does is set succ_must_wait to zero so that the next thread can acquire the lock.
The PowerPoint presentation linked at the end contains some animations that may help in understanding how the algorithm works.
A nice property of the CLH mutex is that it uses the (theoretically) minimum number of synchronized operations to perform a mutual exclusion lock:
- In the lock() function it has a single synchronized operation, the atomic_exchange(), with one acquire and one release barrier.
- In the unlock() function it has a single synchronized operation, the atomic_store(), with a release barrier.
C11 source code on github, along with a powerpoint containing some animations:
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/C11/locks/clh_mutex.h
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/C11/locks/clh_mutex.c
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/Presentations/CLH-Mutex.pptx
You can also check a Java implementation of the original CLH algorithm in "The Art of Multiprocessor Programming", section 7.5.2.