Before we wrap up the Tidex Lock, there is one extra technique that may be useful. When we use pthread_self() to get the thread id, it should be noted that using a pthread_t for anything other than comparison is undefined behavior. In practice this is not much of a problem because we only want to use the negative of that value, so even in the cases where pthread_t is a pointer to a struct, casting it to a long long and negating it should provide a unique value that is never returned by pthread_self().
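For reference, here is a minimal sketch of that casting trick. The helper name is hypothetical, and it assumes pthread_t is an integer or pointer type, which is the scenario described above:

#include <pthread.h>
#include <stdint.h>

/*
 * Hypothetical helper: converts the opaque pthread_t into a signed integer.
 * Negating this value yields an id that pthread_self() itself never returns,
 * which is the property relied upon above.
 */
static inline long long threadIdAsLong(void) {
    return (long long)(uintptr_t)pthread_self();
}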
It is true that most of us get a bit stressed when the words undefined behavior appear in a mutual exclusion lock algorithm, so to give some peace of mind to those troubled by it, here is a technique that doesn't need pthread_self() and provides the same performance. We call it the No Pthread_Self() variant, or Tidex NPS for short.
We start by adding a sequentially consistent atomic variable named globalThreadIndex, which is incremented monotonically with atomic_fetch_add() by each new thread accessing an instance of the lock. This increment is done once and only once per thread, and the resulting value is the equivalent of the thread id. To keep this thread id around for later use, each thread saves it in a (non-atomic) thread-local variable. Here's what the extra code looks like in C11:
/*
 * This variable can even be an 'atomic_short' because it is unlikely that your
 * application will create more than 32767 threads. This also means that
 * both ingress and egress can be of type 'atomic_short', which can save memory.
 *
 * We start at '1' because we want to use the negative of the value as well.
 * Alternatively, we could start at zero but then we would have to advance
 * this index 2 at a time.
 *
 * This is shared by all tidex_nps_mutex_t instances to save memory.
 */
static atomic_long globalThreadIndex = ATOMIC_VAR_INIT(1);
/*
 * The index of the thread is stored in a thread-local variable that is
 * shared by all instances of tidex_nps_mutex_t.
 * If the value is still the initial INVALID_TID (zero) then we need to
 * get a value from globalThreadIndex using atomic_fetch_add(), once and
 * only once per thread.
 */
static _Thread_local long tlThreadIndex = INVALID_TID;
Notice that both of these variables are shared among all instances of the lock, so there is no increase in memory usage. In fact, with this technique we can make ingress and egress 32-bit or even 16-bit integers, which would reduce memory usage.
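As an illustration only (this is not the struct from the repository; the field names are taken from the lock() code below, and the padding against false sharing is omitted), a reduced-footprint version of the lock state could look like this:

#include <stdatomic.h>

/*
 * Hypothetical 32-bit variant of the lock state, assuming globalThreadIndex
 * and tlThreadIndex are narrowed to 'int' as well.
 */
typedef struct {
    atomic_int ingress;    // written by threads entering lock()
    atomic_int egress;     // written by the lock holder in unlock()
    int        nextEgress; // accessed only by the current lock holder
} tidex_nps_mutex_32_t;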
The modifications to lock() are minimal, and unlock() remains unchanged:
void tidex_nps_mutex_lock(tidex_nps_mutex_t * self)
{
    long mytid = tlThreadIndex;
    if (mytid == INVALID_TID) {
        tlThreadIndex = atomic_fetch_add(&globalThreadIndex, 1);
        mytid = tlThreadIndex;
    }
    if (atomic_load_explicit(&self->egress, memory_order_relaxed) == mytid) mytid = -mytid;
    long prevtid = atomic_exchange(&self->ingress, mytid);
    while (atomic_load(&self->egress) != prevtid) {
        sched_yield(); // Replace this with thrd_yield() if you use <threads.h>
    }
    // Lock has been acquired
    self->nextEgress = mytid;
}
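For completeness, here is a sketch of what the unchanged unlock() looks like, based on the fields used above (check the linked sources for the exact code):

void tidex_nps_mutex_unlock(tidex_nps_mutex_t * self)
{
    // Publish the ticket saved by lock(); the next waiter spinning on egress
    // will see it match its prevtid and enter the critical section.
    atomic_store(&self->egress, self->nextEgress);
}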
Conceptually, there is not much of a difference between this approach and the one using pthread_self(), because pthread_self() itself returns a value from a thread-local variable containing the thread's id. The performance numbers remain unchanged, at least on our usual benchmark.
As usual, the C11 code is available on GitHub:
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/C11/locks/tidex_nps_mutex.h
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/C11/locks/tidex_nps_mutex.c