While designing locks, whether they are mutual exclusion or reader-writer locks, one of the questions that often comes up is "How efficient can a lock be?".
When that lock is implemented on a language with acquire/release semantics like C11 or C++1x, then part of that question is related to "What is the minimum number of barriers required to implement a lock?" (or "fences" if you prefer). I came to a partial answer on this one, which I'm going to describe below.
Obviously, we weren't the first ones to think of this, and I'm pretty confident that experts in this field have reached the same (or more advanced) conclusions a very long time ago. People like Leslie Lamport, Maurice Herlihy, Nir Shavit, Michael Scott, David Dice, have been thinking about these issues for a long time. Perhaps they never thought it was too important, or they didn't bother to talk about it, or maybe they mentioned it briefly in one of their papers and we just missed it.
It could also happen that, until recently, this question wasn't all that relevant, the reason being that only recently did programming languages adopted a memory model with explicit acquire/release semantics.
Nowadays, with C++11/C++14 and C11, the memory model defines the synchronization operations in terms of acquire and release barriers, so it becomes an important question to design a lock that does the least amount of synchronization as possible, and for that, we must first figure out what is the minimum.
Keep in mind, that having an algorithm and implementation that provides the minimum number of barriers is just one of the effects contributing to a lock's performance. We can even argue that it is a small effect, when compared to other things, like false-sharing, contention, cache-misses, etc. Still, I think it is a question worth posing, and answering!
This answer I'm proposing is by no means rigorous, and I hope that in the future someone will give a more rigorous proof using Information Theory or something like it, but for the moment, a simple logical narrative should suffice:
Suppose you have an object L that is the lock, whose state may be LOCKED or UNLOCKED. The object L has two functions: lock() and unlock().
For a given thread to determine the current state of L, it must do an acquire barrier. For the same thread to get hold of the lock, it must change the state of L from UNLOCKED to LOCKED which means doing at least a write with a release barrier.
The actual algorithm that is used can be much more complicated, and in fact can even be reversed: first set the state to LOCKED with a release barrier, and then check if there are other threads doing the same, which needs an acquire barrier. It can also be done with a single atomic operation, for example when doing a spinlock with a Compare-And-Swap (CAS) or Exchange, both of which have implicit acquire and release barriers.
For the thread to release the lock, the minimum it needs to do is to inform the other threads/cores of the change of state from LOCKED to UNLOCKED, and that means doing at least one release barrier.
There are two exceptions to this rule however:
The first concerns "Recursive Locks" (or "Reentrant" if you prefer). On a recursive lock, the minimum number of barriers is zero. When a thread attempts to acquire a lock that already belongs to the current thread, it can have a specific state in L (a thread-local variable for example) that it checks. Because it was this same thread that wrote to it before, either to set it to LOCKED or UNLOCKED, there is no need to do an acquire or release barrier. Same thing for the unlock() function, a regular write without any barriers is enough, as long as this is not the last unlock() of L by this thread.
The second is for optimistic locks. An example of an optimistic lock is Java's StampedLock tryOptimisticLock(). Notice that optimistic locks are only possible for read-only operations, so they are used in the context of Reader-Writer Locks and I've never seen them being used with mutexes (don't know if there is a use-case for mutexes?).
An optimistic lock needs three barriers: one acquire barrier in the lock() function, and another acquire barrier in the unlock() to check if the state of the lock has changed during the (read-only) operation, followed by a release barrier also in the unlock(), to prevent code from escaping outside the critical section. The first acquire barrier is used to read the current state of the lock, i.e., if it is currently being held by a Writer, and the second acquire barrier to check if there was one or more Writers acquiring the lock for the duration of the Reader's operation.
In summary, for pessimistic non-recursive locks, the minimum amount of barriers needed for the lock() function, is 1 acquire and 1 release, while for the unlock() function is 1 release.
Examples of locks that have this minimum property are the spinlock with a CAS, or the Ticket Lock we described on a previous post, but there are many others.