Comments on Concurrency Freaks: Implementing a Reader-Writer Lock using RCU

Dope... you're absolutely right about needing ...

2015-10-26T10:46:26.949+01:00

Dope... you're absolutely right about needing memory_order_release on exclusiveLock(). Without the release, it could re-odered into the critical section.
Good catch!

Regarding the PS, yes you're right, same logic. It needs a memory_order_acquire.

The deadlock, yes I get it, you're talking about the kernel implementation.
If it's a userspace implementation of RCU like the one we have inside the Classic Left-Right technique (or your own implementation?), then the deadlock issue doesn't occur. And I think SRCU doesn't suffer from this issue either because we can have more than one of those... I'll have to dig into it a bit more to be certain.
Even for the kernel implementation, I think that synchronize_rcu() has a mechanism to detect whether rcu_read_lock() has been called from the same thread (I'm not certain this is true, and certainly not for all implementations), and anyways, it would still give inconsistent results if more than one rw-lock instance is used.

Great code review, thanks David!

Hi Pedro, Regarding memory order: The issue is, y...

2015-10-26T06:06:57.115+01:00

Hi Pedro,

Regarding memory order:
The issue is, you need something like a release fence *before* the writerWaiting store in exclusiveUnlock. An example reordering that could cause a problem is if the writerWaiting store moves up inside the critical section (say, to a point when half the protected data has been updated but not the other half): then a call to sharedLock() might finish while a writer is still making modifications to protected data. (I believe that a similar analysis will show the optimization you mention in your PS to be unsafe; the acquire fence has to be *after* the relevant load, not before it. I'm not sure enough about the ordering semantics of (U)RCU to speak to the PPS.).

Regarding deadlock:
Consider:
void t1() {
rwlock1.sharedLock();
A:
rwlock2.exclusiveLock();
}

void t2() {
rwlock3.sharedLock();
B:
rwlock4.exclusiveLock();
}

Suppose thread t1 runs to label A and then thread t2 runs to label B. At that point, neither of the threads can successfully enter their exclusive lock critical sections, since they will be stuck in a synchronize_rcu() call that will never return, since it's blocked by the other thread's presence in a rcu read-side critical section. (In fact, either thread by itself will deadlock; but this example works even if synchronize_rcu is modified to not require a quiescent state from the calling thread to occur).

Hi David, Not a typo. We don't need to use me...

2015-10-25T23:37:44.133+01:00

Hi David,

Not a typo. We don't need to use memory_order_release on exclusiveUnlock() because the writersMutex.unlock() has release semantics and will guarantee visibility of the change to writerWaiting.
Theoretically, there is no "synchronizes with" of writersMutex.unlock() with the writerWaiting.load() in sharedLock(), but in practice there must be some fence in the unlock() method, and even if there wasn't, then at most we could have some Readers waiting "too long" on sharedLock(), but they would eventually see writerWaiting changing to false.

I don't understand the second comment. Can you go into more details?
Is it because synchronize_rcu() can not be called from multiple threads on kernel space?
Notice that the call to synchronize_rcu() is protected by the std::mutex, so I don't see any possibility of deadlock in userspace.

PS: To optimize even further, in sharedLock(), the second load on writerWaiting() can be std::memory_order_relaxed. This is possible because rcu_read_lock() has "acquire semantics".

PPS: ... and in exclusiveLock() we can replace the seq-cst store with a relaxed store:
writerWaiting.store(true, std::memory_order_relaxed)
;)

Minor Typo: you need at least std::memory_order_re...

2015-10-25T23:10:32.792+01:00

Minor Typo: you need at least std::memory_order_release in exclusiveUnlock.

Also worth noting is that this can deadlock if multiple of these locks are used; only one RW is possible per RCU domain.