The program for CppCon 2016 is out and there are some really interesting talks coming up.
On the topic of Concurrency, I'm looking forward to these three talks:
- JF Bastien on "No Sane Compiler would optimize atomics"
- Hans Boehm on "Using weakly ordered C++ atomics correctly"
- Paul McKenney on "A lock-free concurrency toolkit for deferred reclamation and optimistic speculation"
JF Bastien has some really good insights into optimizing code with atomics and the different memory orders, so I really want to see what he's going to say this year.
If you don't know who he is, then take a look at his interview from last year's CppCon or from CppCast:
http://concurrencyfreaks.blogspot.pt/2015/11/interviews-for-channel9.html
http://cppcast.com/2015/07/jf-bastien/
and a preview of what his presentation will look like:
https://github.com/boostcon/cppnow_presentations_2016/blob/master/03_friday/no_sane_compiler_would_optimize_atomics.pdf
and if you want to see some funny comments, check the last 10 minutes of this talk, where JF Bastien comments on the speaker's material:
https://www.youtube.com/watch?v=k12BJGSc2Nc
As for Hans Boehm, well, if you're a regular reader of this blog I shouldn't need to introduce him, but in case you just landed here by accident: he's one of the contributors to the C++ Memory Model... and the Java Memory Model.
This year he's at CppCon and he's going to talk about common mistakes with non-seq-cst atomics, so I definitely want to see his talk.
Here is some of the content he's going to go over:
https://docs.google.com/presentation/d/1Eapu4G6QcetO9mOeZSQJAuvxB8kK3pHQwxWbHNOLk8w/pub?start=false&loop=false&delayms=3000#slide=id.p
Funny thing: I was considering giving a talk this year on precisely this topic, but I'm not able to go to CppCon, so I didn't even submit the proposal. Obviously, I wouldn't expect my talk to be as good as his is going to be; I was thinking of focusing more on examples and on what kinds of optimizations are safe to do when writing an algorithm using C++ atomics.
... oh well, maybe I can still propose it next year as a more "hands-on" approach.
And finally, Paul McKenney is going to give two talks, the first one being about the Issaquah challenge (which he talked about a couple of years ago, so I hope he's going to go a bit more in depth this time):
https://www.youtube.com/watch?v=1Q-RH2tiyt0
and the second talk is going to be about safe memory reclamation in concurrency.
It looks like Paul, Michael Wong, and Maged Michael are trying to add techniques for safe memory reclamation to the C++ standard, namely RCU and Hazard Pointers.
For those of you who don't know what this means, let me just say that most experts in (shared memory) Concurrency say that Memory Reclamation is the toughest problem in Concurrency. This includes people like Maurice Herlihy, Erez Petrank, and many others.
If Memory Reclamation is the toughest problem in Concurrency, and Concurrency is the toughest way of programming, then Memory Reclamation must be the hardest thing you can do in Programming.
There are three main ways to do memory reclamation in concurrent (non-blocking) applications: Atomic Reference Counters, RCU, and Hazard Pointers (HP).
They have different properties, like different progress conditions for Readers and Reclaimers. By "Readers" I mean the threads calling methods that don't delete anything in the data structure, and "Reclaimers" are the ones doing the deletions.
Atomic Reference Counters are already in the next standard and should be part of C++17.
Herb Sutter was the one who proposed them, and I'm glad he did, but when it comes to concurrent data structures they are pretty much useless, because they are only non-blocking in certain situations, and they are the slowest of the three methods for Readers: when you're traversing a list of nodes you need to do two atomic_fetch_add() per node traversed.
If you want to understand what those situations are, take a look at Section 9.1 of Paul McKenney's book: https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook-e1.pdf
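To see where those two atomic_fetch_add() per node come from, here is a minimal single-threaded sketch of a reference-counted list traversal (the Node type and sumList() are hypothetical names of mine, not from any proposal):

```cpp
#include <atomic>

// Hypothetical node type with an intrusive atomic reference counter.
struct Node {
    std::atomic<int> refcount{0};
    Node* next{nullptr};
    int value{0};
};

// Traversal pays two atomic RMW operations per node visited: one
// fetch_add() to acquire the next node and one fetch_sub() to release
// the current one. (Real code would free a node when its count drops
// to zero; this sketch only shows the per-node operation count.)
int sumList(Node* head) {
    if (head == nullptr) return 0;
    head->refcount.fetch_add(1);                          // acquire head
    int sum = 0;
    Node* curr = head;
    while (curr != nullptr) {
        sum += curr->value;
        Node* next = curr->next;
        if (next != nullptr) next->refcount.fetch_add(1); // acquire next
        curr->refcount.fetch_sub(1);                      // release current
        curr = next;
    }
    return sum;
}
```

Those two RMW operations per node are contended by every concurrent Reader traversing the same nodes, which is why this approach is the slowest of the three for read-mostly workloads.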
RCU is better known for its implementation in the Linux Kernel, but there are many userspace implementations of RCU. We used some of them in the Left-Right technique.
RCU is wait-free (population oblivious) and super fast for Readers, but it is blocking for Reclaimers, which means that if you want high throughput or low latency even when removing nodes/objects from a data structure, it isn't going to give you what you want.
Even with delegation, where the last reader deletes the node/object, the progress condition for the Reclaimers doesn't change, and it even makes the Readers worse off, because now they may have to delete an unbounded number of objects, making the Readers wait-free unbounded, which has much worse tail latency.
RCU is the simplest of the three techniques to use, because its semantics for the Readers are similar to those of a Reader-Writer Lock.
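To illustrate those Reader-Writer-Lock-like semantics, here is a toy flag-based sketch (this is not the liburcu API, and real userspace RCU implementations use per-thread epoch/version counters rather than a plain flag; it only shows the shape of the read side versus the blocking grace period):

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Toy RCU sketch: each thread publishes a "reader active" flag, and the
// Reclaimer's grace period spins until no flag is set. Note how the
// read side looks just like acquiring/releasing a reader lock.
constexpr int kMaxThreads = 8;
std::atomic<bool> readerActive[kMaxThreads]{};

void rcu_read_lock(int tid)   { readerActive[tid].store(true);  }
void rcu_read_unlock(int tid) { readerActive[tid].store(false); }

// Blocking for the Reclaimer: waits for all in-flight readers to leave
// their read-side critical sections before memory can be freed.
void synchronize_rcu() {
    for (int i = 0; i < kMaxThreads; i++) {
        while (readerActive[i].load()) std::this_thread::yield();
    }
}
```

The read side is just two relaxed-cost operations regardless of how many nodes are traversed in between, which is why RCU Readers are so fast; the price is that synchronize_rcu() can block the Reclaimer for as long as the slowest reader stays inside its critical section.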
Hazard Pointers are typically lock-free for the Readers (wait-free in some situations) and wait-free bounded for the Reclaimers.
The main drawback is that they're slow for Readers, because we have to do one sequentially consistent store for every node that is traversed.
That's not as slow as Atomic Reference Counters are for Readers, but it is slower than RCU, which has pretty much no synchronization apart from the initial call to rcu_read_lock() and the final call to rcu_read_unlock().
Of the three techniques, this is the hardest to use, not because it is hard per se, but because it requires a very deep knowledge of the algorithm you're applying hazard pointers to. If you want to make it wait-free then the difficulty goes up a notch, and it typically requires the algorithm to be designed with HPs in mind.
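The read-side protocol is where that per-node seq-cst store shows up. Here is a minimal sketch of the protection loop (names are mine and hypothetical; the actual proposal's API will differ):

```cpp
#include <atomic>

struct Node { int value{0}; };

// One hazard-pointer slot for a single reader thread; a real
// implementation has MAX_HPS slots per thread.
std::atomic<Node*> hazard0{nullptr};

// Publish the pointer we are about to dereference, then re-check that
// the source still points to it. This is lock-free, not wait-free: a
// writer that keeps changing 'src' can make the loop retry forever.
Node* protect(const std::atomic<Node*>& src) {
    Node* p = src.load();
    while (true) {
        hazard0.store(p);            // the seq-cst store paid per node
        Node* q = src.load();
        if (q == p) return p;        // stable: safe to dereference
        p = q;                       // src changed under us, retry
    }
}
```

The re-check is what makes it safe: once the slot holds p and src still points to p, any Reclaimer that later unlinks p must see the published hazard and defer the deletion.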
One of the untold advantages of HPs is that they are memory bounded. This means that at any given time you cannot have more than (MAX_THREADS x MAX_HPS) nodes/objects in the retired lists waiting to be deleted (for an 'R' factor of zero). RCU has no such bound, and if you have memory constraints this can become problematic and can impact your tail latency for the Reclaimers.
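A sketch of where that bound comes from on the Reclaimer side (again with hypothetical names): with an 'R' factor of zero we scan the hazard slots on every retire, so only nodes that are currently protected can linger in the retired list:

```cpp
#include <atomic>
#include <vector>
#include <algorithm>

struct Node { int value{0}; };

constexpr int MAX_THREADS = 4;
constexpr int MAX_HPS = 2;
std::atomic<Node*> hazard[MAX_THREADS * MAX_HPS]{};

std::vector<Node*> retired;   // per-thread in a real implementation

bool isProtected(Node* p) {
    for (auto& h : hazard) {
        if (h.load() == p) return true;
    }
    return false;
}

// Scan on every retire (R == 0): everything unprotected is reclaimable
// immediately, so |retired| can never exceed MAX_THREADS x MAX_HPS,
// the total number of hazard slots.
std::vector<Node*> retireAndScan(Node* p) {
    retired.push_back(p);
    std::vector<Node*> reclaimable;
    auto it = std::remove_if(retired.begin(), retired.end(), [&](Node* n) {
        if (isProtected(n)) return false;
        reclaimable.push_back(n);  // caller would delete these
        return true;
    });
    retired.erase(it, retired.end());
    return reclaimable;
}
```

In practice the 'R' factor is made larger so the (expensive) scan is amortized over many retires, trading a proportionally larger memory bound for better Reclaimer throughput.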
There are many other methods for safe memory reclamation, but they are either variations on these ideas or they rely on some architecture/OS specific functionality, like POSIX signals, or CPU instructions that only exist on some hardware. Those are "outside" of the C++ Memory Model, i.e. you can't implement them using only atomics and the Memory Model, and therefore their implementations are not portable.
Paul McKenney is one of the authors of RCU.
Hazard Pointers are the brain-child of Maged Michael.
Obviously, there is a bit of friendly rivalry between the authors of the two best methods for non-blocking memory reclamation :)
Having the two of them together in this endeavor to provide safe memory reclamation in C++ is like having Professor Xavier and Magneto joining forces to bring down Apocalypse :)
Yes, it's really _that_ cool!
Sure, I'm a fanboy and I'm exaggerating. Just like Andreia was saying the other day, you can implement Hazard Pointers and RCU yourself, efficiently, in C++, and you don't really need library support for that; in fact, _we_ have done it.
However, I'm not going to say it was easy (it wasn't, especially the hazard pointers stuff), and most people just want to use a pre-existing library/API which they know is correct and fast for most usages, and that's why having Paul, Maged Michael, and Michael Wong behind this is a great thing for C++ and its community.
If you care about Concurrency and if you're at CppCon this year, make sure to attend this talk, it should be epic!