Sunday, November 8, 2015

A good resource for programming with atomics

One common question the I've heard is: Where can I find a good resource for learning how to use C++ atomics?

There is a good resource, it's this page: http://en.cppreference.com/w/cpp/atomic/memory_order
and if you don't like reading, you can watch the Atomic Weapons presentation by Herb Sutter
http://concurrencyfreaks.blogspot.com/2013/02/the-new-memory-model-in-c11-and-c11.html

But that's not the answer most people expect.
This happens for two reasons:

First, because the memory order page is "too technical". It needs more examples, particularly of how the different memory orders interact with each other. For example, from reading that page alone, it can be very hard to understand what happens when you have a store with a memory_order_release followed by a load with a memory_order_acquire  (I talk about this in my interview with Channel9), and if you throw consume into the mix it gets even trickier.
However, the information is (almost) all there, it's just a matter of knowing how to read it, which I agree, can sometimes be a daunting task.

Second, and this is the big one, expectations!
When someone asks about more information on atomics, what they really want to know is more information on concurrent/lock-free/wait-free data structures and algorithms. And the reason why they want a lock-free/wait-free data structure is because they want it to be fast/scalable/low-latency in a multi-threaded application.
This has the further implication that once they have an implementation of the concurrent data structure, they won't be happy about it because it will be "slow", because it is using memory_order_seq_cst, so they'll want to know how to implement the same data structure using one of the other (faster) memory orders.
Obviously, before implementing or understanding a lock-free/wait-free data structure, we need to have a good understanding of atomics. But what they really want to know is how to implement a lock-free data structure, and this, unfortunately is extremely tricky and requires a very long time.
For those of you that studied CS (I didn't, but I attended some classes), remember those classes about (single-threaded) data structures? How many semesters did you study data structures?
Well, if they had lock-free/wait-free data structures in CS classes at college, it should take a lot longer, because concurrent data structures have not only the dificulties of single-threaded data structures, but many more subtle details, which require some time to grasp.
If I had an euro for every time I found a bug in a concurrent data structure that was designed by someone who thought it was a good idea to design its own data structure, I wouldn't be a High Net Worth Individual , but I would have enough for a weekend escapade at some semi-tropical place  :)
Para-quoting Fedor Pikusdesigning concurrent data structures is hard... the only thing harder, is to design correct concurrent data structures.

So yes, people want to understand how to make their own data structures, but it's such a hard task. It requires a lot of testing, a lot of careful reviewing, perhaps even some praying to pagan deities and a few animal sacrifices... and in the end, you may not be sure that it's correct, or when the next developer comes and (innocently) modifies a line to "add a feature" or "fix a bug", the result will be an incorrect algorithm.
Personally, I think that the near future will bring tools that are capable of doing static analysis on the concurrent code, to show if it breaks mutual exclusivity or some invariant that you define. We're not there yet though, and for the moment, if we want to "prove" that a certain algorithm is correct, we must do so in the form of a paper, with some "mathematicaleze" language. Even then, the "proof" itself might be wrong, and all is for naught.

If you want to learn about atomics and the memory model, my recommendation is the following:
Go to the web page at the beginning of this post and read it carefully, then, find a concurrent data structure that you actually need, or you find interesting for some particular reason. If this data structure is not already implemented in C++ atomics, try to do so yourself, using only memory_order_seq_cst.
Once that is working, write as many stress test scenarios as you can think of, until you are as confident as possible about the correctness (you'll never be 100%). Try to get another pair of eyes to review your code.
Then, and only then, try to think very very very carefully about which of the atomics can be changed to memory_order_acquire and memory_order_release. Repeat the stress tests, if possible, on different architectures, the more "relaxed", the better.
Then, think about whether it's possible to change some of those atomics to memory_order_consume or even memory_order_relaxed, and again, repeat the tests and code reviews and any other rituals (pagan or not) that you may think will help.
You'll have questions while doing this, at that time, you can go and re-read the cppreference page on memory order again, or even the C++ draft. You can also take a look at some of our examples (at your own risk) where we do optimizations, but keep in mind that these are specific to the algorithm. 

You'll need to first understand the algorithm you're trying to implement and its invariants, and only then you can think about how to relax some atomics to get better performance.
That's all the advice I've got, I know it's not much, but it's better than nothin'


No comments:

Post a Comment