Monday, September 8, 2014

Not all race conditions are bugs

There is a very good talk by Hans Boehm about race conditions and the C++11 (and C11) memory model. We have talked about it before, but I'm posting the link again:
https://www.youtube.com/watch?v=NXtpU8jcmaQ

One of the slides in this presentation, at timestamp 4:30, shows a quote from the C++ standard.

I know the slide is not very easy to read, but the point is that data races are undefined behavior (nothing new about that). However, if two threads simultaneously read and write the same variable and that variable is atomic, then this is no longer called a data race, and it has well-defined behavior in the C11 and C++11 memory model.


There is an interesting post about benign data races by Dmitry Vyukov who, for those of you who don't know, is the guy behind 1024cores.net:
https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
The post doesn't exactly invite you to write races, even when they have well-defined behavior, and I agree in general. But like all engineering tools, they exist for a reason: there are use cases for them.


There was a bit more I wanted to say about this, but for the moment I'll just stick to the basics:

If you define a data race as a race on a variable, regardless of whether or not that variable is atomic, then there are benign data races. This is not how C11/C++11 defines a data race, though, so it depends on how you phrase it.
This means that you can have races on a variable and still have well-defined behavior, with relaxed atomics being the best example.
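
To make the relaxed atomics case concrete, here is a minimal sketch of several threads racing on the same variable with well-defined behavior. The function name count_with_relaxed_atomics and its parameters are made up for this illustration; only std::atomic, std::thread and std::memory_order_relaxed come from the standard library.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper for this post: every thread increments the same
// counter concurrently. Because the counter is a std::atomic, the concurrent
// accesses are not a data race in the C++11 sense, even with relaxed ordering.
long count_with_relaxed_atomics(int num_threads, long increments_per_thread) {
    assert(num_threads > 0 && increments_per_thread >= 0);
    std::atomic<long> counter{0};

    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (long i = 0; i < increments_per_thread; ++i) {
                // Relaxed: atomicity only, no ordering with other memory.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    // join() synchronizes with the end of each thread, so the final load
    // below observes every increment.
    for (auto& w : workers) {
        w.join();
    }
    return counter.load(std::memory_order_relaxed);
}
```

Calling count_with_relaxed_atomics(4, 10000) returns 40000 on every run. If counter were a plain long instead, the same code would contain a data race and therefore undefined behavior.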

Another thing that is sometimes confusing is that a race condition (with well-defined behavior) is not necessarily a bug. In fact, I've seen valid (yet rare) use cases for such scenarios.
This has an interesting consequence: even if you were to write an application that checks for race conditions, statically or at run-time, and even if that application were able to find all races in a given program (which is itself a computationally hard problem), you would still not be able to use it reliably to find bugs in the code. There would be false positives for programs that race deliberately.
This is why I wrinkle my nose a bit when people talk about applications that find race conditions as if they were the greatest thing since sliced bread.
Don't get me wrong, I've used such applications to find bugs, but it's more of a best-effort or a debugging tool. For example, once you suspect a bug exists, and you want to see all of the race conditions to check if there is one that could be causing the bug, then it becomes an invaluable engineering tool.
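
As an illustration of a deliberate race that is not a bug, here is a sketch of a lazily cached hash. Several threads may race to compute and store the value, but they all store the same value, so it doesn't matter who wins. LazyHash, fnv1a and all_threads_agree are names invented for this example, and treating 0 as "not computed yet" is an assumption of the sketch, not something taken from the talk or from Dmitry's post.

```cpp
#include <atomic>
#include <cstdint>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Simple 32-bit FNV-1a hash, used here only as a deterministic function.
std::uint32_t fnv1a(const std::string& s) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Hypothetical class: caches the hash of an immutable string.
class LazyHash {
public:
    explicit LazyHash(std::string text) : text_(std::move(text)) {}

    std::uint32_t get() const {
        std::uint32_t h = cache_.load(std::memory_order_relaxed);
        if (h == 0) {  // assumption: 0 means "not computed yet"
            // Several threads can reach this point at once. That is a
            // deliberate race: each one computes the same value and the
            // stores simply overwrite each other with identical data.
            h = fnv1a(text_);
            cache_.store(h, std::memory_order_relaxed);
        }
        return h;
    }

private:
    std::string text_;
    mutable std::atomic<std::uint32_t> cache_{0};
};

// Runs get() from several threads and checks they all saw the same hash.
bool all_threads_agree(int num_threads) {
    const std::string text = "not all race conditions are bugs";
    const LazyHash lazy(text);
    const std::uint32_t expected = fnv1a(text);

    std::vector<std::uint32_t> seen(num_threads, 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        // Each thread writes to its own slot, so `seen` itself is not raced on.
        workers.emplace_back([&lazy, &seen, t] { seen[t] = lazy.get(); });
    }
    for (auto& w : workers) {
        w.join();
    }
    for (std::uint32_t value : seen) {
        if (value != expected) return false;
    }
    return true;
}
```

A checker that reports every race on a variable would point at the store in get(), yet the program is correct no matter how the threads interleave. That is the kind of false positive I mean.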


Just keep in mind that although most race conditions are bugs, not all race conditions are bugs.
