Monday, November 10, 2014

Relaxed atomics, out-of-thin-air results, and ConcurrentLinkedQueueRelaxed

On today's post we have a quiz on Relaxed Atomics.

Before you go any further, I recommend reading this paper by Hans Boehm and Brian Demsky entitled "Outlawing Ghosts: Avoiding Out-of-Thin-Air Results", if you haven't done so already. Without reading this paper, you may be missing some of the subtleties of relaxed atomics.

Done already?
Ok, let's start.

Suppose we have three slightly different programs named Program A, Program B, and Program C, each with two threads. The code that each thread executes for each program is described below, and I'll use Java notation, but this is also true for C++1x, C11, D, and for Scala (not for Go, because it doesn't have relaxed atomics from what I could tell):

Program A

volatile int x = 0;
volatile int y = 0;

Thread 1:
if (x == 1) y = 1;

Thread 2:
if (y == 1) x = 1;

Program B

int x = 0;
int y = 0;

Thread 1:
if (x == 1) y = 1;

Thread 2:
if (y == 1) x = 1;

Program C

int x = 0;
int y = 0;

Thread 1:
if (x == 42) y = 1;

Thread 2:
if (y == 1) x = 1;

Question 1:
What are the possible outcomes of each of these programs?

Question 2:
Which (if any) of these three programs generate sequentially consistent results ?

I've taken the liberty of emphasizing the differences between the three programs in bold, so as to make it easier to analyze the code.
Program A is the only one where the variables x and y are sequentially consistent atomics. Programs B and C use relaxed atomics for all their operations (in the JVM, all loads and stores on an int are guaranteed atomicity). Program C compares x with 42 instead of comparing to 1.

If these programs seems somewhat familiar it is because Program B is one of the examples on the paper (see section 2) indicated as recommended reading at the beginning of this post.

So have you thought about it for a while?
What are the possible outcomes for each one of these three programs?
Scroll down to look at the solutions.

a little bit further....

For Program A, the only possible outcome is {x=0, y=0}.
There is no other possible outcome, and I hope this is obvious. If it is not, then I recommend watching Herb Sutter's presentation on atomics because I don't even know how to explain it. But basically, no re-ordering is possible due to implicit acquire barriers on the loads, and release barriers on the stores.
As for the consistency model, Program A provides at least sequential consistency because, well, it's constant/immutable, in the sense that there is only on possible state and it never changes.

For Program B, there are two possible outcomes {x=0, y=0} or {x=1,y=1}. To understand why the {1,1} state is possible, the best is to read the paper by Hans Boehm.
There is no sequentially consistent history for Program B that can result in {x=1,y=1}, so Program B is not sequentially consistent.

For Program C, there is only one possible outcome which is {x=0, y=0}, mainly because we check against a value that x can never take (a value of 42). Unlike for Program B, in Program C there is no way branch prediction can predict 42 and then it actually becomes 42 because there is no code path for which x is set to 42.
Program C has only one state and it's constant/immutable, so yes, it is sequentially consistent (if it even makes sense to say so).

So why should we care about this?

The reason for spending time thinking about these scenarios is related to why the optimization on item in the ConcurrentLinkedQueueRelaxed can be done (mentioned on the previous post). Whenever we read the variable item with a relaxed load, one of three things may happen:
- The value of item is seen a being the item we're searching for: It could happen that it has been removed in the mean time (CAS'ed to null), so we have to re-load using an acquire barrier (i.e. volatile load);
- The value of item is seen as being a non-null value but not the item we're searching for: It may actually be at null by now, but we don't care about that item because we're looking for a different item;
- The value of item is seen as being null: It may be null but it may also (in very special circumstances) be the item we're looking for, and because of this, we need to re-load with an acquire barrier (volatile load);

Here's a diagram that explains what are the possible observable states for item depending on the possible "real" state of item: