Wednesday, July 16, 2014

Lock-Free definitions galore

We've talked before about the definition of Lock-Free, but there is still a lot of confusion around it and around the other Progress Conditions, so let's take a closer look.

If you google a bit for the definition of Lock-Free, you'll see that there are several.

Wikipedia:
"An algorithm is lock-free if it satisfies that when the program threads are run sufficiently long at least one of the threads makes progress (for some sensible definition of progress)."

Another common definition:
"Lock-freedom means that a system as a whole moves forward regardless of anything. Forward progress for each individual thread is not guaranteed (that is, individual threads can starve)."

CS Course at Carnegie Mellon:
"On a system using a lock-free algorithm, it is guaranteed that some thread is making progress. This means that the overall system is making progress toward the end goal. However, there still might be starved threads, as lock-free algorithms only guarantee that at least one thread is making progress at any given time."

Blog posts:
"An algorithm is lock-free if at all times at least one thread is guaranteed to be making progress."

The Art of Multiprocessor Programming, page 99:
"A method is lock-free if it guarantees that infinitely often some method call finishes in a finite number of steps."

The Wikipedia definition is the easiest to understand; unfortunately, it is also the most ambiguous because it relies on the definition of "progress", whatever that means. I like it as a way to explain the concept of lock-free to someone who is new to the field, so it is appropriate as a Wikipedia page, but too "fluffy" to be taken seriously.

The following three definitions are very similar to each other. They also rely on the ambiguous "progress", but at least the second and third ones explicitly mention something I find very important: that you can have starvation in a lock-free algorithm (or method).
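The starvation point is easy to see in the classic CAS retry loop. Here is a minimal sketch in C++ (the class and method names are mine, purely for illustration): increment() is lock-free because its CAS can only fail when another thread's CAS succeeded, so the system as a whole always makes progress, yet a single unlucky thread could in principle keep losing the race forever.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Minimal lock-free counter (illustrative sketch, not a library API).
class LockFreeCounter {
    std::atomic<long> value{0};
public:
    void increment() {
        long old = value.load();
        // CAS retry loop: a failed compare_exchange means some other
        // thread's CAS succeeded, so the system made progress even if
        // this particular thread did not (it may starve).
        while (!value.compare_exchange_weak(old, old + 1)) {
            // 'old' was refreshed with the current value; just retry.
        }
    }
    long get() const { return value.load(); }
};
```

There is no bound on how many times one thread can retry, which is exactly why lock-free does not imply starvation-freedom for individual threads.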

The definition from "The Art of Multiprocessor Programming" by Maurice Herlihy and Nir Shavit is the best definition. It may not seem so at first, but I'm pretty sure that a lot of work and long hours of discussion went into this one sentence, and the end result is this short statement that is able to synthesize such a powerful concept. It has an almost haiku beauty.  :)

First, the definition by Herlihy is expressed in terms of number of steps, a concept much better defined than progress. After all, Lock-Free is a type of Progress Condition, so in a certain sense, the first four definitions are trying to use the concept of progress to define progress conditions, which is circular and therefore unproductive.

Second, it talks about methods and not algorithms, which is appropriate because as we have seen before, an algorithm can have methods with different progress conditions. For example:
- A data structure using the Double Instance Locking technique is Blocking for write-modify operations and Lock-Free for read-only operations;
- A data structure using the Left-Right Pattern is Blocking for write-modify operations and Wait-Free Population Oblivious for read-only operations (when using a WFPO readIndicator);
- Harris' linked-list can be made Wait-Free Bounded (by the size of the key) for lookups, and Lock-Free for insertions and removals.
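To make this method-level distinction concrete, here is a minimal C++ sketch (my own simplified example, not an implementation of Double Instance Locking or the Left-Right pattern): set() is Blocking because it acquires a mutex, while get() is Wait-Free Population Oblivious because it always completes in the same fixed number of steps, regardless of how many threads are running.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// One object, two methods, two different progress conditions
// (illustrative sketch with made-up names).
class MixedProgress {
    std::mutex writeLock;        // serializes writers
    std::atomic<int> current{0}; // readers never block on this
public:
    // Blocking: a writer holding the mutex can delay other writers
    // indefinitely.
    void set(int v) {
        std::lock_guard<std::mutex> guard(writeLock);
        current.store(v, std::memory_order_release);
    }
    // Wait-Free Population Oblivious: a single atomic load, a fixed
    // number of steps no matter how many threads are active.
    int get() const {
        return current.load(std::memory_order_acquire);
    }
};
```

Saying this "algorithm" is Blocking or Wait-Free as a whole would be meaningless; the progress condition only makes sense per method.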

Third, the part about "infinitely often" is very important because it covers non-deterministic scenarios.
Imagine a case where multiple threads attempt an operation and they all fail, but the failure is non-deterministic: on each attempt there is a probability close to 1 that at least one of the threads will succeed. This means that all threads may fail for a finite number of consecutive attempts, but eventually one of them will succeed.
Under the other definitions this scenario would not be Lock-Free, while under the definition by Herlihy it would be. Which one is best can be disputed, but if you use one of the other definitions, then what is the scenario I've described? Blocking? Something else?
The different definitions of Blocking, Obstruction-Free, Lock-Free, and Wait-Free are all inter-related, in the sense that they only make sense when defined together, so that no scenario "escapes through the cracks". Whatever definitions of Blocking, Lock-Free, and Wait-Free we use, they should make it as unambiguous as possible which progress condition we are talking about when faced with a particular method... this stuff is hard enough as it is, so let's not make it any harder with definitions that are incoherent with each other, or ambiguous  ;)

I'm secretly hoping that Herb Sutter will be covering this in depth at the next CppCon, because he has a talk devoted solely to Lock-Free algorithms named Lock-Free Programming (or, Juggling Razor Blades). The title alone is worth the registration fee!

In summary, the best definition of Lock-Free I've seen so far is the one from "The Art of Multiprocessor Programming":
"A method is lock-free if it guarantees that infinitely often some method call finishes in a finite number of steps."


  1. Hi,

    I was looking for a good definition of lock-free and I came across the definition by Herlihy, then I found your explanation, which is pretty great; it makes perfect sense. I wasn't sure how to interpret "infinitely often" but now it starts making sense. Great article!

    So, given this definition maybe you can help me understand something.

    In a recent article called Read-Log-Update (RLU, from SOSP'15), the authors propose an extension of RCU that allows multi-pointer updates instead of just one at a time as RCU does.

    I am trying to understand if some code that exploits the RLU Deferral algorithm would be considered lock-free or not. From my understanding, it can be considered lock-free for reading operations and non-lock-free for writers, since I think there could be a case with only one writer that, if it does not get a "sync request", can indefinitely keep writing without ever committing the updates to memory. If you happen to read that article, let me know what you think!

    Thanks a lot!

    1. Hi Ami,
      I'm glad you liked the post :)

      You are correct that the normal RLU algorithm is blocking for Writers because RLU_SYNCHRONIZE() is by definition a blocking call.
      I believe that even the "RLU Deferral" variant mentioned in section 3.7 is also blocking for Writers because RLU_SYNCHRONIZE() may still be called at some point.