Friday, July 18, 2014

Lock-Free is not Starvation-Free

Still on the topic of Progress Conditions, there is a very good paper (or essay?) by Maurice Herlihy and Nir Shavit
http://www.cs.tau.ac.il/~shanir/progress.pdf

Figure 1 is particularly enlightening, showing a kind of periodic table of progress conditions:

There is a bit of an insight I would like to cover on this post, and the insight is this:
Lock-Free is, by definition, starvable.

Or maybe put in another way, a method that is (just) Lock-Free, never has the guarantee of being free from starvation. If it has this guarantee, then it must be Wait-Free.

I know, it's tricky, but lets dig a bit more.

For Blocking methods, there is a property named Starvation-Free which guarantees that the method will complete if the other threads are able to finish. This means that as long as the thread-scheduler is fair, this method will not starve, i.e. its execution will not be stopped by other methods.
A nice example of a Starvation-Free method is a mutual exclusion lock (mutex) based on the CLH algorithm that we talked about a few posts ago. It uses a queue with wait-free properties for enqueuing and blocking for dequeuing.
Lock-Free doesn't give such a guarantee.
Imagine for example a CAS loop to increment a counter. It can happen that one thread fails indefinitely because other threads win the CAS, and although there is a non-zero probability to succeed in the CAS, there is no guarantee of how many attempts are needed to succeed, i.e. there is no theoretical maximum number of steps for the CAS to be successful.
The only way to guarantee that it would not be starved, would be to have a bound on the number of steps, and if such a bound existed, then the method in question would be (by definition) Wait-Free instead of Lock-Free.

I'm not completely sure, but there seem to be two consequences to this train of thought:

First, the only way to achieve starvation-freedom is to use mechanisms that are wait-free by themselves. Starvation-freedom is a Blocking property, which means that we're putting together wait-free mechanisms to make something that in the end is Blocking, so why not use Lock-Free instead?
Well, because if we use Lock-Free mechanisms we can not achieve starvation-freedom. For example, you can't make a starvation-free lock by using a lock-free queue.

Second, if starvation-freedom has a bound on the number of steps to complete and lock-freedom does not, then we can conclude (theoretically) that the maximum latency may actually be better, assuming a fair thread-scheduler and other ideal conditions.
So, although Lock-Free should usually have a better latency distribution (closer to the left side) than Starvation-Free, Lock-Free can never guarantee a maximum cut-off in terms of number of operations, which means that it might be preferable to use it if your latency needs are more about the tail than about the bulk.

As an aside to all of this, a bit of "Statistics" is in order: what people usually call "performance" is actually the "mean of the latency distribution".
If an operation takes on average 10 micro-seconds (us) to complete, this is usually written as your algorithm being able to perform 100000 (one hundred thousand) operations per second, but this is just "a number". This number alone won't tell you if most of the operations took 1us to complete and then some of them took a few seconds, or, if all of them took between 9us and 11us to complete.  To understand this you have to look at a distribution of the latency with enough samples.

Why should we care about starvation-free?
Partially, for the same reason that we care about Wait-Free: to have a strong bound on max latency, at least a theoretical bound.

We'll have to talk more about Latency one of these days, but the takeaway of all this blahblah is this:
All Lock-Free methods can be starved, but some Blocking methods will not starve (and we name those Starvation-Free).