Concurrency Freaks: 2020

If I were to make a concurrent map data structure whose operations are not linearizable and then put it on a library and give it to my users, any user of that library would come back to me and say that my library has a bug.
They may not be able to tell me exactly what it is, but they'll understand that there is a bug in it.

However, if I take the same concurrent map data structure, and put in an application and call that application a "Key-Value store" or a "Database" (DBMS) and give it to the typical database users, it seems they may certainly use it for a several decades without ever complaining that this "Database" is not linearizable (or serializable as the DB folks call it).

If this sounds far-fetched, then just go and read this post: http://jepsen.io/analyses/postgresql-12.3
It seems that the Postgresql users really didn't care that postgresql isn't serializable when told to be so and in fact, isn't even read committed by default, which should be the default. And a similar thing happened to MongDB last month, so it's not something specific to Postgresql.

I find this interesting because it's an extreme case of managing expectations: When you take a library of concurrent data structures, you expect a certain kind of consistency, namely, you expect linearizability (just go and read discussion on the Java concurrency mailing list if you don't believe me).
However, if you take a DBMS like Postgresql, you no longer expect linearizability to be the default behavior. In fact, you expect read committed as the default behavior, which turns out Postgresql doesn't even give read-committed and instead gives snapshot isolation.
A user of a library of concurrent data structures would call read committed a bug. She would also call snapshot isolation a bug, and pretty much everything that is not linearizable would be a bug to her.
The reason it would be a bug, is because it is very hard to reason about anything that is not linearizable. In the post by Jepsen you can even see that there is no exact definition of read committed, so I guess that's one of the reasons why nobody complained about it before.

I can easily imagine discussions of the type:
DBMS User: This behavior is weird!
DBMS Developer: No, it's not weird, it's "read committed", you just have to go and learn about what it means!
DBMS User: The definition of "read committed" is so fluffly that it can mean any weird thing... I can't even understand if the weird thing I'm observing is "read committed" or not.
DBMS Developer: See!?! I was right, this is "read committed"!
DBMS User: Ok, I'll keep using your DBMS because all the other DBMS work the same way.

I could have understood if it was a distributed database, because the cost of a serializable transaction over a distributed database is likely proportional to the latency of the furthest node in the database, while for read-committed it may be lower (who knows?). But the scenario that Jepsen describes isn't even about distributed databases. It's a bug found running on a single-node database. There's no "distributed-databases-are-hard" excuse here (which they are, it's true).

It makes me wonder how did the DBMS folks got their users so well trained that they don't complain about consistency bugs in the DBMS?!?
On one side, I'm super envious because I secretly wish I could be as dismissive to my users as the DBMS folks are to theirs, lol.
But seriously, how could this have happened for so long?!?
I see two possible reasons:
1st, the users don't really care about consistency. They're running businesses. As long as the DBMS is fast enough and has the features that they need, they'll continue to spew out cash for it. Correct consistency is not an issue for the 99% use-cases of databases, as long as the data doesn't get corrupted or lost, everything's fine.
2nd, it's always been like that. Everybody accepts it works like that, and if you want something better, you have to go for a niche DBMS (not that easy to find). Read-committed, snapshot isolation and other strange names as such, are just the "status quo" and nobody wants to change the status quo.
3rd, the DBMS folks hide behind the wall of complexity that is the DBMS. It's a common scenario in IT. They would say something like "Ooooohhhhh this DBMS is too complicated for mere mortals to question! It takes many years of work and millions of lines of code! Here be dragons! Oooohhhh".

If you think of other reasons behind this, I would like to hear about them in the comments section.

Anyways, with the advent of Persistent Memory (PM) and Software Transactional Memory for PM, this game is changing.
One example close to my heart is RedoDB.
RedoDB is a "key-value store" but it supports linearizable transactions over any C++ data type (needs to be type-annotated though). Not only that, but these transactions are wait-free.
That's right, you heard it well: RedoDB provides durable linearizable/serializable wait-free dynamic transactions.
No only does it do that, but is does it slightly faster than RocksDB.

The downside? Consumes vast amounts of memory, though it won't be any worse than most implementations of Multi-Version-Concurrency-Control. At least in RedoDB there is a bound on memory usage.

Anyways, our goal was to show that wait-free DBMS are feasible and can be made efficiently. We weren't aiming for a commercial product.
Ohh and did I say that this DB is about 3k lines of code, as opposed to the several hundred thousands LOC for other DBMS?

You can checkout the paper for RedoDB here: https://dl.acm.org/doi/abs/10.1145/3342195.3387515
and the source code here: https://github.com/pramalhe/RedoDB

Sunday, May 17, 2020

Dealing with bugs in Concurrency and Durability

Everyone knows that Concurrency is hard, in fact, it's NP-hard: the total number of interleavings typically grow exponentially with the number of lines of code (LOC) in a program. Determining if any of these interleavings causes incorrect behavior, becomes problematic for anything but very small programs.
This never prevented anyone from writing large multi-threaded programs, and the usual trick is to use locks. A code block surround by a lock does not have interleavings with other code blocks for the same lock, and the problem becomes "tractable" again.

Unfortunately, we can't use this trick for lock-free and wait-free code. Lock-free algorithms are typically done with atomic instructions that can interleave with each other.
Verifying correctness of these algorithms is a difficult problem.
There are static verification tools like SPIN, JSF, TLA+ and others, but such tools only work for a small number of threads (no more than 6 threads) and small-size programs (less than 100 LOC). As if this wasn't bad enough, these tools enforce the user to write the program in a particular language, apart from JSF, which is almost Java. The program will have to be re-written later in the target language that is meant for, and during this "translation" errors can be introduced.

Now, don't get me wrong, software has bugs. It's just part of life as a software developer to deal with them and try to minimize them.
Entire industry trends like Test Driven Development (TDD) and Test Automation revolve around the concept of testing to reduce the amount of bugs.
In other words, there's this idea that the more time you invest in tests, the more quality the final program will have. IMO this is true, but obviously it's not a linear effort: beyond a certain number of tests, the number of new bugs you'll detect will be small compared to the effort of maintaining those tests.

Concurrency bugs can, and should, also be approached in this manner, but this is far from sufficient.
Moreover, testing a concurrent program requires a different set of techniques and mindset than testing a sequential program: instead of exercising input-output responses, we should test program invariants; instead of mocking components, we should stress the code by increasing the number of threads or changing timings to exercise different interleavings and code paths.
This is where run-time verification can provide a good help. Tools like valgrind and Address Sanitizer become a need-to-have if you're writing C/C++ multi-threaded applications and there is a good deal of research being done on this field, so I'm sure that more capabilities and more efficient tools will show up in the future. However...

Suppose I have a new lock-free algorithm and I implement it. Then I bother to write some tests for it (which most researchers won't do) and then run it with address sanitizer and whatever other run-time tools I can get my hands on. None of my tests or run-time tools encounter any error. Now what? Is my algorithm ready for publication? How will the peer reviewers know that it is correct? How will I convince them of that?
And here lies the big problem in researching concurrent algorithms that people don't usually talk about: "How do I know if the lock-free algorithm in this paper is correct?"

As I said before, it's ok to have bugs in your code, all software has bugs, and software developers are used to this inevitability.
What is not ok is to have bugs in your algorithm. It's very hard to prove the correctness of a lock-free algorithm. Although it's usually simple to verify that a bug exists, once someone tells you the sequence of steps to reproduce that bug.

This can lead to surreal scenarios that typically look like this:
Me: Read the latest paper on a novel lock-free algorithm. Spend weeks thinking about it. Come up with a scenario where the algorithm doesn't work. Send an email to authors describing the scenario.
Authors: Spend a week thinking if the scenario is actually a bug and how to fix it, propose a slight variation of the algorithm.
Me: Spend another week thinking about it, come up with a difference scenario where there is a race condition.
Authors: Spend another week thinking about it and propose a modified algorithm.
Me: Spend another week thinking about this different algorithm. Get to the conclusion that the algorithm is no longer lock-free. Send a new email explaning why.
Authors: Spend another week thinking about it and propose yet another version of the algorithm.
Me: Spend another week thinking about it, only to reach the conclusion that this new algorithm is not even linearizable.
Authors: We weren't aiming for linearizability anyways...
Me: Lose all faith on the work done in the field of concurrent algorithms, and lose faith in mankind in general... at least for a couple of weeks, then I read another paper that looks interesting and the cycle restarts.

This can sound bad but it's a description of when things turn out good. Other times I send the email and the authors just ignore me, or they answer something that shows that they don't even understand there is a bug, or there is no bug and the whole things was a result of me missing some vital detail in the algorithm.
Ultimately, either I shame the authors by pointing out their mistakes or I shame myself for being too dumb to understand their paper. Never a good turnout and it's not a good way of "making friends". Unfortunately I can't help myself, I'm in it because I want to learn, and you learn the most when you're willing to make a fool of yourself ;)

This is not about incompetence, this is about the inherent difficulty in writing lock-free algorithms. The authors are not sure that their algorithms are correct, the reviewers are not sure that the algorithms are correct and the readers of the papers are not sure that the algorithms are correct. And don't get me started on the actual "implementations" of these algorithms.

It's a mess, and it creates a lot of distrust in the field, specially during peer revieweing.
Papers end up being accepted only if there is an expert reviewer that really knows about this stuff and decides to spend the several weeks of his personal time it takes to fully understand that algorithm and then comes back to the other reviewers and manages to convince them that this stuff is correct.
Most of the times, academic papers gets accepted just because they're not obviously wrong. But the problem is that may be simply because the algorithm is so complex that no one is willing to "dig into it". I've seen examples of papers accepted into major conferences, where the basic algorithm was wrong (and there is not obvious way of fixing it) and then the authors will continue convinced that approach is valid and correct and write follow up papers that build on that algorithm and those papers continue to be accepted in major conferences, when it's all crap.
Luckily, this is rare. Most of the bugs I've seen in lock-free algorithms are things that can be fixed without a major re-design of the algorithm.

There is also the effect that if the algorithm is too simple (there aren't many of those, but still) then the reviewers may think that the work doesn't have enough value. But if the algorithm is too complex, then they wouldn't be able to understand if it is correct or not.
This kind of creates the incentive for papers to have incremental algorithms based on prior published work, or to have several simple algorithms as opposed to a single complex one.

I've noticed that when it comes to durability (persistence), the problem is similar.
Durable algorithms are not NP-Hard, the failure can occur at a single point in the code, which means that the complexity is linearly proportional to the LOCs, as opposed to concurrency algorithms where the complexity grows exponentially with the LOCs in the algorithm.
However, the pattern of me reading papers on durable algorithms and then sending emails and then the authors fixing but not really fixing is pretty much the same.

It's still a hard problem, particularly now with Optane DC Persistent Memory that has a special set of rules for data to be correctly persisted.
On Persistent Memory it is very easy to forget a flush or a persistent fence. This means the implementation is incorrect, however, in practice it may have no issue, because the probability of having a crash exactly in the place where the fence is missing is very low.
On the other hand, if there are fences missing all over the place or if there is something being used that simply doesn't make sense, then no matter where the program crashes, the data will end up corrupted. So there are different shades of grey here.

Again here the problem repeats itself when it comes to getting a paper accepted at a conference: unless one of the reviewers really knows its stuff and is willing to invest the time it takes to deep dive on the algorithm presented on the paper, getting a "correct" durable algorithm accepted is just a game of luck. The best that can be done is to make a convincing proof or at least explain some invariants.

I believe that as time goes by, there will be better tools that will help writing and verifying correct durable algorithms. The thing is, by the time that happens (years from now), all or most of the interesting durable algorithms will be published by then, so it kind of defeats the purpose. Moreover, for the complicated stuff we can always use transactional durable techniques (undo-log, redo-log, copy-on-write, shadow-data) or a universal construction, like the one we presented at EuroSys2020 : link here.

For concurrency it's really black and white: you miss a fence, you're algorithm is wrong and you WILL see a bug happen eventually.
I don't see any real solution for concurrent algorithms coming anytime soon. And that's a shame because as long as things remain the way they are, it will be very hard for the good (correct) algorithms to be distinguished from the bad (incorrect) ones. It becomes a kind of "reputation game" where papers from people who have made correct algorithms in the past, will likely have correct algorithms later as well, but this is not really reliable seen as we're all humans we're bound to make mistakes at some point. For now, all we can do is test our implementations of these algorithms as extensively as possible so as to minimize the chances of bugs, but it will never be 100% certain.

This discussion may seem a bit boring: software will always have bugs, so what's the problem with having buggy algorithms?
My argument is that it's a big deal, actually, it's a very BIG deal!I've seen many papers that build on previous papers and then I go look at the previous work on which they're based, and those are incorrect. We need solid foundations on which to build, to be able to progress further and faster.
If we don't give the proper value to the correct algorithms and point out the incorrect ones, then we're creating the wrong incentive. Researchers will not bother to spend the time testing or validating their algorithms. This creates the climate of suspicion that we're in today.

Ultimately, designing lock-free algorithms will never be "easy", but designing lock-free and persistent data structures is easy when you use universal constructions or a lock-free software transactional memory. This means all these are problems are an issue only for the researchers of concurrent algorithms, but not really a problem for people that need a peristent or lock-free data structure.