Concurrency Freaks: You say read-committed, I say buggy-code, potatoe, potato, tomatoe, tomato

Friday, June 12, 2020

You say read-committed, I say buggy-code, potatoe, potato, tomatoe, tomato

If I were to make a concurrent map data structure whose operations are not linearizable and then put it on a library and give it to my users, any user of that library would come back to me and say that my library has a bug.
They may not be able to tell me exactly what it is, but they'll understand that there is a bug in it.

However, if I take the same concurrent map data structure, and put in an application and call that application a "Key-Value store" or a "Database" (DBMS) and give it to the typical database users, it seems they may certainly use it for a several decades without ever complaining that this "Database" is not linearizable (or serializable as the DB folks call it).

If this sounds far-fetched, then just go and read this post: http://jepsen.io/analyses/postgresql-12.3
It seems that the Postgresql users really didn't care that postgresql isn't serializable when told to be so and in fact, isn't even read committed by default, which should be the default. And a similar thing happened to MongDB last month, so it's not something specific to Postgresql.

I find this interesting because it's an extreme case of managing expectations: When you take a library of concurrent data structures, you expect a certain kind of consistency, namely, you expect linearizability (just go and read discussion on the Java concurrency mailing list if you don't believe me).
However, if you take a DBMS like Postgresql, you no longer expect linearizability to be the default behavior. In fact, you expect read committed as the default behavior, which turns out Postgresql doesn't even give read-committed and instead gives snapshot isolation.
A user of a library of concurrent data structures would call read committed a bug. She would also call snapshot isolation a bug, and pretty much everything that is not linearizable would be a bug to her.
The reason it would be a bug, is because it is very hard to reason about anything that is not linearizable. In the post by Jepsen you can even see that there is no exact definition of read committed, so I guess that's one of the reasons why nobody complained about it before.

I can easily imagine discussions of the type:
DBMS User: This behavior is weird!
DBMS Developer: No, it's not weird, it's "read committed", you just have to go and learn about what it means!
DBMS User: The definition of "read committed" is so fluffly that it can mean any weird thing... I can't even understand if the weird thing I'm observing is "read committed" or not.
DBMS Developer: See!?! I was right, this is "read committed"!
DBMS User: Ok, I'll keep using your DBMS because all the other DBMS work the same way.

I could have understood if it was a distributed database, because the cost of a serializable transaction over a distributed database is likely proportional to the latency of the furthest node in the database, while for read-committed it may be lower (who knows?). But the scenario that Jepsen describes isn't even about distributed databases. It's a bug found running on a single-node database. There's no "distributed-databases-are-hard" excuse here (which they are, it's true).

It makes me wonder how did the DBMS folks got their users so well trained that they don't complain about consistency bugs in the DBMS?!?
On one side, I'm super envious because I secretly wish I could be as dismissive to my users as the DBMS folks are to theirs, lol.
But seriously, how could this have happened for so long?!?
I see two possible reasons:
1st, the users don't really care about consistency. They're running businesses. As long as the DBMS is fast enough and has the features that they need, they'll continue to spew out cash for it. Correct consistency is not an issue for the 99% use-cases of databases, as long as the data doesn't get corrupted or lost, everything's fine.
2nd, it's always been like that. Everybody accepts it works like that, and if you want something better, you have to go for a niche DBMS (not that easy to find). Read-committed, snapshot isolation and other strange names as such, are just the "status quo" and nobody wants to change the status quo.
3rd, the DBMS folks hide behind the wall of complexity that is the DBMS. It's a common scenario in IT. They would say something like "Ooooohhhhh this DBMS is too complicated for mere mortals to question! It takes many years of work and millions of lines of code! Here be dragons! Oooohhhh".

If you think of other reasons behind this, I would like to hear about them in the comments section.

Anyways, with the advent of Persistent Memory (PM) and Software Transactional Memory for PM, this game is changing.
One example close to my heart is RedoDB.
RedoDB is a "key-value store" but it supports linearizable transactions over any C++ data type (needs to be type-annotated though). Not only that, but these transactions are wait-free.
That's right, you heard it well: RedoDB provides durable linearizable/serializable wait-free dynamic transactions.
No only does it do that, but is does it slightly faster than RocksDB.

The downside? Consumes vast amounts of memory, though it won't be any worse than most implementations of Multi-Version-Concurrency-Control. At least in RedoDB there is a bound on memory usage.

Anyways, our goal was to show that wait-free DBMS are feasible and can be made efficiently. We weren't aiming for a commercial product.
Ohh and did I say that this DB is about 3k lines of code, as opposed to the several hundred thousands LOC for other DBMS?

You can checkout the paper for RedoDB here: https://dl.acm.org/doi/abs/10.1145/3342195.3387515
and the source code here: https://github.com/pramalhe/RedoDB

11 comments:

Andrey LomakinJune 19, 2020 at 1:17 PM
Hi,
I have been interested in this work since first paper was published.
When you say, that RedoDB is slightly faster than RocksDB what do you mean? Was it tested on single thread or multi thread loading ?
ReplyDelete
Replies
Andrey LomakinJune 22, 2020 at 4:42 PM
Hi Pedro,
If you do not mind, I have one more question. Did you measure space consumption of RocksDB and RedoDB while you were running benchmarks ? That is clear that it depends on data and pattern of operations performed but did you compare space consumption of RocksDB and RedoDB during those tests ? IMHO in this case would be more fair to compare space consumption of RedoDB and PostgreSQL. But if you do not mind could you provided those data if you have them ?
ReplyDelete
Replies

Add comment