Wednesday, November 6, 2019

CX - Wait-Free Universal Construction

After many years, we've decided to make public the source code for the CX Universal Construction and explain what it is. Source code can be obtained here: https://github.com/pramalhe/CX
and paper here: https://arxiv.org/abs/1911.01676

CX is the first wait-free Universal Construction that can actually be used in practice.
It can take lambdas for its operations, it has integrated (wait-free) memory reclamation, it was extensively tested and it doesn't leak.
CX allows you to take a sequential data structure or object (single-threaded code) and make a wait-free concurrent implementation of the same object by simply encapsulating the methods in calls to updateTx() or readTx(). The resulting data structures have linearizable operations.

Transforming a sequential data structure with CX is not trivial, but it is simple enough that a developer fresh out of college can do it. There are a few rules specific to wait-free UCs that need to be followed (no I/O, no exceptions, no side effects, deterministic code only) but these are easy enough to adhere to.

Unlike handmade wait-free and lock-free data structures, you can have any kind of type in your container (there are limitations on the return values though). All lock-free data structures I've seen so far were meant for integers only, or something equal or smaller than 64bits. Moreover, with CX it's easy to modify the data structure later and add more functionality.

Unlike STMs, CX does not need to have interposing and therefore, we can use CX to wrap data structures in libraries to which we don't have access to the source code. For example, you can use CX to wrap STL containers (we have!). Here is an example on how to make a wait-free std::set
https://github.com/pramalhe/CX/blob/master/examples/WFStdSet.hpp
or even easier, just create the instance and do the operations on the fly:
https://github.com/pramalhe/CX/blob/master/examples/example1.cpp

All you need is a copy constructor for the underlying data structure you want to protect. As long as your sequential data structure has a copy constructor (or something logically equivalent) and deterministic operations, you should be able to convert it to a wait-free data structure.

We've used CX to make the world's first wait-free balanced binary tree, with integrated wait-free memory reclamation, capable of supporting multiple types. How did we do that? Well, we wrapped a std::set in CX and BOOM, out comes the world's first wait-free balanced BST  :)


Several years ago, when Andreia first came up with the idea that we could use a reader-writer lock to achieve wait-freedom, I though that was crazy, but cool. After joining up forces with Pascal and a lot of work, we were able to turn that into a universal construction, CX.
Sure, it has its limitations, but it opens new venues for research and is evidence of some contrarian concepts:
- wait-freedom can be generic and fast (at least for read-mostly workloads);
- wait-free memory reclamation can be lightweight;
- wait-freedom can be achieved with locks;


Now for the bad news: just because it's wait-free, doesn't mean it scales.
CX serializes all mutative operations which means it's flat at best if the workload consists of only mutations on the sequential object.
It also consumes larges amounts of memory and in the very unlikely worst case, may require 2x MAX_THREADS replicas of the sequential object. If the object is a data structure with 1000 nodes, then it doesn't make a dent, but if we're talking about a container with multiple millions of keys and tens of threads writing to it, then it's going to chew up with DRAM pretty fast.
This high memory usage means CX likely won't be your favorite concurrency approach when deploying in production.

The good news is that it scales really well for read-only operations. This means that as long as your workload is read-mostly, it will be able to scale.
Synchronization for read-only operations in CX is very low, typically a few acquire-loads and a single seq-cst fence instruction and beyond that it's just regular loads.
Yes, that's right, the user-code you place in the lambdas is being executed without having to do any load acquires or heavy synchronization fences, just like when you execute code inside a lock!
This is what makes CX be very fast for read-mostly workloads.


In fact, it's so fast that when we submitted the paper to conferences, a few reviewers complained that it was too good to be true and it couldn't possibly be correct. That's how fast it is  ;)
Hand-made lock-free data structures can take months (or years) of work by expert researchers to complete, therefore it's normal be greeted with disbelief when we show them a technique capable of transforming generic sequential code into wait-free safe code. Something a young developer can do in minutes. If on top of that we say that it has wait-free memory reclamation and on top of it we show plots where it beats multiple handmade data structures in read-mostly scenarios, then they go ballistic and think it's all a bunch of BS.
It's understandable, we all have our biases. Mine is to think that it's a good idea to trade off large amounts of memory for a generic construct that scales in read-mostly workloads.What's yours?

Hey, nobody's perfect!

2 comments:

  1. This comment has been removed by a blog administrator.

    ReplyDelete
  2. This comment has been removed by a blog administrator.

    ReplyDelete