Wednesday, September 17, 2014

Slides from CPPCON 2014

I was supposed to give a talk about wait-free and lock-free data structures at CppCon this year, but unfortunately I had some health issues a few weeks before and on the week of the conference I was getting surgery, so there was no way to attend  :(
I'm getting better now, but I feel really sad for missing some of the talks, specially the ones from Herb Sutter. His slides are already online, and the video should be online in a month (hopefully):

Lock-Free Programming (or Juggling Razor Blades)

Some other interesting slides, this one about RCU mostly:
C++ Memory Model meets High-Update-Rate Data Structures




Sunday, September 14, 2014

Left-Right Memory usage

We've talked before about Left-Right, a technique that can allow for read-only wait-free access to any data structure or dataset by creating two instances, and also talked about Double Instance Locking which is a simpler technique that allows for lock-free instead of wait-free read-only access.

One of the first things that people tell us when they first look at the Left-Right pattern (or the Double Instance Locking) is that it has the disadvantage of using twice the memory. This is not completely accurate because there are more than one way to use the Left-Right pattern in a multi-threaded application, and for the most common approach, the reduction in memory usage should be small. Let us start from the beginning:


What is a data structure?

A data structure is a way of organizing your data. Your data, henceforth named dataset, can have multiple organizations, i.e. data structures.
A concrete example would be a banking software, where the dataset consists of the records for each account, containing the name of the account holder, address, telephone number, amount deposited, and other info. This dataset can be organized by multiple data structures, for example, a list-based-set containing all the high net-worth accounts, or a tree-based-map containing all the customers ordered by name, or ordered by the account id as shown on the schematic below, etc.



Left-Right on a data structure

Suppose you want the internal bank application to check the value of an account. If you need wait-free guarantees for this, you can use the Left-Right pattern around the data structure of your choice. For example, suppose you have a TreeMap ordered by customer name so that you can easily find the record account for a given customer, the Left-Right pattern allows you to do this, provided you create two TreeMap instances. Notice however that there will be one and only one instance of each account record (dataset), and if concurrent access to it is needed, some mutual exclusion should be used, or a rw-lock.



For most practical applications, the contention occurs on the data structure itself, so it's enough to use the Left-Right on the data structure. This approach means that we are only duplicating the memory usage of the data structure (because there are two instances of it), but there is only one instance of the dataset and, for most applications, the size of the data structure in memory is small when compared to the size of the dataset, which means that having twice the data structure does not significantly increase the total memory usage of the application.


Left-Right on the dataset

Alternatively, instead of applying the Left-Right to the dataset, we can apply it to the dataset. This means that if a data structure is used to access the dataset, it must be a concurrent data structure (and thus provide whatever progress conditions that particular data structure provides), but the read-only access to the dataset itself will be wait-free, and there will be two instances in memory of the dataset.



A single Left-Right pattern applied to the entire dataset has the advantage that you can provide (blocking) linearizable transactions between the records of the dataset, while at the same time providing wait-free read-only access to any of the records or even iterate through them in a wait-free and linearizable way. This approach will typically consume way more memory (almost twice as much) than the previous approach because it means the dataset is duplicated in memory, but it can provide very a strong consistency model (linearization) which is some times a requirement in applications, or at least it is the safest bet when you don't know how strong the consistency model should be for that particular piece of the application.

Left-Right on the dataset and on the data structure

I expect that the least common pattern of all is the usage of the Left-Right for both the data structures and the dataset. Usually the contention or scalability/latency bottleneck is due to either one or the other, not to both. even so, it is possible to wrap both the dataset and the data structures each with its own Left-Right pattern.

Left-right on the dataset per datum

On the two previous cases we had a single Left-Right protecting the entire dataset, but it is possible to have on Left-Right per datum of the dataset (account record) . This option will use even more memory and it no longer allows for wait-free read-only iteration over the dataset, or transactions with simultaneous wait-free read-only access of a datum.
Total memory usage will be twice the size of the data structure, plus twice the size of the dataset, plus the size of a Left-Right object times the number of Left-Right objects which is number of datum in the dataset.


Conclusion

When you design or adapt an existing multi-threaded application to use the Left-Right pattern or Double Instance Locking or any other concurrency pattern, the first thing you need to do is identify the points where the contention is the highest, and design accordingly.
For most situations, this means identifying one or two data structures on which there is a lot of contention, implying that you need to either use a lock-free data structure or you use a "single-threaded" data structure and wrap it with the Left-Right pattern. Using a mutex or a reader-writer lock is rarely an option for these situations.
I would expect that there are some situations where transactions need to be done between different datum of the dataset in a linearizable way, or even iterate over the dataset while still providing linearizability as a consistency model, then you need to use the Left-Right on the dataset itself (and maybe the data structure).
Plan carefully and measure the contention on a real, and an extreme scenario to verify what are the application's needs in terms of scalability and latency. In the end, you'll need to measure, even if it's just some prototype code in some stress-test scenario, you must measure.

And remember, the Double Instance Locking pattern can be used in any version of C or C++, but if you want to use the Left-Right you will need a language with a well defined memory model, like: C11, C++1x, D, Java, Scala, or Go.