Tuesday, May 21, 2013

How to micro-benchmark concurrent data structures?

One of the issues with benchmarking of concurrent algorithms is to choose the right distribution of tasks among the threads.

For many of our tests we used equal-capability threads where each thread can either be doing a Read operation or a Write operation according to some probability (like for example: 50%, 10%, or 0.1% Writes).
This is fine for testing Blocking algorithms like Locks or lock-based data structure, but for data structures with Wait-Free capabilities it just doesn't work.
The reason is that, when we use the equal-capability threads approach there is an implicit dependency between the Reads and the Writes, and eventually we can (and will) end up with all Reads finished and only threads attempting to Write are waiting, or maybe the other way around, depending on whether there is some kind of reader-preference or writer-preference. Either way, we create a "blocking" dependency between the two kinds of operations, which can bias the micro-benchmark results significantly.
Here is a schematic that shows what the issue looks like:


In this example we have threes threads each doing 50% Writes (one Read operation, followed by one Write operation, followed by another Read, and so on).
Most of the time of each thread is spent waiting for the Write to finish because this is usually slowest operation. This shows that in practice we are "blocking" the Read operations, even if they are Wait-Free because each thread will not be able to progress to the following Read until it does a Write, thus causing a blocking dependency.


The solution is to use dedicated threads: some threads are Writers, and the remaining threads are Readers.
This way there is no blocking of the tasks of reading and writing to the data structure, and as many as can be executed will be. It has the added advantage that you get to test as well the "fairness" of the data structure when concurrent Reads and Writes are done.

We are currently finishing up a paper on a concurrent pattern which we call the "Left-Right pattern", and it has the property that Writes are Blocking and Reads are Wait-Free Population-Oblivious. With this kind of guarantees, we decided to make tests where we run one or two Writer threads, and all the other threads are Readers.
The reasoning behind it is that choosing two writer threads makes sure that at any given time there is at least a Writer waiting to execute, and then adding Reader threads will show how it scales under Read operations. This is a good test of the overall "fairness" of the data structure under high-contention. A "fair" data structure should provide proportional or constant performance for each of the operations (whether they are Writes or Reads) as we scale one of the others, in the case of our "Left-Right pattern", we scale up the number of Reader threads and check how the number of Write operations changes as a function of number of Readers.



In conclusion, mixed-task threads are fine for testing Blocking data structures, but for the ones with Wait-Free proprieties, it is better to use threads with dedicated tasks, in order to avoid blocking dependencies between the tasks in the test framework itself.

1 comment: