On the previous post we talked about The Encapsulator, an MPMC queue that is wait-free for enqueue() and lock-free for dequeue(), and we're going to cover it a bit more in this post.
In The Encapsulator there is one Encap object for each item. Each node has one or multiple pointers to Encap objects:
There is an items array in each node with up to maxThreads entries, pointing to different Encap objects:
Although on a per node basis each entry in items is unique (points to a different Encap), it can happen that different nodes point to the same Encap objects, i.e. there are “repetitions”:
Dequeuing an item consists of doing a CAS on the pointer of the Encap which means that if one thread goes through nodeA and dequeues z, another thread going through nodeC will see the item z as removed, or fail in its own CAS attempt to dequeue:
One of the main issues with making a (single-list based) queue that is wait-free for enqueues, is for every thread to know if a certain item has already been enqueued (is the request open/closed ?). Different queue algorithms have different ways of solving this particular problem.
On the CRTurn queue and the queue by Kogan and Petrank, this is solved in a "lock-step" way, where each operation is done one step at a time to make sure it becomes visible to all threads. The SimQueue by Fatourou and Kallimanis uses a super cool technique that makes a snapshot of the requests and one extra bit that is XOR'ed... non-trivial, but really smart approach.
On The Encapsulator we solve this problem by allowing an item to be enqueued multiple times, i.e. when in doubt, enqueue it again.
This would be a big problem, causing the same item to be inserted multiple times (duplicate entries in the queue) which would means it is no longer a correct FIFO queue, but thanks to the Encap we guarantee that each item is dequeued once and only once.
There may be different nodes holding a pointer to the same Encap object, but once dequeued, this becomes visible to all dequeuers, and one and only one of the dequeuers will be able to have a successful CAS that logically removes the item from the queue.
Funny enough, it's always the first node that does it, but it could be done from a different node and it would still be linearizable... muahahah
Every rose has its thorns, and The Encapsulator is no different: by allowing an item to be enqueued multiple times, we are using extra (wasted) memory in the arrays of items and in the worse case scenario (no matter how unlikely) we could end up with a very large number of nodes having duplicated pointers to Encaps that have long been dequeued.
There is a way to overcome this: the trick is to have an array for each thread where each thread stores the last seen entry for enqueuers[i] of the other threads. If the entry is still the same on the next call to enqueue() then it will not be placed in the node.items array.
Remember that the Encap objects are unique, so we can distinguish them, even if they encapsulate the same object, i.e. The Encapsulator correctly handles the enqueueing of the same item by multiple threads.
With the approach of the thread-local array (per thread) we create a bound on the number of times a given Encap can be added to a node in the queue. This bound is maxThreads, iff each thread enqueues it once.
Simple trick, and efficient. The disadvantage is that with this approach, an empty queue is still chewing up arrays occupying maxThreads^2, due to having one array of size maxThreads per thread, which is not too nice if you have a lot of threads and a lot of queue instances.
... there are always trade-offs.
On the next post we'll look at some benchmarks in Java.