Comments on Concurrency Freaks: FAAArrayQueue - MPMC lock-free queue (part 4 of 4)

Anonymous (2019-05-29 09:36):
This comment has been removed by a blog administrator.

Pedro Ramalhete (2018-10-18 11:05):
Hi Meng,
This queue is indeed very fast, but it is not wait-free. Even though the method tryEmplace() is wait-free, it may not do anything, and the definition of wait-freedom requires that work is completed in a finite number of steps. "Doing nothing" is not wait-free, even if the "nothing" is done in a finite number of steps.

One suggestion: for write_idx and read_idx you may want to try alignas(128) instead of alignas(64), because on x86 the prefetcher fetches two consecutive cache lines, which means you may get (some minor) false sharing even when writing to the adjacent cache line. https://github.com/MengRao/WFMPMC/blob/master/WFMPMC.h#L200

Anonymous (2018-10-18 10:45):
I've just created a wait-free MPMC in 100+ lines of C++11 code: https://github.com/MengRao/WFMPMC

Can you help check?

Thanks,
Meng

Anonymous (2017-11-21 17:06):
Pedro, thanks for the article and the queue code.
My benchmarks (incomplete) show that it has great performance.
I did find a race condition in my rendering of the code (another pusher thread sneaked in during the creation of the new node, and the item was added out of order), and was able to fix it.

I would replace lines 155-160:

    Node* newNode = new Node(item);
    if (ltail->casNext(nullptr, newNode)) {
        casTail(ltail, newNode);
        hp.clear(tid);
        return;
    }

with:

    Node* newNode = new Node();
    if (ltail->casNext(nullptr, newNode)) {
        continue;
    }

Pedro Ramalhete (2017-11-10 20:23):
Hi Jeehoon,
Thanks :)
In the YMC paper they have two queues: one is obstruction-free, the other is wait-free and unbounded.
The FAAArrayQueue has some similarities to the obstruction-free queue shown in the YMC paper.

As it happens, there is no known way (and I believe there never will be) to provide wait-free unbounded or bounded memory reclamation for the particular algorithm used in YMC's wait-free unbounded queue. Therefore it is not possible to use it in production with wait-free properties, because in production we always need wait-free memory reclamation (otherwise we might as well go for simply lock-free or blocking).

Anonymous (2017-11-10 07:19):
Thanks for a wonderful post! I just wonder why you think the YMC queue is only obstruction-free. YMC claim in the title of their paper that the queue is wait-free.

Anonymous (2017-08-07 16:59):
Thank you.

Pedro Ramalhete (2017-08-07 15:21):
Hello,
If I remember correctly, we used the same benchmarks as for the Turn queue, which are available here:
https://github.com/pramalhe/ConcurrencyFreaks/tree/master/CPP/papers/crturnqueue

The burst benchmarks are these:
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/CPP/papers/crturnqueue/include/BenchmarkQ.hpp#L177
and they are described in the Turn queue paper:
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/papers/crturnqueue-2016.pdf

Anonymous (2017-08-05 02:51):
Please provide a link to the benchmark sources. I want to reproduce them.

David Karnok (2017-01-20 22:57):
This comment has been removed by the author.

Pedro Ramalhete (2016-11-27 12:00):
1) FAA is faster than CAS, which is why it is preferable to use it. Go and take a look at the LCRQ paper:
http://www.cs.tau.ac.il/~mad/publications/ppopp2013-x86queues.pdf
or this recent ACM Queue article (easier to read), also from Adam Morrison:
http://queue.acm.org/detail.cfm?id=2991130

2) Agreed!
It is possible, and it does not implement the Queue interface.

3) Dude, again you're one post ahead of me ;)
I have a post dedicated to explaining the "bump out" effect you described, but the summary is that yes, it is possible, and it happens somewhat when there are more/faster dequeuers than enqueuers. Even then, though, this algorithm is still lock-free, because with an array of size 1024, after 1024 failed attempts to enqueue, the next node will have the first entry of its array pre-filled with the item to enqueue.
This is the same reason why LCRQ is lock-free.

David Karnok (2016-11-27 11:43):
If enqIdx.getAndIncrement is within the bounds, why do you CAS the value instead of lazily setting it? I'd guess there is no longer a race on that array entry, since the index is unique to the current thread.

Although it doesn't implement the Queue interface, there could be holes in the array, and dequeue may return null, indicating an empty queue even though the queue is not empty. (It's a similar case to the weak polls in the JCTools queues.)

I'm not sure about this, but let's consider the case of 1 enqueue thread and 1 dequeue thread for simplicity. The enqueue thread picks up slot 0 but stalls before the value CAS. The dequeue thread picks slot 0, swaps in the "taken" token, finds a null value, loops back and, because deqIdx >= enqIdx with no next node, returns null. Now the enqueue thread resumes and fails to CAS in the value (it finds "taken" instead of null). This could lead to enqueue chasing dequeue without ever actually delivering the value to the other side.
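The interleaving David describes above can be replayed deterministically in a single thread. The following is a minimal sketch, not the FAAArrayQueue source: the names Slot, SketchQueue, enqReserve, enqPublish and the TAKEN sentinel are all illustrative. The enqueuer reserves a slot with fetch_add, the dequeuer visits that slot first and swaps in TAKEN, and the resumed enqueuer's CAS then fails, which is why the enqueuer must loop and retry in a later slot rather than assume the reserved slot is safely its own.

```cpp
#include <atomic>
#include <cstddef>

// Illustrative slot with a single atomic value: nullptr = empty.
struct Slot { std::atomic<void*> val{nullptr}; };

// Sentinel a dequeuer swaps in to mark a slot it has given up on.
static void* const TAKEN = reinterpret_cast<void*>(-1);

struct SketchQueue {
    static const size_t SIZE = 4;
    Slot slots[SIZE];
    std::atomic<size_t> enqIdx{0};
    std::atomic<size_t> deqIdx{0};

    // First half of enqueue: reserve a slot index with FAA.
    size_t enqReserve() { return enqIdx.fetch_add(1); }

    // Second half of enqueue: try to publish the item with CAS.
    // Fails if a dequeuer already marked the slot TAKEN.
    bool enqPublish(size_t i, void* item) {
        void* expected = nullptr;
        return slots[i].val.compare_exchange_strong(expected, item);
    }

    // Dequeue: reserve an index with FAA and swap in TAKEN;
    // a nullptr return means the enqueuer had not published yet.
    void* deq() {
        size_t i = deqIdx.fetch_add(1);
        return slots[i].val.exchange(TAKEN);
    }
};
```

This also answers the lazy-set question in the same comment: the index from fetch_add is unique to the enqueuer, but the dequeuer may still race it to the slot itself, so the publish step needs a CAS (or an equivalent atomic exchange) to detect the TAKEN marker.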
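Pedro's alignas(128) suggestion earlier in the thread can be sketched as follows. The struct and field names are illustrative, not the WFMPMC source: the point is that on x86 the adjacent-line prefetcher pulls cache lines in 64-byte pairs, so two hot indices padded to only 64 bytes can still interfere, while 128-byte alignment keeps them a full prefetch pair apart.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative pair of hot queue indices, each aligned to 128 bytes so
// that the enqueuer's writes to write_idx and the dequeuer's writes to
// read_idx never touch the same 128-byte prefetch pair.
struct QueueIndexes {
    alignas(128) std::atomic<uint64_t> write_idx{0};
    alignas(128) std::atomic<uint64_t> read_idx{0};
};
```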