Queues don't scale with the number of threads. Most of you who live in the land of multi-core programming know this already, but now it has been proven in this paper:
Notice that the list of authors includes Nir Shavit, one of the co-authors of "The Art of Multiprocessor Programming" (so that's what he's been up to lately!).
For those of you not mathematically inclined (or who don't feel like reading all 18 pages of it), here is my short interpretation of what this means:
These guys have proven that Queues have a time complexity that is, at best, linear in the number of threads concurrently accessing the Queue, and as if that wasn't enough, they proved it also for Stacks and a few other data structures that have some implicit sequentiality.
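Stated a bit more formally (this is my own loose paraphrase of the result, not the paper's exact statement): if $n$ threads access the queue concurrently, then in the worst case some operation must pay a cost proportional to $n$, no matter how clever the implementation is.

```latex
% My informal paraphrase of the lower bound: for any implementation of a
% FIFO queue, there is an execution with n concurrent threads in which
% some operation op incurs
\mathrm{cost}(op) \in \Omega(n)
% i.e. the per-operation cost cannot be made independent of the number
% of threads contending on the queue.
```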
What this means for us common mortals is that even if someone discovers a fully wait-free or wait-free population-oblivious Queue (assuming such a thing is possible), its performance will still degrade proportionally to the number of threads adding/removing items in the queue; in other words, it will not be scalable.
So, Queues aren't scalable and they never will be... ever!
But don't despair, there is some hope hidden behind this. Namely: you can't have a Queue that is fully scalable, but that doesn't mean all of its operations must be equally non-scalable.
Just like different methods of an algorithm may have different progress-condition guarantees, perhaps they may also have different time complexities. Assuming such a thing is possible, it could allow for the creation of a Queue that has constant time complexity for insertions, and quadratic or exponential time complexity for removals (or the other way around).
With a queue like that, you would be able to insert into the queue very quickly, with little or no impact on performance no matter how many threads were inserting, but doing the removal could take a substantial amount of time, which would not improve by adding more threads to do removals, and could in fact get worse.
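As a concrete (and entirely hypothetical) illustration of such an asymmetry, here is a minimal Java sketch of a bounded multi-producer queue where `enqueue()` does a single fetch-and-add plus a store, i.e. constant work per call regardless of how many threads are inserting, while `dequeue()` runs a CAS retry loop whose cost grows with the number of contending removers. The class and method names are mine, not from the paper or any real library, and this is a simplified sketch rather than a production queue: slots are never recycled, and a dequeuer may spin briefly waiting for a slow producer to publish its item.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch: cheap insertions, contended removals.
class FastEnqueueQueue<E> {
    private final AtomicReferenceArray<E> items;
    private final AtomicInteger tail = new AtomicInteger(0);
    private final AtomicInteger head = new AtomicInteger(0);

    FastEnqueueQueue(int capacity) {
        items = new AtomicReferenceArray<>(capacity);
    }

    // Constant work per call: one fetch-and-add reserves a slot,
    // one store publishes the item. No retries, no matter how many
    // threads enqueue concurrently.
    boolean enqueue(E item) {
        int slot = tail.getAndIncrement();
        if (slot >= items.length()) return false; // full (slot is wasted; fine for a sketch)
        items.set(slot, item);
        return true;
    }

    // Potentially much more expensive: a CAS loop that retries every
    // time another dequeuer wins the race, so the work per call grows
    // with the number of contending removers.
    E dequeue() {
        while (true) {
            int h = head.get();
            if (h >= tail.get()) return null; // empty
            if (head.compareAndSet(h, h + 1)) {
                E item;
                // The producer that reserved slot h may not have
                // published its item yet; wait for it.
                while ((item = items.get(h)) == null) Thread.onSpinWait();
                return item;
            }
        }
    }
}
```

The point of the sketch is only the shape of the cost: the insertion path is a fixed number of atomic instructions, while the removal path's retry loop is exactly the kind of operation whose cost degrades as more threads pile on.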
What possible purpose could a queue like that serve? Well, in real-time systems it is usually desirable (or even required) to handle a large number of events in a constrained amount of time, or some similar kind of bursty behavior. Imagine, for example, a software process in a router that has a fixed time to read packets before more packets arrive and fill up the hardware queues; as long as it is able to put these packets in a queue to be processed later by some other threads, it is ok.
Either someone will prove such a thing is not possible, or someone will come up with a Queue with these kinds of properties. Until then, whenever your application gives crappy scalability, just blame it on the queues :)