We're back on the topic of Counters because a friendly reader asked for a comparison with LongAdder, which I didn't even know existed (thanks AshKrit!).
It turns out that LongAdder is part of java.util.concurrent.atomic, was added in JDK 8, and was written by Doug Lea.
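For readers who, like me, had not used LongAdder before, here is a minimal sketch of its basic API: increment() for the writers and sum() for the readers. The class name LongAdderBasics and the helper countWith() are made up for this illustration; only LongAdder itself comes from the JDK.

```java
import java.util.concurrent.atomic.LongAdder;

final class LongAdderBasics {
    // Hypothetical helper: increments one shared LongAdder from several
    // threads and returns the sum once all of them have finished.
    static long countWith(int numThreads, int incrementsPerThread) {
        LongAdder adder = new LongAdder();
        Thread[] threads = new Thread[numThreads];
        for (int i = 0; i < numThreads; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    adder.increment();
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // sum() is exact here because no thread is updating the adder anymore.
        return adder.sum();
    }

    public static void main(String[] args) {
        System.out.println(countWith(4, 100_000)); // prints 400000
    }
}
```

Note that sum() is not an atomic snapshot: if it is called while other threads are still incrementing, updates that happen during the call may or may not be included in the result.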
The first test is similar to the one in the previous post about Distributed Cache Line Counters (DCLC): the first thread is just a reader (it calls get() in a loop, or sum() in the case of LongAdder), and all the other threads are writers (they call increment() in a loop). The results on our 32-core Opteron machine can be seen below:
Every once in a while there is a spike in performance, which is due to "collisions" in the DCLC's array of counters, but overall the DCLC can deliver somewhat better performance than the LongAdder, at the cost of using more memory.
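To make the "collisions" and the memory cost more concrete, here is a rough sketch of the general idea behind a counter spread over cache lines. This is not the actual DCLC code: the class name PaddedCounterSketch, the hash function, and the assumption of 64-byte cache lines are all mine. Each thread hashes its id to one slot, and the slots are spaced 8 longs apart so that two slots never share a cache line.

```java
import java.util.concurrent.atomic.AtomicLongArray;

final class PaddedCounterSketch {
    // Assumption: 64-byte cache lines, so 8 longs per line.
    private static final int LONGS_PER_LINE = 8;
    private final int numSlots;
    private final AtomicLongArray slots;

    PaddedCounterSketch(int numSlots) {
        this.numSlots = numSlots;
        // Only one long per line is used; the other 7 are padding.
        // This is where the extra memory goes.
        this.slots = new AtomicLongArray(numSlots * LONGS_PER_LINE);
    }

    // Maps the current thread to a slot. Two threads that land on the same
    // slot "collide" and contend on the same cache line.
    private int slotIndex() {
        long id = Thread.currentThread().getId();
        int h = (int) (id ^ (id >>> 32)) * 0x9E3779B9; // illustrative hash only
        return Math.floorMod(h, numSlots) * LONGS_PER_LINE;
    }

    void increment() {
        slots.getAndIncrement(slotIndex());
    }

    // Sums every slot. Like LongAdder.sum(), this is not an atomic snapshot.
    long sum() {
        long total = 0;
        for (int i = 0; i < numSlots; i++) {
            total += slots.get(i * LONGS_PER_LINE);
        }
        return total;
    }

    public static void main(String[] args) {
        PaddedCounterSketch counter = new PaddedCounterSketch(64);
        for (int i = 0; i < 1000; i++) {
            counter.increment();
        }
        System.out.println(counter.sum()); // prints 1000
    }
}
```

With a fixed number of slots, whether two writer threads collide depends on how their ids hash, which is one plausible way to get occasional outliers in a plot. A larger array makes collisions less likely but costs more memory and makes sum() slower, since it has to visit every slot.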
If we look at scenarios with a lot more threads than cores, then the LongAdder can outperform the DCLC:
The plot above shows the same test with 32, 64, 128, 256 and 512 threads.
We then ran a different kind of test, where half of the threads are dedicated to calling get()/sum() and the other half are dedicated to calling increment().
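The shape of this kind of test can be sketched as follows. This is a simplified harness written for this post, not the code that produced the plots: the class name ReadWriteSplitBench and its run() method are hypothetical, and there is no warm-up or statistical treatment, so treat any numbers it prints as rough indications only.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

final class ReadWriteSplitBench {
    // Runs numReaders threads calling read and numWriters threads calling
    // write for roughly durationMillis. Returns {total reads, total writes}.
    static long[] run(LongSupplier read, Runnable write,
                      int numReaders, int numWriters, long durationMillis) {
        AtomicBoolean stop = new AtomicBoolean(false);
        AtomicLong reads = new AtomicLong();
        AtomicLong writes = new AtomicLong();
        AtomicLong sink = new AtomicLong(); // consumes read results so they are not optimized away
        Thread[] threads = new Thread[numReaders + numWriters];
        for (int i = 0; i < threads.length; i++) {
            boolean isReader = i < numReaders;
            threads[i] = new Thread(() -> {
                long ops = 0;
                long acc = 0;
                while (!stop.get()) {
                    if (isReader) {
                        acc += read.getAsLong();
                    } else {
                        write.run();
                    }
                    ops++;
                }
                // Publish the per-thread totals only once, at the end.
                (isReader ? reads : writes).addAndGet(ops);
                sink.addAndGet(acc);
            });
        }
        for (Thread t : threads) {
            t.start();
        }
        try {
            Thread.sleep(durationMillis);
            stop.set(true);
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            stop.set(true);
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { reads.get(), writes.get() };
    }

    public static void main(String[] args) {
        int half = Math.max(1, Runtime.getRuntime().availableProcessors() / 2);

        LongAdder adder = new LongAdder();
        long[] a = run(adder::sum, adder::increment, half, half, 1000);
        System.out.println("LongAdder  reads=" + a[0] + " writes=" + a[1]);

        AtomicLong atomic = new AtomicLong();
        long[] b = run(atomic::get, atomic::incrementAndGet, half, half, 1000);
        System.out.println("AtomicLong reads=" + b[0] + " writes=" + b[1]);
    }
}
```

Because the counter is passed in as a pair of method references, the same harness can drive any counter that exposes a read and an increment operation. Calling it with 1 reader and N-1 writers gives the shape of the first test.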
We separated the results into two plots to make them easier to read, and removed the AtomicLong results from the get() plot because AtomicLong's get() is so much faster (it scales linearly) that it would make it impossible to compare the LongAdder with the DistributedCacheLineCounter:
On the increment() plot, the difference between the DCLC and the LongAdder becomes more pronounced. I don't really know why, because the LongAdder code is tricky to understand, but the gap can be significant, so if you need something like this for your app, try them both and use the one that gives the best performance for your particular case.