Comments on Concurrency Freaks: Concurrency Pattern: Distributed Cache-Line Counter

2014-01-10T11:07:46.842+01:00

Hi Neo.X,
Thanks for pointing this out! I had fixed it a long time ago in the code, but completely forgot to update the post:
https://sourceforge.net/projects/ccfreaks/files/java/src/com/concurrencyfreaks/counter/DistributedCacheLineCounter.java

If it weren't for you paying attention, we could have had this bug here for a long time. Thanks for finding it!

2014-01-09T23:59:43.189+01:00

Nice article, but I think there is a bug in get():
    for (int idx = 0; idx < kNumCounters; idx += COUNTER_CACHE_LINE) {
        sum += counters.get(idx);
    }

should be
    for (int idx = 0; idx < kNumCounters; idx++) {
        sum += counters.get(idx * COUNTER_CACHE_LINE);
    }

The same applies to clear().
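To see why the original loop undercounts: with, say, kNumCounters = 32 and 8 longs per cache line, it only reads slots 0, 8, 16 and 24, which is 4 of the 32 counters. Here is a minimal sketch of the counter with get() and clear() walking every logical counter, assuming 64-byte cache lines (COUNTER_CACHE_LINE = 8 longs) so that counter idx lives at array slot idx * COUNTER_CACHE_LINE. The class name DclcSketch and the slot choice in increment() (thread id modulo kNumCounters, standing in for tid2hash()) are hypothetical, not the code from the linked file.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch of a distributed cache-line counter.
// Assumes 64-byte cache lines: each logical counter gets its own line.
class DclcSketch {
    static final int COUNTER_CACHE_LINE = 64 / 8; // longs per 64-byte line (assumption)
    private final int kNumCounters;
    private final AtomicLongArray counters;

    DclcSketch(int numCounters) {
        kNumCounters = numCounters;
        // One cache line's worth of slots per logical counter.
        counters = new AtomicLongArray(kNumCounters * COUNTER_CACHE_LINE);
    }

    void increment() {
        // Placeholder slot choice (hypothetical): thread id modulo kNumCounters.
        int slot = (int) (Thread.currentThread().getId() % kNumCounters);
        counters.getAndIncrement(slot * COUNTER_CACHE_LINE);
    }

    long get() {
        long sum = 0;
        // idx counts logical counters, not array slots.
        for (int idx = 0; idx < kNumCounters; idx++) {
            sum += counters.get(idx * COUNTER_CACHE_LINE);
        }
        return sum;
    }

    void clear() {
        // Same indexing as get(): reset the first slot of every line.
        for (int idx = 0; idx < kNumCounters; idx++) {
            counters.set(idx * COUNTER_CACHE_LINE, 0);
        }
    }
}
```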

2013-09-16T16:22:25.460+02:00

You were right, it was due to false sharing. After I added another implementation of a padded counter, performance was in the expected range.
Thanks.

2013-09-13T20:15:53.286+02:00

Correction: (...) every 16 entries share the same cache line (...)

2013-09-13T19:24:34.246+02:00

Hi Ashkrit,
I looked at the source code you have on GitHub, and it seems to me that the reason they all have the same performance is that all of those implementations suffer from "false sharing". Adjacent entries in the atomicArray variable of your CoreBaseCounter usually share a cache line: every 8 entries share the same cache line.
In addition, some of the big differences in performance only become noticeable when you go up to a large number of cores.
Even so, after you fix the false-sharing issue, try running with all threads doing only increment() to see how many more ops you get. On my Core i7 I see 65k ops/ms for an AtomicLong and 113k ops/ms for the DCLC when running with 4 threads.
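One way to fix the false sharing in an int array is to space the per-core slots one cache line apart. Assuming 64-byte lines, 64 / 4 = 16 int entries fit in a line (for long entries it would be 8), so each slot goes 16 entries after the previous one. This is a sketch only: CoreBaseCounter's real layout isn't shown here, so PaddedIntCounter, its thread-id slot choice, and the runIncrementOnly() harness are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical padded per-core counter. Assumes 64-byte cache lines:
// slots are INTS_PER_LINE entries apart, so no two slots share a line.
class PaddedIntCounter {
    static final int INTS_PER_LINE = 64 / Integer.BYTES; // 16 (assumed line size)
    private final int slots;
    private final AtomicIntegerArray cells;

    PaddedIntCounter(int slots) {
        this.slots = slots;
        this.cells = new AtomicIntegerArray(slots * INTS_PER_LINE);
    }

    void increment() {
        // Hypothetical slot choice: thread id modulo the number of slots.
        int slot = (int) (Thread.currentThread().getId() % slots);
        cells.incrementAndGet(slot * INTS_PER_LINE);
    }

    long sum() {
        long total = 0;
        for (int s = 0; s < slots; s++) {
            total += cells.get(s * INTS_PER_LINE);
        }
        return total;
    }

    // Increment-only workload: each thread calls nothing but increment().
    // Returns elapsed wall-clock nanoseconds, or -1 if interrupted.
    static long runIncrementOnly(PaddedIntCounter c, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) c.increment();
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1; // timing is meaningless after an interrupt
            }
        }
        return System.nanoTime() - start;
    }
}
```

Ops/ms is then threads * perThread divided by (elapsed nanoseconds / 1,000,000). A hand-rolled loop like this has no warm-up or dead-code protection, so JMH is the safer choice for numbers you want to compare.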

Btw, I've heard Doug Lea mention a couple of times that even adding "padding" to an array may not be enough to avoid false sharing. The only sure way is to use @sun.misc.Contended.
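A note on the annotation: @sun.misc.Contended is the JDK 8 name (it moved to jdk.internal.vm.annotation.Contended in JDK 9), and outside the JDK's own classes it only takes effect when the JVM runs with -XX:-RestrictContended. For comparison, here is a sketch of the older manual-padding trick, which depends on HotSpot laying out superclass fields before subclass fields. Nothing in the language guarantees that layout, which is exactly why padding alone may not be enough. The class names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLongFieldUpdater;

// Manual padding via the class hierarchy (a HotSpot layout habit, not a
// guarantee): p1..p7 tend to land before `value`, q1..q7 after it.
class PadLeft {
    long p1, p2, p3, p4, p5, p6, p7;
}

class PadValue extends PadLeft {
    volatile long value;
}

class PadRight extends PadValue {
    long q1, q2, q3, q4, q5, q6, q7;
}

final class PaddedCounter extends PadRight {
    // Atomic updates on the inherited volatile field, no extra AtomicLong object.
    private static final AtomicLongFieldUpdater<PadValue> VALUE =
            AtomicLongFieldUpdater.newUpdater(PadValue.class, "value");

    long increment() {
        return VALUE.incrementAndGet(this);
    }

    long get() {
        return value;
    }
}
```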

Cheers

2013-09-13T18:54:37.695+02:00

Thanks for sharing your results. I also did a quick prototype of scalable counters inspired by your blog.

http://ashkrit.blogspot.sg/2013/09/scalable-counters-for-multi-core.html

Do you have access to a Xeon-based desktop?
The results of my test on Xeon are very surprising: the performance of all the counters is almost the same, even though there are CAS failures for AtomicLong.
On Xeon, CAS failures don't make any difference to the overall timing.
Later I will try some of the back-off strategies for counters mentioned by Dave Dice in his blog and test them on Xeon.
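For anyone wanting to try this, a CAS counter with back-off can be sketched as below. It uses generic randomized exponential back-off, not any specific scheme from Dave Dice's blog; the spin bounds are arbitrary assumptions, and BackoffCounter is a hypothetical name. It also counts CAS failures, which makes it easy to check whether contention is actually happening on a given machine. Thread.onSpinWait() needs Java 9 or later.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Generic CAS counter with randomized exponential back-off (a sketch).
// MIN_SPINS / MAX_SPINS are arbitrary assumptions, not tuned values.
class BackoffCounter {
    private static final int MIN_SPINS = 2;
    private static final int MAX_SPINS = 1 << 10;

    private final AtomicLong value = new AtomicLong();
    private final LongAdder casFailures = new LongAdder(); // contention statistic

    long increment() {
        int limit = MIN_SPINS;
        while (true) {
            long cur = value.get();
            if (value.compareAndSet(cur, cur + 1)) {
                return cur + 1;
            }
            casFailures.increment();
            // Spin a random count in [0, limit), then double the window up to the cap.
            int spins = ThreadLocalRandom.current().nextInt(limit);
            for (int i = 0; i < spins; i++) {
                Thread.onSpinWait(); // Java 9+ spin-loop hint
            }
            limit = Math.min(limit * 2, MAX_SPINS);
        }
    }

    long get() {
        return value.get();
    }

    long failures() {
        return casFailures.sum();
    }
}
```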

2013-09-11T10:55:43.050+02:00

Here is the link
http://concurrencyfreaks.blogspot.co.uk/2013/09/longadder-and-dclc.html

2013-09-07T22:51:35.864+02:00

Hi Ashkrit,
The tid2hash() function is just a "randomizer" based on George Marsaglia's xorshift algorithm: http://www.javamex.com/tutorials/random_numbers/xorshift.shtml
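As an illustration of the idea, here is a hypothetical tid2hash() that runs a thread id through one xorshift step (using the 21, 35, 4 shift triple from the javamex page) and maps the result to a counter slot. The exact constants and the slot mapping in the original DCLC may differ; this is only a sketch. Note that an id of 0 maps to 0, since xorshift leaves zero unchanged.

```java
// Hypothetical tid2hash: one xorshift step on the thread id, then a slot.
final class Tid2Hash {
    static long xorshift(long x) {
        x ^= (x << 21);
        x ^= (x >>> 35);
        x ^= (x << 4);
        return x;
    }

    static int tid2hash(long tid, int numCounters) {
        long h = xorshift(tid);
        // Clear the sign bit so the remainder is in [0, numCounters).
        return (int) ((h & Long.MAX_VALUE) % numCounters);
    }
}
```

The point of scrambling is that consecutive thread ids (1, 2, 3, ...) end up spread across the counters in an irregular pattern rather than marching through them in order.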

No, I haven't compared with LongAdder... in fact, I didn't even know about it until I saw your comment ;)
I'll re-run the microbenchmark with LongAdder and post the results soon.
Thanks

2013-09-07T19:12:09.531+02:00

Very interesting blog. Can you please explain a bit about your magical tid2hash function?
Did you benchmark it against LongAdder, which uses Striped64?
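For reference, LongAdder lives in java.util.concurrent.atomic (Java 8+) and extends the package-private Striped64, which keeps a base value plus a table of cells that is only created, and grown, once updates start to contend. A minimal usage sketch (LongAdderDemo and countWith() are illustrative names):

```java
import java.util.concurrent.atomic.LongAdder;

// Minimal LongAdder usage: many threads increment, one sum() at the end.
class LongAdderDemo {
    static long countWith(int threads, int perThread) {
        LongAdder adder = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) adder.increment();
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        // sum() is not an atomic snapshot while updates are in flight;
        // here every worker has been joined first.
        return adder.sum();
    }
}
```

Like get() on the DCLC, sum() walks every cell, so reads get more expensive as the cell table grows while increments stay cheap.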