Wednesday, January 21, 2015

Resettable Linearizable Monotonic Statistical Counter

We've talked before about monotonic statistical counters, with one example being the DCLC. Statistical Counters are used in a lot of places. and they don't have usually a linearizable reset/clear method, but on systems with a GC (like Java) it's actually simple to add such a functionality.

Why is a reset/clear method with Linearizability a desirable thing?
Well, suppose you're using a statistical counter on your app, let's say some web app, and you count the number of incoming requests and outgoing replies using one statistical counter for each. You would expect that these two numbers always match very closely, unless there is a bug.  Example:
Requests: 300037  Replies: 300034

If you display these values to the user, and if you do a reset/clear on these counters, it could happen that while you're calling sum() there is another thread calling reset() and the operation is caught half-way, thus having some of the counters already at zero and others not yet, which means that one of the counters will show the full counters (not yet resetted) or already zero, and the other will be half-way or something from the old values. Example:
Requests:      0  Replies: 120034

If the user sees this, she will find it "weird" and may even submit a bug report about this issue.
Now, this isn't incorrect per se, it's just that the consistency model in the reset() method is weak, and therefore, you won't see the effects of it in sum() in a single snapshot.
There is an easy way around this (on systems with GC), and that is to use the Copy-On-Write pattern, i.e. create a new array of counters, with all entries at zero, and do a replacement of the reference to the current array of counters with a volatile store. The code is this, where counters is a volatile reference:

public void reset() {
    counters = new AtomicLongArray(kNumCounters*CACHE_LINE);

The older array will be cleaned up by the GC when there is no longer any thread referencing it.
That's it, super simple to implement... maybe not so easy to understand why it is linearizable when done in this way, but this is such a limited use-case that I'm not even going to bother to write down the linearization points.

As always, code is available on github:

No comments:

Post a Comment