Wednesday, August 7, 2013

Interrupt handler in C11 with Atomics

Writing an interrupt handler in C is something most people in the "embedded device world" have done at least once, and most of them know pretty well the pitfalls and issues that may show up with this kind of problem.
I've recently seen a coding challenge about writing a simple interrupt handler that has to share state with the main task, and found it interesting because it has some degree of concurrency to it. Moreover, the same functionality can be implemented with C11 (or C++11) atomics, gaining some performance and simplicity. Let's start with the standard solution in C and then show how to do it in C11 (or C++11) using atomics.

Standard solution in C:

We start with a very simple problem where the function called from the interrupt handler funcIntHandler() only has to increment a global counter global_counter that is then read by a function in the main context funcMainCtx().
Let's say we implement this with the following code:
long global_counter = 0;

void funcIntHandler(void) {
    doSomeFastLogic();
    global_counter++;
}

void funcMainCtx(void) {
    long local_counter;  
    local_counter = global_counter;
    printf("local_counter is %ld\n", local_counter);
    if (local_counter % 2 == 0) doCounterEven();
    else doCounterOdd();  
}


There are some issues with this approach, namely:
  1. The compiler may decide to do an optimization where local_counter is replaced by global_counter. This would mean that the printf() could show a different value than the one used in the following if.
  2. The compiler may decide to cache the value of global_counter and read a very old value instead of the latest one written by funcIntHandler() which could invalidate the logic of the program.
  3. The asm code generated for the expression local_counter = global_counter may be several instructions long, which breaks atomicity. This would be the case if we were running on a 32 bit architecture with a 64 bit global_counter: the value has to be read from memory in two steps, and funcIntHandler() could fire between them, causing funcMainCtx() to read a torn (invalid) value into local_counter and break the logic of the program (see the sketch below).
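To make issue 3 concrete, here is a hypothetical sketch of what the two-step read amounts to on a 32 bit target; the union and the readCounterTorn() helper are made up purely for illustration and assume that unsigned int is 32 bits wide:

/* Sketch of issue 3: the 64 bit read is split into two 32 bit loads,
   and the interrupt can fire between them. */
typedef union {
    long long value;          /* the 64 bit counter       */
    unsigned int half[2];     /* its two 32 bit halves    */
} split64_t;

long long readCounterTorn(volatile split64_t *counter) {
    split64_t tmp;
    tmp.half[0] = counter->half[0];  /* low half read first...            */
                                     /* <-- funcIntHandler() may run here */
    tmp.half[1] = counter->half[1];  /* ...high half read afterwards      */
    return tmp.value;                /* may mix old and new halves        */
}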
Issues 1 and 2 can be solved by adding the volatile qualifier to the variable global_counter. This is not to be confused with the Java keyword "volatile", which has a very different purpose.
In C, the volatile qualifier tells the compiler that this variable must always be read from memory and cannot be cached or optimized away in any way.
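As an illustration of issue 2, here is a hypothetical busy-wait loop (waitForNextTick() is made up for this example) that can misbehave when global_counter is not declared volatile:

extern long global_counter;           /* plain, non-volatile declaration */

void waitForNextTick(long last) {
    /* without volatile the compiler may hoist the load out of the loop
       and spin on a stale register copy, never seeing the increments
       done by funcIntHandler() */
    while (global_counter == last) {
        /* busy-wait */
    }
}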
Volatile doesn't solve issue 3 though, so we need something different: the usual technique to ensure atomicity when reading global_counter is to do the read within a block of
disableInterruptHandler() / enableInterruptHandler().

Here is what the code looks like in a proper implementation:


volatile long global_counter = 0;

void funcIntHandler(void) {
    doSomeFastLogic();
    global_counter++;
}

void funcMainCtx(void) {
    long local_counter;
    disableInterruptHandler();
    local_counter = global_counter;
    enableInterruptHandler();
    printf("local_counter is %ld\n", local_counter);
    if (local_counter % 2 == 0) doCounterEven();
    else doCounterOdd();
}

By the way, if you want to learn more about volatile in C, here is a decent blog that you can check out:
http://blog.regehr.org/archives/28

Notice that in the code above, to prevent re-ordering of instructions, there is an implicit acquire-barrier in disableInterruptHandler() and an implicit release-barrier in enableInterruptHandler()... or at least that is what I thought, until Andreia pointed out that both disableInterruptHandler() and enableInterruptHandler() must have a full-memory barrier.

And why is that, you may ask? Well, suppose you have a variable x that you need to read with interrupts disabled, and two other variables y and z that don't need to be:

y = 3;
disableInterruptHandler();
x = 4;
enableInterruptHandler();
z = 5;

if there was an acquire-barrier on disableInterruptHandler() and a release-barrier on enableInterruptHandler(), then the following would be a valid transformation of the previous code which the compiler might decide to do (because there is no interdependency between the three variables x, y, z):

disableInterruptHandler();
y = 3;
x = 4;
z = 5;
enableInterruptHandler();

this may seem innocent enough, but what if instead of a simple assignment you have some complex and lengthy calculation? The same transformation would turn the code into something like:
disableInterruptHandler();
y = lengthyCalculationWithResult3();
x = 4;
z = lengthyCalculationWithResult5();
enableInterruptHandler();

this would mean that the optimized code generated by the compiler keeps interrupts disabled for a very long time, even though the code you actually wrote looks just fine:

y = lengthyCalculationWithResult3();
disableInterruptHandler();
x = 4;
enableInterruptHandler();
z = lengthyCalculationWithResult5();

yep, it's bad, and the only way to ensure a proper implementation is to have full-memory barriers in both disableInterruptHandler() and enableInterruptHandler().
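For reference, here is a minimal sketch of what such a pair of functions could look like, assuming a single-core target where a compiler-level fence is enough (on a multi-core target you would want atomic_thread_fence() instead); the platform_mask_interrupts() and platform_unmask_interrupts() calls are made-up placeholders for whatever your platform actually provides:

#include <stdatomic.h>

/* Placeholders for the platform-specific mask/unmask primitives. */
void platform_mask_interrupts(void);
void platform_unmask_interrupts(void);

static inline void disableInterruptHandler(void) {
    platform_mask_interrupts();
    /* full compiler-level fence: stops the compiler from moving memory
       accesses across this point (no hardware fence is emitted) */
    atomic_signal_fence(memory_order_seq_cst);
}

static inline void enableInterruptHandler(void) {
    atomic_signal_fence(memory_order_seq_cst);
    platform_unmask_interrupts();
}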

The problem with the classic technique is that it needs to disable interrupts for a while, which can cause issues if the code inside the disable/enable block is too slow and runs for longer than the interval between two consecutive interrupts.

Something that Andreia pointed out in the code for the standard solution is that, if it is run on a memory model with sequential consistency (i.e., like C11), then the global_counter variable does not have to be volatile, because the full memory barrier in disableInterruptHandler() guarantees that all variables written before it (for example, by funcIntHandler()) will be visible in main memory, and that the variable will be read from memory the first time it is used after disableInterruptHandler().
It's as if the new memory model takes away the need to write complex code to prevent weird optimizations... cool stuff IMO.


C11 with Atomics:

The same kind of functionality can be accomplished with atomics in C11 without the need to disable/enable interrupts. Let's look at the code first, and then explain how it works:

#include <stdatomic.h>
#include <stdio.h>

atomic_long global_counter = ATOMIC_VAR_INIT(0);

void funcIntHandler(void) {
    doSomeFastLogic();
    atomic_fetch_add(&global_counter, 1);
}

void funcMainCtx(void) {
    long local_counter;
    local_counter = atomic_load(&global_counter);
    printf("local_counter is %ld\n", local_counter);
    if (local_counter % 2 == 0) doCounterEven();
    else doCounterOdd();
}

Notice that atomic_load() gives us an acquire-barrier and atomic_fetch_add() an acquire-release-barrier, which is still better than the standard solution and its two full barriers.

With the default sequential consistent memory model, the atomic_load() will read the latest value of global_counter written by the atomic_fetch_add(), with a guarantee of atomicity.
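C11 also provides _explicit variants of these operations if you want to spell out (or eventually relax) the memory ordering yourself. Here is a sketch of the same code with explicit orderings; I'm keeping seq_cst on the increment and using an acquire load in the main context, but whether anything weaker is safe depends on what doSomeFastLogic() and the rest of the program need, so take this as an illustration of the API rather than a drop-in replacement:

#include <stdatomic.h>
#include <stdio.h>

/* Same helper functions as in the examples above. */
void doSomeFastLogic(void);
void doCounterEven(void);
void doCounterOdd(void);

atomic_long global_counter = ATOMIC_VAR_INIT(0);

void funcIntHandler(void) {
    doSomeFastLogic();
    /* seq_cst is the default; here the ordering is just spelled out */
    atomic_fetch_add_explicit(&global_counter, 1, memory_order_seq_cst);
}

void funcMainCtx(void) {
    long local_counter;
    /* an acquire load pairs with the increment above and still
       guarantees an atomic (non-torn) read of the counter */
    local_counter = atomic_load_explicit(&global_counter, memory_order_acquire);
    printf("local_counter is %ld\n", local_counter);
    if (local_counter % 2 == 0) doCounterEven();
    else doCounterOdd();
}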


Why do we need the atomics approach when we already have the classic solution with disable/enable interrupts + volatile?

Well, the answer to that is a bit on the philosophical side. It is all about real-time systems and the kind of guarantees they give.
When you disable/mask interrupts in a real-time application, it means you're disabling the call to the interrupt handler for a while, or at least what you hope is just a while. Developers who have worked with interrupt handlers know that it is easy to introduce bugs with this approach: all it takes is for the block where interrupts are disabled to call some function, that calls some function, that calls some function that does I/O or a lengthy computation, and that is enough to delay the interrupt handler long enough for bad things to happen.
That is why the atomics approach looks way better to me: no matter how long any computation takes to complete in the main context, the interrupt handler will always run when the interrupt fires, and the state of the shared variables will still be consistent and correct.
Sure, it still has the problem that if the interrupt handler takes longer to complete than it takes for the next interrupt to be triggered then bad things will happen, but so does the classic solution, and this is usually something so bad that it is relatively easy to debug.
In the end, the approach with atomics provides a way of writing interrupt handlers with strong time guarantees and better stability, at the cost of making you think in a way that may be non-intuitive at first.


How about if we have to share state for more than one variable? Can we still use atomics?
Yes, we can!
We can use a control variable... but that is a subject for a future post.






