To continue last week's topic, today we're going to talk about leaks in persistent memory (PM).
Suppose you are using a persistent allocator, i.e. one whose metadata is kept in persistent memory AND that is resilient to failures.
Now suppose your durable data structure (a queue for simplicity) has the following code:
1: newNode = (Node*)malloc(sizeof(Node));
2: newNode->item = Z;
3: tail->next = newNode;
What happens if there is a crash between lines 1 and 2?
The node has been allocated but not yet put on the queue. This means that it is permanently lost, i.e. leaked.
In volatile memory this isn't much of a problem: if the application leaks memory, just restart it every once in a while and the leaks will disappear. In PM, however, that leaked node is now part of the durable (permanent) data and is there forever.
And when I say "forever" I mean until you throw away your data, or until you figure out a way to export your data into a file, reset the PM, and re-import the data, hopefully using an allocator that doesn't leak.
Generally speaking, in PM, leaks are there for all eternity (the lifetime of the data).
Having a crash is typically a rare event; however, a multithreaded application may have multiple ongoing operations, one on each thread.
Theoretically, each thread could leak one object during a crash, which in the worst-case scenario multiplies the number of leaks per crash by the number of threads in the application.
If we consider that each object is a node in a queue, then this isn't much to worry about: chances are that crashes are rare, and leaking a few hundred bytes or even a few KB per crash is nothing to lose sleep over.
What if the crashes aren't that rare, or what if the allocations are for very large objects?
Then the KB start to add up fast, or even MB, and this can have an impact on the total amount of memory.
If you had an application that leaked files to the disk every once in a while, would you find that acceptable?
The answer is likely to be "yes, if I can remove those files". This is exactly what happens when you go into Windows and delete the temporary files created by your browser.
The problem is that, when using an allocator in PM, there is no simple way to know whether an object is still in use (i.e. referenced by another object or by one of the root pointers), which means a leak is not easy to identify (just ask any C/C++ programmer).
Even tools like AddressSanitizer (ASAN) and Valgrind only identify leaks when the (volatile) program ends. Doing the same for PM is even harder.
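Just to give an idea of what identifying a leak would take, below is a minimal sketch (not any real PM tool; all the types and names are made up for illustration) of the kind of reachability scan such a tool would need: mark every block reachable from the root pointers and report everything else as leaked. The hard part in practice is that the scanner would have to know where the pointers live inside each block, something a plain malloc()/free() allocator does not track.

#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Each "block" stands in for one allocation in the PM heap; 'outgoing' lists
// the ids of the blocks it points to (a real tool would have to discover this).
struct PMBlock {
    std::vector<uint64_t> outgoing;
};

// Mark phase: everything reachable from 'roots' is in use; the rest is leaked.
std::vector<uint64_t> findLeaks(const std::unordered_map<uint64_t,PMBlock>& heap,
                                const std::vector<uint64_t>& roots) {
    std::unordered_set<uint64_t> marked;
    std::vector<uint64_t> stack(roots.begin(), roots.end());
    while (!stack.empty()) {
        uint64_t id = stack.back();
        stack.pop_back();
        if (!marked.insert(id).second) continue;   // already visited
        auto it = heap.find(id);
        if (it == heap.end()) continue;            // unknown id: ignore in this sketch
        for (uint64_t next : it->second.outgoing) stack.push_back(next);
    }
    std::vector<uint64_t> leaks;
    for (const auto& kv : heap)
        if (marked.count(kv.first) == 0) leaks.push_back(kv.first);
    return leaks;
}

int main() {
    // Block 3 was allocated but the crash happened before it was linked: leaked.
    std::unordered_map<uint64_t,PMBlock> heap { {1, {{2}}}, {2, {}}, {3, {}} };
    std::vector<uint64_t> roots { 1 };
    for (uint64_t id : findLeaks(heap, roots))
        std::printf("leaked block %llu\n", (unsigned long long)id);
}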
Using a standalone allocator leaves the door open for leaks in the event of a crash. Is there a way around it?
Yes, use a PTM (Persistent Transactional Memory) instead.
If you re-write the code such that the modifications to the queue are part of a PTM transaction, along with the modifications to the allocator metadata, then this kind of leak will no longer occur. The code becomes something like:
1: beginTx();
2: newNode = (Node*)malloc(sizeof(Node));
3: newNode->item = Z;
4: tail->next = newNode;
5: endTx();
Until endTx() commits, the transaction does not become visible and therefore, in the event of a crash, any of its modifications will be reverted (the details depend on whether the PTM uses an undo log or a redo log).
Even if there is a crash between lines 2 and 4, the PTM guarantees transactional semantics, reverting all modifications to the allocator metadata and therefore undoing the allocation done in line 2, safely and correctly.
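To sketch the mechanism, here is a minimal single-threaded illustration of the undo-log idea (this is not the actual Romulus or OneFile design, just the general principle): before each in-place store the old value is saved in a log, and if the crash happens before the commit mark is written, recovery replays the old values in reverse order, which undoes the allocator's bookkeeping as well.

#include <cstdint>
#include <vector>

// One logged store: the address that was modified and the value it had before.
struct UndoEntry { uint64_t* addr; uint64_t oldVal; };

struct UndoLog {
    std::vector<UndoEntry> entries;   // in a real PTM this log lives in PM and is flushed
    bool committed = false;

    void store(uint64_t* addr, uint64_t newVal) {
        entries.push_back({addr, *addr});   // save the old value first...
        *addr = newVal;                     // ...then modify in place
    }
    void commit() { committed = true; entries.clear(); }   // durable commit mark
    void recover() {                        // called after a restart
        if (committed) return;              // transaction completed: nothing to undo
        for (auto it = entries.rbegin(); it != entries.rend(); ++it)
            *it->addr = it->oldVal;         // roll back, newest store first
        entries.clear();
    }
};

int main() {
    uint64_t allocatorMeta = 0;   // stands in for a word of allocator metadata
    uint64_t tailNext      = 0;   // stands in for tail->next
    UndoLog tx;
    tx.store(&allocatorMeta, 42); // the "allocation": allocator bookkeeping changes
    // ... crash here, before tx.store(&tailNext, ...) and tx.commit() ...
    tx.recover();                 // allocatorMeta is back to 0, so nothing was leaked
    (void)tailNext;
}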
This is what PTMs like Romulus and OneFile do :)
Ultimately, leaks in application code are likely to be the worst issue for long-running data in PM, and for that there is no easy solution. There are application-side approaches to deal with this problem, but so far there is no really efficient generic solution.
Maybe one day someone will write a nice Garbage Collector for PM, but GCs aren't all that widespread for C/C++ volatile applications, and doing one for PM seems an even tougher challenge. And yes, the JVM has a GC, but the JVM and PM don't play along together all that well (yet).