Friday, September 20, 2019

Can I use a volatile allocator in Persistent Memory?

Spoiler alert, the answer is NO!

In this post I'm going to try to explain why "allocators" are such a vital part of using persistent memory. And when I talk about persistent memory (PM) I'm talking about NV-DIMMs like Intel Optane DC Persistent Memory.

When we run an application (in volatile memory) the allocator is just a passing thought. If the application does a lot of heap allocation/de-allocation, then having a fast allocator is important, but performance is about the only consideration. The truth is that the stock allocator that comes with windows/linux/iOS is fine for the majority of applications. For the exceptions, there are things like jemalloc.

When we run an application in persistent memory (PM) then it's a whole different issue!
Applications running in PM need to be failure-resilient. This means they have to recover correctly after a crash (think databases and filesystems). This requirement also applies to the allocator, and volatile allocators do not provide failure-resilience.

Suppose you have a durable queue in PM and you allocate a block for a new node. You allocate a node, place item A inside the node and them insert the node in the queue. Some time later you allocate a new node for item B and place the node inside the queue. Afterward, you allocate a new node, place item C inside the node and just before you complete the linking of the new node, the system crashes!

Upon restart, you call the recovery() method for the durable queue to recover any inconsistency in the queue, which in this case may decide to append node C or disregard it (dependig on specific of the queue algorithm), but nodes A and B will for sure be in the queue, assuming this queue has a consistentcy equivalent to "durable linearizability" (that's what I mean when I talk about a "durable queue").

Notice that the allocator metadata has been reset.

Ok, so now you have a consistent queue with items A and B.

What happens if you were using a volatile allocator?
Well, all the metadata of the allocator will disappear during the crash, which means that when you restart the application, the allocator will be "reset".
Everything will be fine, until the first allocation you do, at which point the allocator will decide to give you a block which had already given you before.
For example, if you then decide to continue the operation and insert item C, when you request a new node, the allocator will most likely give you the same block of memory that gave you the first time, where you stored item A. This is in effect a "corruption" because now the application is going to place item C into this node, overwriting the contents of the old node (already in the queue) and making A disappear.
Your queue is now "head -> node C -> node B -> tail"

Even though you're using a durable queue, you've just corrupted your data because you decided to use a volatile allocator.

There are however many research papers that take this approach and in my view this is non-sensical.
I understand that they want to show their algorithms' performance irrespective of the shortcomings of whatever allocator they're using, but the thing is, an algorithm for a durable data structure can not be correctly deployed without a persistent allocator.
When we use a PTM (Persistent Transactional Memory), the allocation is done inside the transaction along with all the modifications to any user data or data structure. Because all of this is done durably with all-or-nothing semantics (and ACID transaction) PTMs have no such corruption issues.

The gist of it is that, to have a "correct" durable data structure you need to use a "correct" durable allocator.
If you're using a PTM, then you get both of these together and when you measure the performance you're measruing the performance of both the data structure and the allocator.
If you're using a standalone durable data structure then you need to add a durable allocator and account for it as well, because one day, when the user deploys that data structure he/she will have to do it with a durable allocator which means that that durable allocator has to be taken into account when doing any kind of benchmark.
What's the purpose of having a super-duper-fast durable data structure that's not correct?!?After all, users don't typically want to use durable data structures that get corrupted after a restart  ;)

This kind of corrupting issue when using a volatile allocator is just the tip of the iceberg.

Next, comes leakage.
Let's talk about it next week.

PS: Both Romulus and OneFile deal with these kind of issues correctly. Any data structure you place within these PTMs will have correct recovery after a crash and this includes the allocations/de-allocations.