Saturday, May 18, 2019

OneFile and wait-free memory reclamation

Last week we talked about OneFile and how it is capable of doing wait-free allocation and de-allocation. This week we'll give a brief intro on how it is capable of doing wait-free memory reclamation.
This isn't the first wait-free memory reclamation scheme, I know of at least one more by Maged Michael based on LL/SC, but OneFile is the first one that is generic in the sense that it works for any data structure (although it requires DCAS). One thing's for sure, the optimistic technique we use in OneFile PTM is the first ever lock-free/wait-free memory reclamation technique shown for Persistent Memory (PM).

Let's start with the volatile memory first.
In a multi-threaded application, pointers to dynamically allocated objects (in the heap) are typically shared among multiple threads. In a lock-free data structure, because another thread may still de-reference a pointer to an object, we can not immediately delete/free the object when we no longer need it.
Instead, we need to use a memory reclamation technique like Pass The Buck, RCU, Hazard Pointers, or Hazard Eras.

In OneFile, user objects are allocated using tmNew() and de-allocated using tmDelete(). Internally, tmNew() calls 'new' and creates a new object that is tracked using Hazard Eras. When tmDelete() is called by the user, the object is placed in a retire list until it is safe to delete.
This technique is not new, Hazard Pointers and pretty much every pointer-based memory reclamation mechanism work like this. The innovation is that because OneFile is an STM, every access to a word of an object in the STM has to go through a load interposing method (we named pload()) and in this pload(), if we see that the sequence associated with the word we're trying to read has been modified and is more recent than the timestamp (curTx) of when we started the transaction, then it might be an inconsistent read, and we abort the transaction, never returning the (possibly incorrect) value at the memory address.
Explained another way, the pload() will never return a value that may be inconsistent (will throw/catch an exception and abort the current transaction) because that's part of how the STM works. A result of this, is that the memory reclamation scheme will never de-reference an incorrect pointer.

The really cool thing about this is that it was _not_ possible to make this option practical until the invention of Hazard Eras. Hazard Eras is the first atomics-only and lock-free memory reclamation scheme that uses epochs instead of pointers. Because of this, it means that a single era can protect all objects alive in a given time instant.
Doing something similar with a pointer-based scheme like Hazard Pointers would require having one published pointer per object traversed. First difficulty, we would have to know to which object a given address belongs to, a highly non-trivial problem in word-based STMs.
Second difficulty, and way more serious, each de-referenced pointer (node traversed in a data structure) needs to be protected first, which on Hazard Pointers means publishing that pointer and doing an mfence (on x86). It also means that each pointer has to have a unique publishing. Example: if we want to traverse a linked list with 1000 nodes, we need to publish 1000 distinct pointers, which other threads will have to scan before they de-allocate a single object. Clearly the memory usage cost of doing 1000 mfences would drown out any chance of making this a suitable approach, plus the fact that to delete an object we have to scan 1000 memory locations to check for a match.
_This_ is the reason why, until now, there has been no lock-free STM with lock-free memory reclamation!
Thanks to Hazard Eras, a single era protects all the objects that the transaction needs to touch, until the transaction commits or aborts.
In a certain sense, OneFile is in a show-case for the power of Hazard Eras.

But it doesn't stop there. Let's take a look at the algorithm for Persistent Memory (PM).
In the wait-free PM variant of OneFile we use the same Hazard Eras algorithm described above to track the std::function objects where the transactions are stored (the lock-free variant doesn't share transactions among threads), but the user allocated objects in PM are handled differently.
For the dynamic objects allocated by the user, we rely on an optimistic technique. Access to a de-allocated object in the mmap()ed region in PM does not trigger a segmentation fault (the object was within the mmaped region) and therefore doesn't have the constraints that using the system allocator has for the volatile heap.
However, it does have the issue that we can _never_ reset the sequence on a tmtype<T>. This sequence must be ever-increasing (takes about 100k years to wrap, so it should be fine), changing with every modification pertaining to a transaction.
The proof is in the paper but it makes the memory reclamation scheme become almost trivial, in the sense that the implementation of the memory reclamation scheme is non-existent, or so it may seem when you look at the code.
Appearances are deceiving and although the no-resetting-sequence-of-a-tmtype being a constraint carefully handled throughout the code, the reason _why_ this is enough is super hard.
As Andreia always says, we make things that look so easy, that people sometimes mistake them for being trivial. Believe me, it's not, and it was not trivial to get there. Andreia, Nachshon, Pascal and I had to work hard to get there.

Until today, there has been no lock-free (much less wait-free) memory reclamation scheme presented for PM, and certainly none that is as generic or as fast as the one we use in OneFile PTM. The fact that we don't have to keep a retired/zombie list in PM is a major breakthrough because persisting such a list has a high cost in number of clwb/clflushopt/sfence. Our optimistic reclamation scheme doesn't have any zombie list: the destructor is immediately called during the transaction (and this is safe even if the transaction later aborts because those changes will themselves be reverted) and the object is immediately de-allocated, all from within the transaction. This guarantees there are no leaks and no allocator metadata corruption in the event of a failure.

Hope you enjoy OneFile.

That's the end of this series for now. I'll come back when we have more cool stuff to talk about.

No comments:

Post a Comment