Wednesday, January 7, 2015

Ticket Lock AWN - Ends on Egress variant

Following on from the previous post, today we're going to cover the first of the three Ticket AWN variants. We call this variant "Ends on Egress" because lock() always spins (for a little while) on egress before acquiring the lock, even if it spun on lockIsMine first.
Here's what the lock() and unlock() methods look like in this variant:



void ticket_awnee_mutex_lock(ticket_awnee_mutex_t * self)
{
    const long long ticket = atomic_fetch_add(&self->ingress, 1);
    // Fast path: the lock was free and the ticket is ours
    if (atomic_load(&self->egress) == ticket) return;
    // If the thread right ahead of us holds the lock, spin directly on egress
    // (egress never decreases, so we can only leave this loop via the return)
    while (atomic_load_explicit(&self->egress, memory_order_relaxed) >= ticket-1) {
        if (atomic_load(&self->egress) == ticket) return;
        sched_yield();
    }
    // If there is no free slot in the array to wait on, spin until there is
    while (ticket-atomic_load(&self->egress) >= (self->maxArrayWaiters-1)) sched_yield();

    // There is a spot for us on the array, so place our thread-local node there
    awnee_node_t * wnode = &tlNode;
    // Reset lockIsMine from previous usages
    atomic_store_explicit(&wnode->lockIsMine, false, memory_order_relaxed);
    atomic_store(&self->waitersArray[(int)(ticket % self->maxArrayWaiters)], wnode);

    if (atomic_load(&self->egress) < ticket-1) {
        // Spin on lockIsMine
        while (!atomic_load(&wnode->lockIsMine)) sched_yield();
    }
    // Spin on egress
    while (atomic_load(&self->egress) != ticket) {
        sched_yield();
    }
}




void ticket_awnee_mutex_unlock(ticket_awnee_mutex_t * self)
{
    long long ticket = atomic_load_explicit(&self->egress, memory_order_relaxed);
    // Clear up our entry in the array before releasing the lock.
    atomic_store_explicit(&self->waitersArray[(int)(ticket % self->maxArrayWaiters)], NULL, memory_order_relaxed);
    // This load could be relaxed, but then the store of ticket+1 on egress
    // below could be re-ordered before it, and we don't want that
    awnee_node_t * wnode = atomic_load(&self->waitersArray[(int)((ticket+1) % self->maxArrayWaiters)]);
    if (wnode != NULL) {
        // We saw the node in waitersArray
        atomic_store_explicit(&wnode->lockIsMine, true, memory_order_relaxed);
    }
    atomic_store(&self->egress, ticket+1);
}
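The two functions above reference several fields and a thread-local node that are defined elsewhere in the repository. As a rough sketch of what they assume, the supporting definitions could look like the following (the field names come from the code above, but the array capacity, the initial ticket values, and the init function are my assumptions, not the repository's actual layout):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define AWNEE_MAX_ARRAY_WAITERS 64  // assumed capacity, not the repo's value

typedef struct {
    atomic_bool lockIsMine;
} awnee_node_t;

typedef struct {
    _Atomic long long ingress;   // incremented by each arriving thread
    _Atomic long long egress;    // ticket of the current lock holder
    long long maxArrayWaiters;   // capacity of waitersArray
    awnee_node_t * _Atomic waitersArray[AWNEE_MAX_ARRAY_WAITERS];
} ticket_awnee_mutex_t;

// One waiting node per thread, reused across lock() calls
static _Thread_local awnee_node_t tlNode;

// Assumed initializer: the first ticket handed out equals the initial
// egress, so the very first lock() takes the fast path
static void ticket_awnee_mutex_init(ticket_awnee_mutex_t * self)
{
    atomic_store(&self->ingress, 1);
    atomic_store(&self->egress, 1);
    self->maxArrayWaiters = AWNEE_MAX_ARRAY_WAITERS;
    for (int i = 0; i < AWNEE_MAX_ARRAY_WAITERS; i++)
        atomic_store(&self->waitersArray[i], NULL);
}
```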


In the "Ends on Egress" variant, after placing the node in the waitersArray in lock(), we look at egress again: if there is at least one other thread waiting ahead of the current one, we spin on lockIsMine, because that next thread is guaranteed to see the node we have just placed in the array when it runs unlock().
We could return from lock() as soon as lockIsMine becomes true, but then we could have (bad) races on egress and overwrite it. To solve that problem, we could do an atomic increment on egress at the end of unlock() instead of a store with release, but atomic increments are more expensive than stores with release, which would mean that in the single-threaded case this variant would lose out against a simple Ticket Lock. The design goal of the AWN algorithm is to be at least as good as a Ticket Lock, so we needed a way to advance egress with just a store.
The solution is simple: spin on egress one last time before returning from lock(). This way we are sure that the last instruction of the previous thread's unlock() (setting egress to its ticket+1) has completed by the time we enter the critical section, which means the current thread can safely call unlock() without the risk of multiple threads writing to egress simultaneously.
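To make the trade-off concrete, here is a minimal sketch of the two possible endings for unlock(), with the struct reduced to just egress for illustration (the struct and function names here are mine, not the repository's): the rejected RMW increment, which would tolerate concurrent writers, and the chosen release store, which is only safe because of that final spin on egress in lock().

```c
#include <stdatomic.h>

// Illustration only: the real mutex has more fields than this
typedef struct { _Atomic long long egress; } egress_only_t;

// Rejected alternative: an atomic increment is correct even if several
// threads could race on egress, but an RMW is costlier than a plain store
static void unlock_with_fetch_add(egress_only_t * self)
{
    atomic_fetch_add(&self->egress, 1);
}

// Chosen approach: a release store suffices because the final spin on
// egress in lock() guarantees that only the lock holder ever writes
// egress at any given moment
static void unlock_with_store(egress_only_t * self, long long ticket)
{
    atomic_store_explicit(&self->egress, ticket + 1, memory_order_release);
}
```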


Sample Scenario

To understand how the egress mechanism works, imagine a sample scenario where Thread 1 (T1) gets a ticket of 10:
  • egress is 10: T1 has the lock
  • egress is  9: T1 will spin on egress waiting for it to become 10
  • egress is  8: T1 will add its node to the waitersArray, then spin on lockIsMine until it is set to true, and afterwards spin on egress waiting for it to become 10

Relaxed Atomics

There are a few places in this algorithm where we use relaxed atomic loads or stores in the C11/C++1x implementation:
  • In lock(), the store of false on wnode->lockIsMine can be relaxed because it becomes visible with the release store of wnode into the array, and the node is only reachable by other threads from that instant in time.
  • In unlock(), the first load on egress can be relaxed because the current thread was the last to read it in lock() (seeing egress == ticket), and no other thread modifies egress while the lock is held, so it is guaranteed to be up to date.
  • In unlock(), the store of NULL on our own entry of the array can be relaxed because it becomes visible, at the latest, with the release store of ticket+1 on egress.
  • In unlock(), the store of true on lockIsMine can be relaxed because it is preceded by a load with acquire of the array entry, which prevents it from being reordered above that load, and it becomes visible when we do the store with release of ticket+1 on egress.

Something to notice about unlock() in the "Ends on Egress" variant: the store on lockIsMine becomes visible to other threads right before the store to egress. This means that the next waiting thread, currently spinning on lockIsMine, will break out of its loop and then check egress once (and only once) to see that it is its turn to acquire the lock. In other words, if the previous thread has seen the current thread's waiting node, the last loop in lock() executes for a single iteration, incurring at most the price of a cache miss on egress.
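This handoff is the classic publish-with-release pattern. Here is a stripped-down sketch of it (all names here are mine, and the two functions would run on different threads in a real handoff; called sequentially, the spin loops simply exit immediately):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <sched.h>

// Stand-ins for wnode->lockIsMine and self->egress
static atomic_bool lockIsMine = false;
static _Atomic long long egress = 0;
static const long long my_ticket = 1;
static long long waiter_result = 0;

// Mimics the tail of unlock(): a relaxed store followed by a release
// store; the release on egress also publishes the lockIsMine store
static void releaser(void)
{
    atomic_store_explicit(&lockIsMine, true, memory_order_relaxed);
    atomic_store_explicit(&egress, my_ticket, memory_order_release);
}

// Mimics the tail of lock(): break out on lockIsMine, then check egress,
// which is usually already at my_ticket (at most one cache miss away)
static void waiter(void)
{
    while (!atomic_load(&lockIsMine)) sched_yield();
    while (atomic_load_explicit(&egress, memory_order_acquire) != my_ticket)
        sched_yield();
    waiter_result = my_ticket;
}
```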

Here is the source code in Java and C11:
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/Java/com/concurrencyfreaks/locks/experimental/TicketAWNEndsEgressLock.java

https://github.com/pramalhe/ConcurrencyFreaks/blob/master/C11/locks/ticketawn/ticket_awnee_mutex.h
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/C11/locks/ticketawn/ticket_awnee_mutex.c
