Sunday, January 11, 2015

Ticket Lock AWN - Negative Egress variant

On this post we are going to cover the second of the three variants of Ticket AWN. We call this variant the Negative Egress because the thread doing the unlock() uses the sign of the egress to tell the next waiting thread whether it saw a waiting node or not.
Here's what the lock() and unlock() methods look like in this variant:

void ticket_awnne_mutex_lock(ticket_awnne_mutex_t * self)


    const long long ticket = atomic_fetch_add(&self->ingress, 1);
    if (atomic_load(&self->egress) == ticket) return;
    while (get_pos_egress_relaxed(self) >= ticket-1) {
        if (get_pos_egress(self) == ticket) return;
    // If there is no slot to wait, spin until there is
    while ((ticket-get_pos_egress(self)) >= (self->maxArrayWaiters-1)) sched_yield();

    // There is a spot for us on the array, so place our node there
    awnne_node_t * wnode = &tlNode;
    // Reset lockIsMine from previous usages
    atomic_store_explicit(&wnode->lockIsMine, false, memory_order_relaxed);
    atomic_store(&self->waitersArray[(int)(ticket % self->maxArrayWaiters)], wnode);

    // If there is only one left before egress becomes our ticket, wait for it
    while (true) {
        const long long localEgress = atomic_load(&self->egress);
        // Break to spin on lockIsMine
        if (get_pos_egress_relaxed(self) < ticket-1 || -localEgress == ticket) break;
        if (localEgress == ticket) return; // egress was positive and matches. Lock acquired
    // Spin on our own cache line waiting for the lock
    while (!atomic_load(&wnode->lockIsMine)) {

void ticket_awnne_mutex_unlock(ticket_awnne_mutex_t * self)
    long long ticket = get_pos_egress_relaxed(self);
    // Clear up our entry in the array before releasing the lock.
    atomic_store_explicit(&self->waitersArray[(int)(ticket % self->maxArrayWaiters)], NULL, memory_order_relaxed);
    // We could do this load as relaxed per se but then the store on egress
    // of -(ticket+1) could be re-ordered to be before, and we don't want that
    awnne_node_t * wnode = atomic_load(&self->waitersArray[(int)((ticket+1) % self->maxArrayWaiters)]);
    if (wnode != NULL) {
        // We saw the node in waitersArray, so tell the thread to spin
        // on lockIsMine by setting a negative egress
        atomic_store_explicit(&self->egress, -(ticket+1), memory_order_relaxed);
        atomic_store(&wnode->lockIsMine, true);
    } else {
        // No node was seen, so set a positive egress
        atomic_store(&self->egress, ticket+1);

In the "Negative Egress" variant we use one bit of the egress for the current thread to indicate to the next waiting thread that it has seen its waiting node, and as such, the next waiting thread needs only to spin on the wnode->lockIsMine (and not on egress). In our implementation we use the sign bit of the integer for this purpose, but other techniques can be used, like for example, the least significant bit (lsb) and then do the getAndAdd() in steps of 2.
When a waiting thread sees its ticket with a negative value it can not go into its critical section because the current thread may still be about to increment egress and then there could be multiple threads incrementing egress simultaneously, which could leave it in a bad state, unless we would use getAndIncrement()/atomic_fetch_add() at the end of unlock() but that is a more expensive operation than a store with release, so we want to make sure we do it with a store with release.
This variant is slightly "tricky" because on the unlock() we can do a store on egress and then on lockIsMine or a store just on egress, and both code paths must be "correct".

Sample Scenario

To understand how this mechanism works, imagine a sample scenario where Thread 1 (T1) gets a ticket of 10:
  • egress is 10: T1 has the lock
  • egress is  9: T1 will spin on egress waiting for it to become 10
  • egress is  8: T1 will add its node to the waitersArray and check egress again:
    • egress is still     8: T1 will spin on lockIsMine 
    • egress is now  -9: T1 will spin on egress (egress may be about to change to 10)
    • egress is now    9: T1 will spin on egress (egress may be about to change to 10)
    • egress is now -10: T1 will spin on lockIsMine (previous thread has seen T1's node)
    • egress is now   10: T1 has the lock

Relaxed Atomics

There are a few places in this algorithm where we use relaxed atomic loads or stores in the C11/C++1x implementation:
  • In lock(), the store on wnode->lockIsMine to false can be relaxed because it will become visible on the release-store of the wnode in the array and it will only be accessible from that instant in time.
  • In unlock(), the first load on egress can be relaxed because it was read last by the same thread in lock() so it is guaranteed to be up to date.
  • In unlock(), the store on the self entry of the array to NULL can be relaxed because it will become visible either with the store with release on the wnode->lockIsMine to true or on the egress to ticket+1.
  • In unlock(), the store in egress of -(ticket+1) can be relaxed because before there is a load with acquire of the array which prevents reordering above, and the store will become visible when we do the store with release of wnode->lockIsMine to true.

Here is the source code in Java and C11:

No comments:

Post a Comment