Sunday, January 18, 2015

Benchmarks for Ticket Lock AWN in C11

We ran our microbenchmark for the C11 implementations of the Ticket Lock AWN (all three variants), compared them against a basic Ticket Lock, and here are the results.

-----------------------------------------------------
AMD Opteron (2 NUMA nodes, total of 32 cores)

1 Thread
pthread_mutex_t                  5291403 ops/second
Ticket Lock                     17275652 ops/second
Ticket AWN - Negative Egress    15946079 ops/second
Ticket AWN - Ends on Egress     15876117 ops/second
Ticket AWN - Spins on Both      17062574 ops/second

2 Threads
pthread_mutex_t                  1575780 ops/second
Ticket Lock                      9078627 ops/second
Ticket AWN - Negative Egress     8065764 ops/second
Ticket AWN - Ends on Egress      7946022 ops/second
Ticket AWN - Spins on Both       8693872 ops/second

4 Threads
pthread_mutex_t                  1313133 ops/second
Ticket Lock                      7383215 ops/second
Ticket AWN - Negative Egress     6463225 ops/second
Ticket AWN - Ends on Egress      6712847 ops/second
Ticket AWN - Spins on Both       6676756 ops/second

8 Threads
pthread_mutex_t                  1377882 ops/second
Ticket Lock                      7335856 ops/second
Ticket AWN - Negative Egress     6542371 ops/second
Ticket AWN - Ends on Egress      7233964 ops/second
Ticket AWN - Spins on Both       7207002 ops/second

16 Threads
pthread_mutex_t                  1302257 ops/second
Ticket Lock                      4915970 ops/second
Ticket AWN - Negative Egress     5810103 ops/second
Ticket AWN - Ends on Egress      5934175 ops/second
Ticket AWN - Spins on Both       6204561 ops/second

32 Threads
pthread_mutex_t                  1299800 ops/second
Ticket Lock                      3925222 ops/second
Ticket AWN - Negative Egress     5185931 ops/second
Ticket AWN - Ends on Egress      5550168 ops/second
Ticket AWN - Spins on Both       5755218 ops/second


-----------------------------------------------------
Intel i7-3740QM (4 cores, 8 hyper threads)

1 Thread
pthread_mutex_t                 13931684 ops/second
Ticket Lock                     37615670 ops/second
Ticket AWN - Negative Egress    34421877 ops/second
Ticket AWN - Ends on Egress     34367820 ops/second
Ticket AWN - Spins on Both      38167859 ops/second

2 Threads
pthread_mutex_t                  4169864 ops/second
Ticket Lock                     20957181 ops/second
Ticket AWN - Negative Egress    19841579 ops/second
Ticket AWN - Ends on Egress     19541868 ops/second
Ticket AWN - Spins on Both      24479835 ops/second

4 Threads
pthread_mutex_t                  4215837 ops/second
Ticket Lock                     20065448 ops/second
Ticket AWN - Negative Egress    17473789 ops/second
Ticket AWN - Ends on Egress     17397894 ops/second
Ticket AWN - Spins on Both      22530033 ops/second


After a quick look at the numbers, we see that the Spins on Both variant is pretty much the winner of the three variants, and that when compared with the basic Ticket Lock, it follows closely and sometimes can vastly surpass it (for the scenario with 32 threads in the AMD machine).



When the Ticket AWN variants are faster than the basic ticket Lock, it seems to be for most of the same reasons that the CLH and MCS locks are better than a Ticket Lock: the waiting threads are each spinning on their own cache line, which reduces traffic in the cache-coherence system.
Moreover, in the Ticket AWN variants each node belongs to a thread and is used (and re-used) by that thread alone. This means that there is no node "passing" from thread to thread like there is on CLH, which means that the tlNode thread-local variable is a pointer whose cache-line is never touched, i.e. it is always pointing to the same object in memory (unlike CLH). This represents a (small) advantage of the Ticket AWN when compared to the CLH.

Something to keep in mind is that on our benchmark each thread tries to immediately acquire the lock after it releases it, which means that with N threads there are always N-1 or N-2 threads waiting. The Ticket AWN algorithms are more favorable in situations where there are at least 1 other thread already waiting, and this is is not very visible in the results, but we should keep in mind that a more realistic scenario where threads try to acquire the lock in "bursts" could yield somewhat different results.

For the moment, that's all we've got on Ticket Locks, but if you're interested in Ticket Lock variants, here is another variant by Dave Dice you can take a look at:
https://blogs.oracle.com/dave/entry/partitioned_ticket_lock1

No comments:

Post a Comment