Latency Benchmark

Two threads ping-pong one 4-byte integer counter via two queue objects. The counter is decremented and sent as a reply, until reaching 0. There are 2 queues and 2 threads, the counter is initialized with 1,000,000 (each of the two threads pushes/pops 500,000 messages).

Contention is minimal here: each queue has 1 producer 1 consumer, up to 1 element in the queue -- the best case ideal scenario for a queue to demonstrate its lowest possible latency. The dependency chains on popped counter exist to prevent CPUs from pipe-lining out-of-order execution in order to measure the true round-trip latency.

This benchmark measures the total time taken for the 2 threads to exchange the 1,000,000 messages. The charts report mean, stdev, min and max of sec/round-trip latency across 33 benchmark runs.

Latency (with SMT) on AMD Ryzen 5950X

Cross-core (no SMT) latency on AMD Ryzen 5950X

Latency (with SMT) on AMD Ryzen 5825U

Cross-core (no SMT) latency on AMD Ryzen 5825U

Legacy latency (with SMT) on Intel Xeon Gold 6132

Legacy latency (with SMT) on Intel i9-9900KS

Throughput Benchmark

N producer threads push a 4-byte integer into one same queue, N consumer threads pop the integers from the queue. With SMT threads, the benchmark is run for from 1 producer and 1 consumer up to (total-number-of-cpus / 2) producers/consumers to measure the scalability of different queues. Without using SMT threads (cross-core communication only) -- up to (total-number-of-cpus / 4) producers/consumers.

There are no dependency chains on messages in producer and consumer threads in this benchmark in order to let the queues demonstrate their highest possible throughputs. The reported throughputs are higher than the inverse of latencies because of no dependency chains stalling CPU pipe-lining and out-of-order execution (as intended).

This benchmark measures the total time taken to send and receive a total of 1,000,000 messages through one queue. The charts report mean, stdev, min and max of msg/sec throughput across 33 benchmark runs.

Scalability (with SMT) on AMD Ryzen 5950X

Cross-core (no SMT) scalability on AMD Ryzen 5950X

Scalability (with SMT) on AMD Ryzen 5825U

Cross-core (no SMT) scalability on AMD Ryzen 5825U

Legacy scalability (with SMT) on Intel Xeon Gold 6132

Legacy scalability (with SMT) on Intel i9-9900KS

Systems details

AMD Ryzen 5950X system

AMD Ryzen 5825U system

Intel Xeon Gold 6132 system

Intel i9-9900KS system

Homepage

github.com/max0x7ba/atomic_queue