N producer threads push 4-byte integers into a single shared queue, and N consumer threads pop the integers from it. Each producer posts 1,000,000 messages. The total time taken to send and receive all of these messages is measured. With SMT threads, the benchmark is run with 1 producer and 1 consumer up to (total-number-of-cpus / 2) producers and consumers, to measure the scalability of the different queues. Without SMT threads (cross-core communication only), it is run with up to (total-number-of-cpus / 4) producers and consumers. A benchmark run reports the best msg/sec throughput out of 10 tries for each queue. The charts report the mean, stdev, min and max of msg/sec throughput across 33 benchmark runs.
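The measurement described above can be sketched roughly as follows. This is a minimal illustration and not the repository's benchmark code. `LockedQueue` is a hypothetical mutex-based stand-in for the queue under test, and `run_once`, `best_throughput` and the stop value `0` are names and conventions assumed for this example only. For simplicity the timed interval also includes thread start-up.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for the queue under test: a mutex-guarded FIFO.
class LockedQueue {
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::queue<std::int32_t> items_;

public:
    void push(std::int32_t value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(value);
        }
        not_empty_.notify_one();
    }

    // Blocks until an element is available.
    std::int32_t pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        std::int32_t const value = items_.front();
        items_.pop();
        return value;
    }
};

struct RunResult {
    std::uint64_t received; // Messages popped by all consumers, stop values excluded.
    double seconds;         // Wall-clock time for the whole run.
};

// One try: n producers and n consumers share one queue.
// Assumption: 0 is reserved as a stop value, so messages are 1..per_producer.
RunResult run_once(unsigned n, std::int32_t per_producer) {
    assert(n > 0 && per_producer > 0);
    LockedQueue queue;
    std::vector<std::uint64_t> counts(n, 0);
    std::vector<std::thread> threads;
    threads.reserve(2 * n);

    auto const t0 = std::chrono::steady_clock::now();
    for (unsigned c = 0; c < n; ++c) {
        threads.emplace_back([&queue, &counts, c] {
            std::uint64_t received = 0;
            while (queue.pop() != 0) // Each consumer exits on one stop value.
                ++received;
            counts[c] = received;
        });
    }
    for (unsigned p = 0; p < n; ++p) {
        threads.emplace_back([&queue, per_producer] {
            for (std::int32_t i = 1; i <= per_producer; ++i)
                queue.push(i);
            queue.push(0); // One stop value per producer, n in total.
        });
    }
    for (auto& t : threads)
        t.join();
    auto const t1 = std::chrono::steady_clock::now();

    std::uint64_t received = 0;
    for (std::uint64_t c : counts)
        received += c;
    return {received, std::chrono::duration<double>(t1 - t0).count()};
}

// Best msg/sec throughput out of `tries` tries.
double best_throughput(unsigned n, std::int32_t per_producer, int tries) {
    double best = 0.0;
    for (int i = 0; i < tries; ++i) {
        RunResult const r = run_once(n, per_producer);
        best = std::max(best, r.received / r.seconds);
    }
    return best;
}
```

Under these assumptions, a call such as `best_throughput(4, 1000000, 10)` would correspond to one benchmark run for 4 producers and 4 consumers. Repeating it 33 times and computing the mean, stdev, min and max would give one point on the charts.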
One thread posts an integer to another thread through one queue and waits for a reply on a second queue, so there are 2 queues and 2 threads in total. Each thread posts 500,000 messages (pings or pongs) into its egress queue. Contention is minimal here (1 producer, 1 consumer, 1 element in the queue), so that the benchmark can reach the lowest latency a queue can provide. This benchmark measures the total time taken to post 500,000 messages and receive 500,000 replies. A benchmark run reports the best sec/round-trip latency (the time taken to push a message and pop its reply) out of 10 tries for each queue. The charts report the mean, stdev, min and max of sec/round-trip latency across 33 benchmark runs.
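The ping-pong measurement can be sketched in a similar way. Again, this is only an illustration under stated assumptions, not the repository's code. `Mailbox` is a hypothetical single-slot queue built on `std::atomic`, with `0` reserved to mean "empty". `ping_pong_once` and `best_latency` are names invented for this example. The spin loops call `std::this_thread::yield()` so that the sketch also terminates promptly on a single core; a real latency benchmark would likely spin without yielding.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>
#include <thread>

// Hypothetical single-slot queue for one producer and one consumer.
// Assumption: 0 means "empty", so only non-zero values may be pushed.
class Mailbox {
    std::atomic<std::int32_t> slot_{0};

public:
    // Spins until the slot is empty, then stores the value.
    void push(std::int32_t value) {
        std::int32_t expected = 0;
        while (!slot_.compare_exchange_weak(expected, value,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
            expected = 0; // compare_exchange_weak overwrote it on failure.
            std::this_thread::yield();
        }
    }

    // Spins until the slot holds a value, then takes it and leaves 0 behind.
    std::int32_t pop() {
        for (;;) {
            std::int32_t const value = slot_.exchange(0, std::memory_order_acquire);
            if (value != 0)
                return value;
            std::this_thread::yield();
        }
    }
};

struct PingPongResult {
    std::int32_t replies;      // Replies that matched the ping just sent.
    double sec_per_round_trip; // Total time divided by the number of round trips.
};

// One try: 2 threads and 2 queues, one message in flight at a time.
PingPongResult ping_pong_once(std::int32_t round_trips) {
    assert(round_trips > 0);
    Mailbox ping; // Egress queue of the calling thread.
    Mailbox pong; // Egress queue of the responder thread.

    std::thread responder([&ping, &pong, round_trips] {
        for (std::int32_t i = 0; i < round_trips; ++i)
            pong.push(ping.pop()); // Echo each ping back as a pong.
    });

    auto const t0 = std::chrono::steady_clock::now();
    std::int32_t replies = 0;
    for (std::int32_t i = 1; i <= round_trips; ++i) {
        ping.push(i);
        if (pong.pop() == i)
            ++replies;
    }
    auto const t1 = std::chrono::steady_clock::now();
    responder.join();

    double const seconds = std::chrono::duration<double>(t1 - t0).count();
    return {replies, seconds / round_trips};
}

// Best (lowest) sec/round-trip latency out of `tries` tries.
double best_latency(std::int32_t round_trips, int tries) {
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < tries; ++i)
        best = std::min(best, ping_pong_once(round_trips).sec_per_round_trip);
    return best;
}
```

With these assumed names, `best_latency(500000, 10)` would correspond to one benchmark run of the ping-pong test.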
github.com/max0x7ba/atomic_queue
Copyright (c) 2019 Maxim Egorushkin. MIT License. See the full licence in file LICENSE.