Why is having head and tail in the same cache line bad for performance? Given that the code reads both head and tail to check if the queue is empty, being in the same cache line seems optimal.
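In rough terms, every push or pop also writes head or tail, which dirties the cache line holding them. With the producer and the consumer on different cores, each of those writes invalidates the other core's copy of that line, so the line keeps bouncing between the two cores. That's the bit I misunderstood.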
To be clear, as mentioned in another thread [0], there is no need to ensure consistency between head and tail for an SPSC queue, so storing them in different cache lines would be much better.
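For illustration, here is a minimal sketch of what "different cache lines" could look like in C++. It is not the code from the article: the name SpscQueue, the 64-byte line size, and the power-of-two capacity are all assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Assumed cache-line size: 64 bytes is common on x86-64 but not universal.
constexpr std::size_t kCacheLine = 64;

// Hypothetical single-producer/single-consumer ring buffer.
template <typename T, std::size_t N>
class SpscQueue {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Producer thread only.
    bool push(const T& v) {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        const std::size_t h = head_.load(std::memory_order_acquire);
        assert(t - h <= N);            // occupancy never exceeds capacity
        if (t - h == N) return false;  // full
        buf_[t & (N - 1)] = v;
        tail_.store(t + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer thread only.
    std::optional<T> pop() {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        const std::size_t t = tail_.load(std::memory_order_acquire);
        if (h == t) return std::nullopt;  // empty
        T v = buf_[h & (N - 1)];
        head_.store(h + 1, std::memory_order_release);  // release the slot
        return v;
    }

private:
    // Each index gets its own line: the producer's writes to tail_ do not
    // invalidate the line holding head_, and vice versa.
    alignas(kCacheLine) std::atomic<std::size_t> head_{0};
    alignas(kCacheLine) std::atomic<std::size_t> tail_{0};
    alignas(kCacheLine) T buf_[N];
};
```

Each side still reads the other side's index, so the lines do move between cores, but a write to one index no longer evicts the other. Where the toolchain provides it, std::hardware_destructive_interference_size from <new> could replace the hard-coded 64.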
For the multiple-consumer variant, however, consistency between head and tail is required so packing them in the same atomic is the simplest solution.
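A rough sketch of the "same atomic" idea, assuming 32-bit indices packed into one 64-bit word, a single producer, and int payloads. SpmcQueue and its helpers are hypothetical names for this example, not the article's implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical single-producer/multi-consumer ring of ints.
// Layout assumption: tail in the high 32 bits, head in the low 32 bits.
template <std::size_t N>
class SpmcQueue {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

    static std::uint32_t head_of(std::uint64_t s) { return static_cast<std::uint32_t>(s); }
    static std::uint32_t tail_of(std::uint64_t s) { return static_cast<std::uint32_t>(s >> 32); }
    static std::uint64_t pack(std::uint32_t head, std::uint32_t tail) {
        return (static_cast<std::uint64_t>(tail) << 32) | head;
    }

public:
    // Single producer only.
    bool push(int v) {
        const std::uint64_t s = state_.load(std::memory_order_acquire);
        const std::uint32_t h = head_of(s), t = tail_of(s);
        if (static_cast<std::uint32_t>(t - h) == N) return false;  // full
        slots_[t & (N - 1)].store(v, std::memory_order_relaxed);
        // Bump tail only: a carry out of the high half is discarded,
        // so head in the low half is never disturbed.
        state_.fetch_add(std::uint64_t{1} << 32, std::memory_order_release);
        return true;
    }

    // Any number of consumers.
    std::optional<int> pop() {
        std::uint64_t s = state_.load(std::memory_order_acquire);
        for (;;) {
            const std::uint32_t h = head_of(s), t = tail_of(s);
            if (h == t) return std::nullopt;  // empty in this snapshot
            // Read first, then claim: the value is kept only if the CAS
            // confirms that head and tail are still the pair we read.
            const int v = slots_[h & (N - 1)].load(std::memory_order_relaxed);
            if (state_.compare_exchange_weak(s, pack(h + 1, t),
                                             std::memory_order_acq_rel,
                                             std::memory_order_acquire)) {
                return v;
            }
            // CAS failed: s now holds the current state, so retry.
        }
    }

private:
    // One word holds both indices, so every load sees a consistent pair.
    // Caveat: 32-bit indices wrap, which leaves a theoretical ABA window.
    std::atomic<std::uint64_t> state_{0};
    std::atomic<int> slots_[N]{};
};
```

Because head and tail change together in a single word, a consumer never acts on a head from one moment and a tail from another. The cost is the one discussed above: producer and consumers all write the same cache line.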