Why do so many people use the head==tail as full condition? You can use the enti...

dundarious · on May 15, 2022

Fabian Giesen's always excellent blog has a post about various ring buffer implementations: https://fgiesen.wordpress.com/2010/12/14/ring-buffers-and-qu...

He has a companion post about a "magic" ring buffer that uses a virtual memory mapping to make a ring buffer where you can always address the entire sizes as contiguous memory, but I think it has a bit of minor bit rot in the linked implementations. https://fgiesen.wordpress.com/2012/07/21/the-magic-ring-buff...

adrian_b · on May 15, 2022

This method is fine, but with it you must always take care to choose bufsize as a power of two, otherwise the overhead of having to do integer divisions for all queue operations may be noticeable.

arinlen · on May 15, 2022

> but with it you must always take care to choose bufsize as a power of two, otherwise the overhead of having to do integer divisions

Keep in mind that one of the reasons behind the arbitrary "power of two" requirement is that the method proposed by the author abuses a single atomic integer to hold two shorts representing the indices for the head and tail elements.

I don't think it makes much sense to talk about overhead given this method requires doing a bunch of bitmasks to operate over the high and low short.

adrian_b · on May 15, 2022

The overhead due to bit masks and rotations/shifts is negligible on almost all CPUs.

Larger CPUs can do many such operations during every clock cycle and even in very small CPUs they are usually single-cycle operations.

Depending on the CPU, integer divisions are 10 to 100 times slower, so in a completely different range of overhead.

orlp · on May 15, 2022

This breaks when your indices wrap around the integer limit. I'm not saying this can't be worked around but you end up with something much more complicated than just head==tail. It's a memory/runtime trade-off that I think is worth wasting one slot for.

mh7 · on May 15, 2022

No, nothing breaks with unsigned integers.

Modular arithmetic makes it work. Try a few examples close to UINT_MAX if you want to convince yourself.

stingraycharles · on May 15, 2022

It does imply that UINT_MAX needs to be divisible by “bufsize” if I’m not mistaken, right? So that the modulo boundaries align with the overflow.

properparity · on May 15, 2022

Not necessarily divisible, but requires power of 2 bufsize with this code (but you probably want that anyway to avoid division).

With non-power-2 bufsize you need some extra logic for write/readix++, but the rest of the code can stay the same.

rep_lodsb · on May 15, 2022

That doesn't seem right...

If bufsize is a power of 2, doing the modulo would be exactly the same as just using a smaller integer type (like 8 bits on a microcontroller). What we want is the integer wraparound to never coincide with the buffer position wrapping. For that we would need a size that has some other factor than 2, and thus a "real" modulo operation in the indexing (which is slow).

Or is there an error in my thinking?

edit: Duh, I understand now. If the counter has at least twice the range as the buffer size, the subtraction will always give the correct number of elements used. Power of 2 or not doesn't matter.

Not having to do the subtraction could still give a performance benefit, and if the buffer elements aren't very large, one wasted slot is worth the tradeoff.

adrian_b · on May 15, 2022

Like for any algorithm choice, the optimal choice depends on the context.

On a desktop/laptop computer, wasting a queue slot matters very little, but it may be useful to achieve the maximum speed, so this method does not seem preferable.

On the other hand, in many microcontroller applications there is no other read/write RAM, but a few kilobytes of internal MCU RAM. So the amount of used RAM may be the most important resource and there may be many FIFO queues for various peripheral interfaces.

For such MCU applications, it is valuable to be aware of this alternative FIFO queue implementation, as it may be the best choice.

gpderetta · on May 15, 2022

It took me a bit to understand how tcp seqnums work in case of wraparound, but the trick is exactly that: everything works fine as long as you have less than 2*31 unack'd bytes.

Ring buffers work exactly the same way

orlp · on May 17, 2022

Right, you resolve the ambiguity of the wraparound because we know that logically the read_idx (which I guess is better called pop_idx) can never exceed the write_idx. It does fail if bufsize equals 2^w where w is the width of your index type, but eh, that's fair enough. You still 'waste' a slot but it's in the address space rather than physically.

JonChesterfield · on May 15, 2022

Easier to sanity check if you use uint8 for the indices

kevin_thibedeau · on May 15, 2022

This creates an ambiguity for testing the full and empty conditions. You end up having to store an additional atomic flag to track that state.