This breaks when your indices wrap around the integer limit. I'm not saying this can't be worked around but you end up with something much more complicated than just head==tail. It's a memory/runtime trade-off that I think is worth wasting one slot for.
If bufsize is a power of 2, doing the modulo would be exactly the same as just using a smaller integer type (like 8 bits on a microcontroller). What we want is the integer wraparound to never coincide with the buffer position wrapping. For that we would need a size that has some other factor than 2, and thus a "real" modulo operation in the indexing (which is slow).
Or is there an error in my thinking?
edit:
Duh, I understand now. If the counter has at least twice the range as the buffer size, the subtraction will always give the correct number of elements used. Power of 2 or not doesn't matter.
Not having to do the subtraction could still give a performance benefit, and if the buffer elements aren't very large, one wasted slot is worth the tradeoff.
Like for any algorithm choice, the optimal choice depends on the context.
On a desktop/laptop computer, wasting a queue slot matters very little, but it may be useful to achieve the maximum speed, so this method does not seem preferable.
On the other hand, in many microcontroller applications there is no other read/write RAM, but a few kilobytes of internal MCU RAM. So the amount of used RAM may be the most important resource and there may be many FIFO queues for various peripheral interfaces.
For such MCU applications, it is valuable to be aware of this alternative FIFO queue implementation, as it may be the best choice.
It took me a bit to understand how tcp seqnums work in case of wraparound, but the trick is exactly that: everything works fine as long as you have less than 2*31 unack'd bytes.
Right, you resolve the ambiguity of the wraparound because we know that logically the read_idx (which I guess is better called pop_idx) can never exceed the write_idx. It does fail if bufsize equals 2^w where w is the width of your index type, but eh, that's fair enough. You still 'waste' a slot but it's in the address space rather than physically.