Subtle bugs in reader/writer locks do not surprise me. I worked on an in-house implementation based on Win32 (before C++11 and std::shared_mutex), and my recollection is that although the implementation sounds simple, it is exceedingly easy to make subtle mistakes.
The experience left me with such a bad feeling for shared locks that I tend to avoid them unless absolutely required. When I last tested std::shared_mutex, performance was so poor compared to std::mutex that double-buffering the data protected by a simple mutex was much faster.
Agreed. I avoid reader-writer locks unless absolutely required and benchmarks prove it worthwhile.
They often fail to outperform a regular lock because of the additional overhead. They seem to make sense only in specific high-contention scenarios where the arrival rate is high and/or the critical section has a long duration [1].
SRWLock seems like a situation where it's small enough for you to construct proofs that what you did is correct, and important enough that the enormous expense of such a proof is justified.
The C++ std::mutex provided in Microsoft's STL is ludicrously big, whereas an SRWLOCK is the same size as a pointer. In one sense being ludicrously big is great: there is less risk that the std::mutex will mistakenly share a cache line with your unrelated data. But for most people it just makes std::mutex annoyingly expensive.
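If you want to see the sizes for yourself, something like this should do it (MSVC on 64-bit Windows; the exact numbers are implementation details and vary between toolset versions):

    // Print the footprint of MSVC's std::mutex next to a raw SRWLOCK.
    // The sizes are implementation details; the comments are rough expectations only.
    #include <windows.h>
    #include <cstdio>
    #include <mutex>

    int main() {
        std::printf("sizeof(std::mutex) = %zu\n", sizeof(std::mutex)); // dozens of bytes in MSVC's STL
        std::printf("sizeof(SRWLOCK)    = %zu\n", sizeof(SRWLOCK));    // pointer-sized: 8 on x64
    }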
Maybe my memory is faulty, but I believe it was analyzed with Leslie Lamport's formal verification work. Of course, you're building a model of how it should work, and the model itself might have faults.
C++ std::mutex provided in Microsoft's STL uses an SRWLOCK underneath, which is the same size as a pointer.
Sadly, SRWLocks have yet another bug: they are very unfair. Write a loop that locks a std::mutex inside the body, and no other thread will be able to grab the mutex, even though the loop repeatedly releases and re-acquires it. Critical sections are way more fair.
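A minimal sketch of the starvation pattern being described (not a rigorous benchmark; whether the second thread actually starves, and for how long, depends on the scheduler and on how the STL implements std::mutex):

    // One "hog" thread releases and immediately re-acquires the mutex in a tight
    // loop; the main thread then measures how long a single acquisition takes.
    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <mutex>
    #include <thread>

    std::mutex m;
    std::atomic<bool> stop{false};

    int main() {
        std::thread hog([] {
            while (!stop.load(std::memory_order_relaxed)) {
                std::lock_guard<std::mutex> g(m);
                // negligible work inside the critical section
            }
        });

        std::this_thread::sleep_for(std::chrono::milliseconds(100));

        auto start = std::chrono::steady_clock::now();
        {
            std::lock_guard<std::mutex> g(m); // how long until we actually get in?
        }
        auto waited = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::steady_clock::now() - start);
        std::printf("waited %lld ms for one acquisition\n", (long long)waited.count());

        stop = true;
        hog.join();
    }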
Some time ago I ran a test with 256 threads competing on a small number of cache lines, and found that all three, CreateMutex, CRITICAL_SECTION, and SRWLOCK, were quite fair.
The most successful thread was only 25%, 15%, and 9% ahead of the least successful one, respectively.
By contrast, with my simple user-mode spinlock the unfairness could be 1000% or even 2000%.
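For what it's worth, the shape of such a test is roughly this (a sketch, not the original code; std::mutex here, but the lock type is what you would swap out for CRITICAL_SECTION or SRWLOCK):

    // Every thread hammers the same lock and counts its own acquisitions;
    // at the end we compare the most and least successful threads.
    #include <algorithm>
    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <mutex>
    #include <thread>
    #include <vector>

    int main() {
        constexpr int kThreads = 256;
        std::mutex lock;                              // the lock under test
        std::vector<long long> counts(kThreads, 0);   // per-thread acquisition counts
        std::atomic<bool> stop{false};

        std::vector<std::thread> threads;
        for (int i = 0; i < kThreads; ++i) {
            threads.emplace_back([&, i] {
                while (!stop.load(std::memory_order_relaxed)) {
                    std::lock_guard<std::mutex> g(lock);
                    ++counts[i];
                }
            });
        }

        std::this_thread::sleep_for(std::chrono::seconds(2));
        stop = true;
        for (auto& t : threads) t.join();

        // Assumes every thread got the lock at least once during the run.
        auto [lo, hi] = std::minmax_element(counts.begin(), counts.end());
        std::printf("most successful thread was %.0f%% ahead of the least successful\n",
                    (double(*hi) / double(*lo) - 1.0) * 100.0);
    }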
It has been my experience that anything related to concurrency can be full of subtle edge cases. I tend to avoid it completely unless absolutely necessary.
They have slightly worse performance than the MS implementation under high contention, but in a test of this OP's case they work well. In my tests I did not find bugs in the implementation, but of course I can't be sure none exist; the logic really is very complex and nobody else has tested it. However, it is very simple to replace the SRW calls with my implementation via macros in a header file.
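For illustration only, that macro trick presumably looks something like this; the My* names are hypothetical stand-ins for the custom implementation:

    // srw_replacement.h (hypothetical): redirect the Win32 SRW calls to a custom
    // implementation without touching the call sites that include this header.
    #pragma once
    #include <windows.h>

    void MyAcquireSRWLockExclusive(PSRWLOCK lock);   // hypothetical replacements
    void MyReleaseSRWLockExclusive(PSRWLOCK lock);
    void MyAcquireSRWLockShared(PSRWLOCK lock);
    void MyReleaseSRWLockShared(PSRWLOCK lock);

    #define AcquireSRWLockExclusive MyAcquireSRWLockExclusive
    #define ReleaseSRWLockExclusive MyReleaseSRWLockExclusive
    #define AcquireSRWLockShared    MyAcquireSRWLockShared
    #define ReleaseSRWLockShared    MyReleaseSRWLockShared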
True, but in my experience the overhead of std::shared_mutex outweighs the benefits. Other approaches include:
* breaking up the lock so different threads can access different parts of your data structure concurrently.
* double-buffering (also called ping-pong buffers), where you effectively keep two copies of your data structure: readers access one without blocking while a single writer modifies the other, and then the two are swapped (see the sketch below).
* just accepting that reads will block each other with a std::mutex and working to minimize the time spent holding the lock. This can actually work out quicker, depending on your access patterns.
As always, careful profiling with real data is required to figure out what is better.
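A minimal sketch of the double-buffering idea, assuming a single writer: the expensive work of building the new version happens off the lock, and the plain mutex is held only for short reads and the swap.

    #include <mutex>
    #include <utility>

    // Two copies of the data: readers see "front", the single writer prepares "back".
    template <typename T>
    class DoubleBuffered {
    public:
        explicit DoubleBuffered(T initial) : front_(std::move(initial)) {}

        // Readers: short critical section on a plain mutex.
        template <typename Fn>
        auto read(Fn&& fn) const {
            std::lock_guard<std::mutex> g(m_);
            return fn(front_);
        }

        // Single writer only: build the new version off the lock, publish with a swap.
        template <typename Fn>
        void update(Fn&& fn) {
            fn(back_);                        // expensive work, no lock held
            {
                std::lock_guard<std::mutex> g(m_);
                std::swap(front_, back_);     // cheap publish
            }
            back_ = front_;                   // resync the back copy for the next update
        }

    private:
        mutable std::mutex m_;
        T front_;
        T back_{front_};
    };

The cost is the extra copy on every update, which is why it tends to pay off only when reads vastly outnumber writes and the update itself is expensive.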
No, you are right. One solution is to have the readers access the buffer through a shared_ptr, so they can hold onto the old version for as long as they need it while the writer makes changes and creates a wholly new data structure. It is also possible for the writer to block new readers while doing the swap. Tradeoffs everywhere, depending on whether you need readers to see changes that occurred after they started accessing the data.
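A minimal sketch of that shared_ptr variant (the names are made up; with C++20 you could also use std::atomic<std::shared_ptr<T>> instead of the mutex):

    #include <memory>
    #include <mutex>
    #include <vector>

    struct Config { std::vector<int> values; };   // stand-in for the shared data

    class Snapshotted {
    public:
        Snapshotted() : current_(std::make_shared<const Config>()) {}

        // Readers: copy the pointer under a short lock; no lock is held while the
        // snapshot is used, and the old version stays alive until the last reader
        // drops its pointer.
        std::shared_ptr<const Config> snapshot() const {
            std::lock_guard<std::mutex> g(m_);
            return current_;
        }

        // Writer: build a wholly new Config, then publish it with a quick swap.
        void publish(Config next) {
            auto fresh = std::make_shared<const Config>(std::move(next));
            std::lock_guard<std::mutex> g(m_);
            current_ = std::move(fresh);
        }

    private:
        mutable std::mutex m_;
        std::shared_ptr<const Config> current_;
    };

Readers that took a snapshot before a publish keep seeing the old version until they release it, which is exactly the trade-off mentioned above.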
This was a great post by the original Redditor.