
Subtle bugs in Reader/Writer locks do not surprise me. I worked on an in-house implementation based on Win32 (before C++11 and std::shared_mutex) and my recollection is that although the implementation sounds simple it is exceedingly easy to make subtle mistakes.

The experience left me with such a bad feeling for shared locks that I tend to avoid them unless absolutely required. When I last tested std::shared_mutex, performance was so poor compared to std::mutex that double-buffering the data protected by a simple mutex was much faster.

This was a great post by the original Redditor.



Agreed. I avoid reader-writer locks unless absolutely required and benchmarks prove it worthwhile.

Their usage often fails to outperform a regular lock due to additional overhead. They seem to make sense only in specific high-contention scenarios where arrival rate is high and/or the critical section has a long duration [1].

[1]: https://petsta.net/blog/2022-09-30-rwmutex/ - Go specific, but I suspect these results hold true for most implementations of reader-writer locks.


SRWLock seems like a situation where it's small enough for you to construct proofs that what you did is correct, and important enough that the enormous expense of such proof is justified.

The C++ std::mutex provided in Microsoft's STL is ludicrously big, whereas an SRWLOCK is the same size as a pointer. In one sense being ludicrously big is great (less risk that the std::mutex will mistakenly share a cache line with your unrelated data), but for most people it just makes std::mutex annoyingly expensive.
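
A quick way to check the size claims on your own toolchain (a small sketch for Windows/MSVC; the exact std::mutex size depends on the STL version and architecture):

    #include <windows.h>
    #include <mutex>
    #include <cstdio>

    int main() {
        std::printf("sizeof(std::mutex) = %zu\n", sizeof(std::mutex)); // large in Microsoft's STL
        std::printf("sizeof(SRWLOCK)    = %zu\n", sizeof(SRWLOCK));    // one pointer
        std::printf("sizeof(void*)      = %zu\n", sizeof(void*));
    }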


Maybe my memory is faulty, but I believe it was analyzed with Leslie Lamport's tooling. Of course you're building a model of how it should work, and that model itself might have faults.


C++ std::mutex provided in Microsoft's STL uses SRWLocks underneath, same size as a pointer.

Sadly, SRWLocks have yet another problem: they are very unfair. Write a loop that locks a std::mutex inside its body, and no other thread will be able to grab the mutex, even though the loop repeatedly releases and re-acquires it. Critical sections are way more fair.
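
A minimal sketch of that starvation pattern (the thread names and iteration count are illustrative, not from the comment above):

    #include <mutex>
    #include <thread>
    #include <cstdio>

    std::mutex m;

    void hog() {
        // Tight loop: the unlock is immediately followed by another lock, and
        // an unfair lock tends to hand the mutex straight back to this thread
        // instead of waking a waiter.
        for (int i = 0; i < 10'000'000; ++i) {
            std::lock_guard<std::mutex> g(m);
            // short critical section
        }
    }

    void waiter() {
        m.lock();   // may stall for a very long time while hog() is running
        std::puts("waiter finally acquired the mutex");
        m.unlock();
    }

    int main() {
        std::thread a(hog), b(waiter);
        a.join();
        b.join();
    }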


> C++ std::mutex provided in Microsoft's STL uses SRWLocks underneath, same size as a pointer.

The SRWLOCK is indeed the size of a pointer; the std::mutex is not, for ABI reasons.


Critical sections are unfair as well. It could be differences in spin count that you are seeing.


Some time ago I ran a test with 256 threads competing on a small number of cache lines, and found that all of them, CreateMutex, CRITICAL_SECTION and SRWLOCK, were quite fair.

The most successful thread was only 25%, 15% and 9% ahead of the least successful one, respectively. By contrast, with my simple usermode spinlock the unfairness would be 1000% or even 2000%.


I did some work testing a cross-platform in-house library for read-write locking.

We tested a cmpxchg16b-based approach and found that performance was terrible with more than 4 cores.

We ended up using spin-locks similar to the Linux kernel's RCU.


It has been my experience that anything related to concurrency can be full of subtle edge cases. I tend to avoid it completely unless absolutely necessary.


Several years ago I did my own implementation of SRW/push locks, of course based on the original NeillC code: https://github.com/rbmm/SRW_ALT/tree/main/PushLock-ALT

It has slightly worse performance than Microsoft's under high contention, but in a test of the OP's case it works well. In my tests I did not find bugs in the implementation, but of course I cannot be sure none exist; the logic really is very complex, and nobody else has tested it. However, it is very simple to replace the SRW calls with my implementation via macros in a header file.
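
Such a header shim might look roughly like this (the names on the right are hypothetical placeholders, not the actual functions from the linked repo):

    // push_lock_shim.h - route the Win32 SRW API to a drop-in replacement
    #pragma once
    #include "my_push_lock.h"   // hypothetical replacement implementation

    #define InitializeSRWLock(p)        MyPushLockInit(p)
    #define AcquireSRWLockExclusive(p)  MyPushLockAcquireExclusive(p)
    #define ReleaseSRWLockExclusive(p)  MyPushLockReleaseExclusive(p)
    #define AcquireSRWLockShared(p)     MyPushLockAcquireShared(p)
    #define ReleaseSRWLockShared(p)     MyPushLockReleaseShared(p)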


If you have many readers and a single writer there's no point in the readers blocking each other.


True, but in my experience the overhead of std::shared_mutex outweighs the benefits. Other approaches include:

* breaking up the lock so different threads can access different parts of your data structure concurrently.

* double-buffering (also called ping-pong buffers), where you effectively keep two copies of your data structure. The readers can access one without blocking while a single writer modifies the other, and then the two are swapped.

* just accepting that readers will block each other with a std::mutex and working to minimize the time spent holding the lock. This can actually work out quicker depending on your access patterns.

As always, careful profiling with real data is required to figure out what is better.


The problem with double-buffering is that you still need to know when all the readers are no longer using one of the copies.

   T1: writer populates copy1
   T2: readers access copy1
   T3: writer populates copy2, and swaps
   T4: writer populates copy1, and swaps
At T4 a reader from T2 could still be accessing the old data structure.

Unless I'm overthinking it.


No, you are right. One solution is for readers to access the buffer through a shared_ptr: they can hold onto the old version for as long as they need it while the writer creates a wholly new data structure. It is also possible for the writer to block new readers while doing the swap. Tradeoffs everywhere, depending on whether you need readers to see changes that occurred after they started accessing the data.
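
A minimal sketch of the shared_ptr approach (assuming C++20's std::atomic<std::shared_ptr> and a single writer; the data type is just a placeholder):

    #include <atomic>
    #include <memory>
    #include <vector>

    // Readers never block each other and never see a partially written update:
    // the writer builds a whole new copy and publishes it with an atomic swap.
    std::atomic<std::shared_ptr<const std::vector<int>>> table{
        std::make_shared<const std::vector<int>>()};

    void reader() {
        // Take a snapshot; the old copy stays alive as long as this shared_ptr
        // does, even if the writer publishes a new version in the meantime.
        std::shared_ptr<const std::vector<int>> snapshot = table.load();
        // ... read from *snapshot ...
    }

    void writer(int value) {   // assumes only one writer thread
        // Copy, modify, then publish in one atomic step.
        auto next = std::make_shared<std::vector<int>>(*table.load());
        next->push_back(value);
        table.store(std::move(next));
    }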


Nope, you're not overthinking it. Ironically, I ran into this category of issue today with some buffer reuse in a multi-threaded system.


Sad thing is, Reader / Writer locks are pretty tempting to use as they appear to be lightweight.



