Does anyone know what "Mutex lock/unlock" is actually measuring?
In Linux at least, a mutex is a plain old struct living in main memory that you "lock" and "unlock" by cmpxchg'ing some fields. It's literally a main memory write, so I don't understand how it could have a quarter of the latency of main memory.
Everything not in a register is a "main memory write" if you ignore the cache that sits in between. "All" the CPU needs to do for a mutex is a few atomic operations to verify that the core can take the lock. It doesn't need to wait for that write to propagate back to the physical RAM stick, just as nothing else sitting in cache has to wait for that.
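To make that concrete, here's a rough sketch of what an uncontended lock/unlock pair boils down to. `ToyLock` and `uncontended_pair_ns` are names I made up for illustration. This is not how glibc or the kernel actually implement their mutexes (real ones also handle sleeping waiters), and I'm only assuming the "Mutex lock/unlock" figure refers to something like this uncontended case.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// Hypothetical toy lock: a single atomic word, 0 = unlocked, 1 = locked.
// Illustration only; a real mutex also has to park and wake waiting threads.
struct ToyLock {
    std::atomic<int> state{0};

    // One compare-and-swap on the lock word. On x86 this is typically a
    // `lock cmpxchg`, which operates on the cache line holding `state`.
    bool try_lock() {
        int expected = 0;
        return state.compare_exchange_strong(expected, 1,
                                             std::memory_order_acquire);
    }

    // Spin until the compare-and-swap succeeds.
    void lock() {
        while (!try_lock()) {
        }
    }

    // Release the lock with a plain atomic store.
    void unlock() { state.store(0, std::memory_order_release); }
};

// Average time in nanoseconds for one lock/unlock pair on a single thread,
// so the lock word stays in this core's cache and is never contended.
double uncontended_pair_ns(std::uint64_t iterations) {
    ToyLock lock;
    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        lock.lock();
        lock.unlock();
    }
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = end - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Calling something like `uncontended_pair_ns(10000000)` from a `main` and printing the result gives a per-pair cost for your own machine. Since nothing else touches the lock word, the loop never has to go out to RAM, which is the point being made above. The number you get will depend on the CPU and compiler flags.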