I don't think the headline really captures the optimization in play, and the blog doesn't cover why it works either.
TLS is just an implementation detail. What is really being exploited is that cache lines in the shared state are basically free to read from multiple threads, and cache lines in the exclusive state are basically free to write, or even CAS, from a single thread.
That means that if you have a pointer to an immutable piece of data that is read more often than it is written, you can exploit the cache coherence subsystem to avoid communicating across cores, at least until the next write invalidates the line.
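To make that concrete, here is a rough sketch of the general pattern, not the code from the blog. The names `Snapshot`, `Published` and `Reader` are made up for illustration, and I use an explicit per-thread `Reader` handle instead of TLS to keep it short. It also assumes 64-byte cache lines.

```cpp
#include <atomic>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical immutable snapshot; never modified after publication.
struct Snapshot {
    int generation;
    std::string payload;
};

// Shared side: a pointer to the current snapshot plus a version counter.
class Published {
public:
    explicit Published(std::shared_ptr<const Snapshot> initial)
        : current_(std::move(initial)) {}

    // Rare path: swap the pointer, then bump the version. The bump
    // invalidates the counter's cache line on every other core.
    void publish(std::shared_ptr<const Snapshot> next) {
        std::lock_guard<std::mutex> lock(mu_);
        current_ = std::move(next);
        version_.fetch_add(1, std::memory_order_release);
    }

    // Hot path: a plain load. Between writes the line can stay in the
    // shared state, so each core reads it from its own cache.
    std::uint64_t version() const {
        return version_.load(std::memory_order_acquire);
    }

    // Slow path: copy the pointer together with the version it belongs to.
    std::pair<std::shared_ptr<const Snapshot>, std::uint64_t> load() {
        std::lock_guard<std::mutex> lock(mu_);
        return {current_, version_.load(std::memory_order_relaxed)};
    }

private:
    // Assumption: 64-byte lines. Keeps the counter off the mutex's line.
    alignas(64) std::atomic<std::uint64_t> version_{0};
    alignas(64) std::mutex mu_;
    std::shared_ptr<const Snapshot> current_;
};

// Per-thread side: one Reader per thread, never shared between threads.
class Reader {
public:
    explicit Reader(Published& src) : src_(src) { refresh(); }

    // The returned reference is valid until the next get() on this Reader.
    const Snapshot& get() {
        if (src_.version() != seen_) refresh();  // taken only after a write
        return *local_;
    }

private:
    void refresh() {
        auto fresh = src_.load();
        local_ = std::move(fresh.first);  // may release the old snapshot
        seen_ = fresh.second;
    }

    Published& src_;
    std::shared_ptr<const Snapshot> local_;
    std::uint64_t seen_ = 0;
};
```

The reader deliberately returns a reference rather than a fresh `shared_ptr` copy: copying would bump a reference count that every thread writes to, which puts the cross-core traffic right back. Each reader touches the count only once per refresh, and an old snapshot is freed when the last reader moves off it.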
The cleanup step sounds similar to hazard pointers. I am guessing that since the memtable is a skiplist the contention there (such as in the allocator) is managed some other way and that the global version is just pointers to memtables and sstables.