> CAS don't do cache flushes. Caches are fully coherent
Not in the sense of flushing from cache to memory, but yes in the sense of data from a local core cache needing to be sent to another core.
Caches may be coherent but coherency is definitely not free, especially on multi-processor systems, which you probably care about if you are writing this kind of code.
CAS has no effect on the cache, other than guaranteeing that the cacheline stays in the exclusive state for the time it takes to perform the load+op+store. CAS and other atomic RMW ops are, on all common architectures, purely local operations that do not affect the coherency fabric any more than a plain store would (memory barriers do not even do that much).
The only thing that can be said to be 'flushed' by a CAS (at least on x86, where it has full barrier semantics) is the store buffer (which is not a cache), but even there all it does is prevent more recent loads [1] from completing before every store older than or equal to the CAS has completed. The store buffer is not really flushed; it still drains at the same speed it would without a CAS.
[1] Or any other operation, for simplicity of implementation.
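To make the discussion concrete, here is a minimal sketch of the kind of code being argued about: several threads bumping one shared counter with a CAS retry loop. The function name `cas_count` and its parameters are made up for illustration and do not come from any library. Nothing in it flushes a cache; each successful CAS just needs the counter's cacheline in the exclusive state on the core that runs it, so under contention the line gets handed from core to core, which is where the coherency cost mentioned above would show up.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper (not from any library): every worker thread increments
// one shared counter iters_per_thread times using a CAS retry loop.
std::uint64_t cas_count(unsigned num_threads, std::uint64_t iters_per_thread) {
    assert(num_threads > 0 && "this sketch expects at least one worker");

    std::atomic<std::uint64_t> counter{0};
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, iters_per_thread] {
            for (std::uint64_t i = 0; i < iters_per_thread; ++i) {
                std::uint64_t expected = counter.load(std::memory_order_relaxed);
                // On failure compare_exchange_weak stores the value it saw
                // into `expected`, so the loop simply retries with that.
                while (!counter.compare_exchange_weak(
                           expected, expected + 1,
                           std::memory_order_seq_cst,
                           std::memory_order_relaxed)) {
                    // Another thread won the race (or the weak CAS failed
                    // spuriously); try again with the refreshed value.
                }
            }
        });
    }

    for (auto& w : workers) {
        w.join();
    }
    return counter.load();
}
```

Calling `cas_count(4, 10000)` should return 40000 regardless of how the threads interleave. Timing it against a single-threaded run of the same total iteration count is one rough way to see how much the cacheline transfers cost on a given machine; the result depends heavily on the hardware.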