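You need an atomic load, followed by an ordinary comparison. On x86 and x64, this is cheap: properly aligned loads and stores up to the native word size are already atomic, and the architecture does very little reordering that has to be prevented. On ARM it's more expensive, because the weaker memory model generally means the load needs a barrier or an acquire-flavoured load instruction.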
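As a rough sketch of what "atomic load, then ordinary comparison" looks like in C++11, assuming a shared integer flag (the names `ready`, `is_ready`, `set_ready` and `demo_flag` are made up for illustration and don't come from the question):

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared flag, illustrative name only.
std::atomic<int> ready{0};

// Reader: one atomic load, then a plain comparison on the local copy.
bool is_ready() {
    int snapshot = ready.load(std::memory_order_acquire);
    return snapshot == 1;  // ordinary comparison, nothing atomic about it
}

// Writer: publish the flag with a release store.
void set_ready() {
    ready.store(1, std::memory_order_release);
}

// Small single-threaded walk-through of the two calls above.
void demo_flag() {
    assert(!is_ready());
    set_ready();
    assert(is_ready());
}
```

On x86/x64 the acquire load here typically compiles to a plain `mov`, which is why it costs about the same as a non-atomic read. On ARM the compiler has to emit something stronger for the same source.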
You're probably thinking of things like atomic increments, which are one to two orders of magnitude more expensive than an ordinary increment.
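For contrast, here is a minimal sketch of the kind of operation that is expensive, next to its non-atomic counterpart. The counter and function names are hypothetical:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical counters, illustrative names only.
std::atomic<long> atomic_hits{0};
long plain_hits = 0;

// Atomic read-modify-write: the load, add and store happen as one
// indivisible step. Returns the new value.
long bump_atomic() {
    return atomic_hits.fetch_add(1, std::memory_order_relaxed) + 1;
}

// Ordinary increment: fine in one thread, a data race if shared.
long bump_plain() {
    return ++plain_hits;
}

// Small single-threaded walk-through of both increments.
void demo_bump() {
    long a = bump_atomic();
    long b = bump_atomic();
    assert(b == a + 1);
}
```

On x86/x64, `fetch_add` typically becomes a `lock`-prefixed instruction even with relaxed ordering, and that is where the extra cost comes from. A plain atomic load has no such read-modify-write step.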