Data races are undefined behavior in the C/C++ legalese sense: compilers are not required to preserve the semantics of the abstract machine when one occurs.
Where this behavior is useful is in situations like traps: if you've got a potentially-trapping instruction (say, a division) whose result isn't used, you can delete that instruction even if you can't prove that it never traps. The problem that arose more recently is that compilers began to feed these assumptions into their value analysis (oh look, you wrote int a = b / c, therefore c ≠ 0) and then used the results to chomp away code.
Data races are the kind of undefined behavior that is too hard to constrain by specification. While the hardware semantics of data races are simple enough to specify (basically memory_order_relaxed with word tearing), mapping source-level reads/writes to hardware loads and stores isn't easy. Java tried and still got it wrong.
You don't have to worry about "the compiler proved a data race happened and turned my program to garbage" happening, at least any time in the next decade or so. The compiler would have to be able to identify which code fragments must be running concurrently in different threads without appropriate happens-before links, which is beyond the capabilities of anything someone will put behind even a -O3 compiler flag.
They are defined behavior in Java, but not in C++. Things still get extremely complicated[1] if you use them in Java, so I would treat them as undefined behavior regardless and ban them from any Java code base.
Note that Java has some implicit synchronization built in to preserve memory safety even in incorrectly synchronized code. That comes at the cost of some overhead, which will only increase as systems are built with more cores.
[1] Read the three sections starting from https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.htm... for the details. Suffice to say you have to consider all possible interleavings of memory accesses, not unlike trying to reason about a non-deterministic Turing machine. In particular, the causality requirements leave some room for unexpected executions, as they do not apply equally to all inter-thread actions. This can REALLY bite you in the ass if you only test on particular VM implementations or architectures.
People who, like me, were taught concurrency at a time when Java had a proper memory model but C and C++ did not (i.e. prior to 2011) should go read the docs for C11 memory orders (used also by Rust and C++).
Writing lockless Rust/C/C++ leads to a bad time (especially on non-x86 architectures) if one believes what's true about concurrency in Java to be generally true on computers.
There is a guy in the C++ community known for writing lock-less data structures (Tony Van Eerd); if I recall correctly, one of his pieces of advice is to think multiple times before ever considering writing that kind of code.
The low-level nature of C++ (and also C) makes data races unavoidably undefined behavior. Imagine a data race that causes a torn write on a pointer. This pointer could now point pretty much anywhere. The function stack, some vtable, program data, etc. Any write to that dereferenced pointer will alter the state of your program in such a way that the behavior of your program will become undefined. There would be no way for C++ to guarantee any kind of defined behavior when you can write to arbitrary locations in memory.
There are two different things that people can mean when they say "undefined behaviour" in regards to C/C++. One is what you're talking about, with "undefined behaviour" meaning "one of several things could happen, you don't know which, and some of them may be bad."
The other is more specifically talking about code that is deemed "undefined behaviour" by the spec. There's a surprising amount of this, and the compiler is explicitly allowed to do anything it wants when it encounters such code, from emitting 'close enough' code, to emitting no code at all, to emitting code to order a pizza to your current location.
If this is the latter case then it could result in random bits of code mysteriously doing completely unexpected things if the compiler thinks a data race there is possible. This would make it far harder to track down bugs and reason about what code might cause such.
Yes, if your data races can pollute pointers, that makes subsequent access of those pointers undefined behavior.
But is the actual data race undefined behavior?
Take the following example from the article:
Thread A loads the value 0
Thread B loads the value 0
Thread A increments the value, calculating a result of 1
Thread B also calculates a result of 1
Thread A stores the value 1
Thread B stores the value 1
This is definitely indeterminate behavior, since the final value could also be 2. But if it is undefined behavior, the standard would also allow that final value to be 3 or 999 or -1 or INT_MAX; it would allow anything.
To see a case where this matters, suppose you want an approximate total and you do this by concurrent addition to some shared variable. You expect overlapping additions to be so infrequent as to not have a large effect on the final value.
Thus, you decide to not make that value atomic and accept the inaccuracy since it is an approximation anyway.
Maybe you do this because an atomic operation locks an entire cache line, and something else in that cache line is frequently accessed due to struct layout.
If the above scenario is actually 'undefined behavior' then this would be a really bad idea. You'd be at the mercy of the compiler not wanting to use this for any optimization.
It's only undefined behavior if you use regular variables and regular increments. If you use the standard atomic functions/classes with std::memory_order_relaxed, it's legal, and provides what you want.
std::memory_order_relaxed would not cover my use case.
This still requires locking. If we are using atomic<int> on x86, the locking is probably done at the assembly level, which is much faster, but it remains locking.
Specifically, that form of locking marks a specific cache line 'locked' and blocks all processors from modifying that cache line.
When your variable happens to share that cache line with something you want quick access to, this stuff matters.
(there is an exception using 'compare and swap' instructions, but I've only seen ICC emit those, not GCC.)
In general, I would expect data races to only create UB when the behavior they result in happens to be UB.
In my specific example, there is no source of UB except for apparently the data race.
If you really want that, just do a relaxed read into a local variable, increment there and relaxed store afterwards. At least clang will give you exactly the assembly you want, minus the technically undefined behavior:
https://godbolt.org/g/9ikt1m
memory_order::relaxed doesn't imply locking. Compilers on platforms where aligned access is atomic will generally implement relaxed reads and writes without locking, barriers or other expensive operations. So x = x + 1 with relaxed semantics should be cheap and best effort, like you want.
Yes, data races are undefined behaviour, and as with any other undefined behaviour, as far as the standard is concerned the entire program is without meaning and the compiler is free to do whatever it wants.
Is that in the sense that the compiler is allowed to assume they don't happen? I always figured it was defined behavior but not determinate.