e.g. removing a check for overflow is definitely NOT ignoring the behavior. Deleting a write because it would be undefined behavior for a pointer to point at some location is also NOT ignoring the behavior. Ignoring the behavior is exactly what the rationale is describing when it says UB allows compilers to not detect certain kinds of errors.
Returning a pointer is certainly a use. In any event, the prevailing interpretation makes it impossible to write a defined memory allocator in C.
If a program writes through a dangling pointer and clobbers a return address, the programmer made an error and unpredictable results follow. C is inherently memory unsafe. No UB-based labyrinth of optimizations can change that. It is not designed to be memory safe: it has other design goals.
> e.g. removing a check for overflow is definitely NOT ignoring the behavior. Deleting a write because it would be undefined behavior for a pointer to point at some location is also NOT ignoring the behavior.
Depending on how you look at it, this is ignoring the behavior.
For example, say you have this:
    int f(int a) {
        if (a + 1 < a) {
            // Handle error
        }
        // Do work
    }
You have 2 situations:
1. a + 1 overflows
2. a + 1 does not overflow
Situation 1 contains undefined behavior. If the compiler decides to "ignor[e] the situation completely", then Situation 1 can be dropped from consideration, leaving Situation 2. Since this is the only situation left, the compiler can then deduce that the condition is always false, and a later dead code elimination pass would result in the removal of the error handling code.
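To make that concrete, once the check is folded away the function above effectively reduces to something like this (an illustrative sketch, not any particular compiler's output):

    int f(int a) {
        // `a + 1 < a` was folded to false, so the error
        // handling block was removed as dead code
        // Do work
    }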
So the compiler is ignoring the behavior, but makes the decision to do so by not ignoring the behavior. It's slightly convoluted, but not unreasonable.
More than slightly convoluted. The obvious intention is that the compiler ignores overflow and lets the processor architecture make the decision. Assuming that overflow doesn't happen is assuming something false. There's no excuse for that and it doesn't "optimize" anything.
> The obvious intention is that the compiler ignores overflow and lets the processor architecture make the decision.
If that were the case, wouldn't signed overflow be implementation-defined or unspecified behavior, instead of undefined behavior?
> Assuming that overflow doesn't happen is assuming something false.
It's "false" in the same way that assuming two restrict pointers don't alias is "false". It may not be universally true for every single program and/or execution, but the compiler is explicitly allowed to disregard cases where the assumption may not hold (i.e., the compiler is allowed to "ignor[e] the situation completely").
And again, the compiler is allowed to make this assumption because undefined behavior has no defined semantics. If the compiler assumes that no undefined behavior occurs, and undefined behavior does occur, whatever happens at that point is still conforming, since the Standard says that it imposes no requirements on said program.
> it doesn't "optimize" anything.
...But it does allow for optimizations? For example, assuming signed overflow never happens can allow the compiler to unroll/vectorize loops when the loop index is not the size of a machine word [0]. Godbolt example at [1].
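For context, here's a sketch of the kind of loop that's at issue (assuming a 64-bit target where int is 32 bits; the function name and signature are just for illustration):

    // Because the compiler may assume `i + 1` never overflows, it can
    // widen `i` to a 64-bit induction variable, conclude the loop runs
    // exactly `n` iterations, and unroll/vectorize it. With wrapping
    // semantics (e.g. -fwrapv) it would also have to handle the case
    // where `i` wraps around before ever reaching `n`.
    void scale(float *a, long n) {
        for (int i = 0; i < n; i++) {
            a[i] = 2.0f * a[i];
        }
    }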
> If that were the case, wouldn't signed overflow be implementation-defined or unspecified behavior, instead of undefined behavior?
No, because (among other reasons) the processor architecture might decide to trap or not trap depending on the run-time values of configuration registers that the compiler doesn't know and can't control or document.
> the processor architecture might decide to trap or not trap depending the run-time values of configuration registers that the compiler doesn't know and can't control
I'm not certain that that would fall outside implementation-defined behavior. Would something like "Program behavior on overflow is determined by processor model and configuration" not work?
> or document.
And even if the behavior couldn't be documented, that could be covered by unspecified behavior (assuming the language in the C standard is the same as in the C++ standard in this case)
> Would something like "Program behavior on overflow is determined by processor model and configuration" not work?
Not sure; if nothing else, that seems like it would allow the implementation to avoid documenting any implementation-defined behaviour with a blanket "all implementation-defined behaviour is whatever the hardware happens to do when executing the relevant code".
I mean, that works? It's not great by any means, but it at least eliminates the ability to make the assumptions underlying more aggressive optimizations, which seems like it'd address one of the bigger concerns around said optimizations.
Perhaps I should have phrased it as "all implementation-defined behaviour is whatever the hardware happens to do when executing whatever code the compiler happens to generate".
The point of implementation-defined behaviour is that the implementation should be required to actually define the behaviour. Whereas undefined behaviour doesn't impose any requirements; the implementation can do whatever seems reasonable on a given hardware architecture. That doesn't mean that backdoor-injection malware pretending to be an implementation is a conforming implementation.
> Perhaps I should have phrased it as "all implementation-defined behaviour is whatever the hardware happens to do when executing whatever code the compiler happens to generate".
Even with this definition, the important part is that compilers would no longer be able to ignore control flow paths that invoke undefined behavior. Signed integer overflow/null pointer dereference/etc. may be documented to produce arbitrary results, and that documentation may be so vague as to be useless, but those overflow/null pointer checks are staying put.
Err, that's not a definition, that's an example of pathologically useless 'documentation' that a perverse implementation might provide if it were allowed to 'define' implementation-defined behaviour by deferring to the hardware. Deferring to the hardware is what undefined behaviour is; the point of implementation-defined behaviour is to be less vague than that.
> may be documented to produce arbitrary results, and that documentation may be so vague as to be useless, but those overflow/null pointer checks are staying put. [emphasis added]
Yes, exactly; that is what undefined behaviour is. That is what "the standard imposes no requirements" means.
> Deferring to the hardware is what undefined behaviour is
If that were the case, the Standard would say so. The entire reason people argue over this in the first place is because the Standard's definition of undefined behavior allows for multiple interpretations.
In any case, you're still missing the point. It doesn't matter how good or bad the documentation of implementation-defined behavior may or may not be; the important part is that compilers cannot optimize under the assumption that control flow paths containing implementation-defined behavior are never reached. Null-pointer checks, overflow checks, etc. would remain in place.
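As a sketch of the kind of check at stake (illustrative only, not any specific compiler's output):

    int first(int *p) {
        int v = *p;        // if p is NULL, this dereference is UB today
        if (p == NULL) {   // under the "UB never happens" reading, this
            return -1;     // check can be folded away: p must be non-null
        }                  // for the earlier load to have been defined
        return v;
    }
    // If the dereference were implementation-defined instead, the compiler
    // could not conclude p != NULL here, and the check would stay.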
> Yes, exactly; that is what undefined behaviour is. That is what "the standard imposes no requirements" means.
I think you're mixing standardese-undefined-behavior with colloquial-undefined-behavior here. For example, if reading an uninitialized variable were implementation-defined behavior, and an implementation said the result of reading an uninitialized variable was "whatever the hardware returns", you're going to get some arbitrary value/number, but your program is still going to be well-defined in the eyes of the Standard.
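A sketch of that distinction (hypothetical, since reading an uninitialized automatic variable is not actually implementation-defined in C):

    #include <stdio.h>

    int main(void) {
        int x;  // never initialized
        // If this read were implementation-defined ("whatever the hardware
        // returns"), x would hold some arbitrary int and the program would
        // still be well-defined: it prints *some* number. As actual UB,
        // the Standard imposes no requirements on the program at all.
        printf("%d\n", x);
        return 0;
    }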
When I said implementation-defined, I meant implementation-defined. This is because the applicability of UB-based optimization to implementation-defined behavior - namely, the lack thereof - is wholly uncontroversial. Thus, the diversion into the quality of the documentation for implementation-defined behavior is not directly relevant here; the mere act of changing something from undefined behavior to implementation-defined behavior neatly renders irrelevant any argument about whether any particular UB-based optimization is valid.
> Compilers cannot assume that, because (in the general case) it is not true.
This is not necessarily true. For example, consider the semantics of the restrict keyword. The guarantees promised by a restrict-qualified pointer aren't true in the general case, but preventing optimizations because of that rather defeats the entire purpose of restricting a pointer in the first place.
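For reference, a small restrict sketch (names are just illustrative):

    // The restrict qualifiers promise that `dst` and `src` don't overlap,
    // so the compiler may reorder loads/stores or vectorize the loop based
    // on that promise, even though a caller could pass overlapping
    // pointers. If the promise is violated, the behavior is undefined,
    // just as with signed overflow.
    void add_into(int *restrict dst, const int *restrict src, int n) {
        for (int i = 0; i < n; i++) {
            dst[i] += src[i];
        }
    }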
More generally, the entire discussion about UB-based optimizations exists precisely because the Standard permits a reading such that compilers can make optimizations that don't hold true in the general case, precisely because the Standard imposes no requirements on programs that violate those assumptions.
> I think the author of that blog was correct: the preferred path is for the compiler to provide data to the programmer to simplify the loop.
Requiring the equivalent of PGO is a rather unfortunate bar, though to be fair if you're that interested in performance it's probably something worth looking into anyways.
I'm curious how noisy an always-on warning for undersized loop variables would be, or how much code would have broken if int were changed to 64 bits on 64-bit platforms...
> For your godbolt example, use the C compiler not c++
Sorry; that was a mistake on my end. The same phenomenon occurs when compiling in C mode, in any case [0].