
I would interpret that to mean that if memory isn't initialized to a specific value and you're working in a physical rather than virtual address space, then a read through some random pointer to uninitialized data might return an unpredictable value, such as the value of a GPIO register or a word from an RS-232 buffer. There's clearly a defined behavior for that specific platform.
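To make that concrete, here's a bare-metal-style sketch; the register name and address are invented for illustration, not taken from any real part:

    #include <stdint.h>

    /* Hypothetical memory-mapped GPIO input register; the address is made up. */
    #define GPIO_IDR ((volatile uint32_t *)0x40020010u)

    uint32_t read_pins(void) {
        /* On a flat physical address space this load returns whatever the
           hardware currently drives at that address -- which is also what a
           stray read through a garbage pointer would return, if the garbage
           happened to equal this address. */
        return *GPIO_IDR;
    }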

And when they're talking about doing something hardware-specific, I would take that to be about the MMU on certain platforms. On platforms with an MMU, where the processor executes code in a virtual address space, the hardware can generate an access violation if the program reads from a virtual address that isn't mapped to a segment of physical memory.

Even when a program runs in a physical address space, the processor may be configured with an address space that is significantly larger than the actual block of memory and registers on the memory bus. And this matters because if you have a 32-bit processor but the upper 3 gigabytes of the address space don't map to a register file or RAM then it's hardware-dependent what happens when you read from those addresses.

Some hardware will roll the map over and read from somewhere in the available section of the address space based on the offset, and some hardware will treat that as an error which triggers an access violation. Some hardware will bug out and do something completely unexpected. It's undefined behavior what happens in this case. It has everything to do with how your specific chip, mainframe, or minicomputer is wired. And that's what they're referring to.
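Here's a toy model of the "roll over" case (partial address decode); all the sizes and names are invented:

    #include <stdint.h>

    /* Toy bus: only 1 MiB of RAM is wired up and the decoder simply ignores
       the high address bits, so an out-of-range read wraps back into the
       populated region. Other boards would fault or return bus garbage. */
    #define RAM_SIZE (1u << 20)

    static uint8_t ram[RAM_SIZE];

    uint8_t bus_read(uint32_t addr) {
        return ram[addr & (RAM_SIZE - 1u)];  /* high bits not decoded */
    }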

Undefined behavior does not mean "do whatever you want," it means "do what the hardware would probably do in this situation, or do whatever can be reasonably expected." And that's important because it means UB is not carte blanche to violate the programmer's expectations.



As far as the standard goes, you're interpreting undefined behavior to mean unspecified behavior -- behavior that's well-defined, but whose definition "depends on the implementation" (in this case, your hardware). I mean, I can't stop you if you want to read it that way, but you're literally interpreting it to mean exactly what they did not intend it to mean... that's why they chose separate terms and explicitly told you undefined behavior is allowed to be unpredictable, rather than being required to be implementation-dependent.

But let's ignore the standard... who cares what it says...

The thing is, what you're asking for inhibits optimizations that many people would very much like to see from their compilers. Like for example if you have an uninitialized function pointer that you only assign to in 1 or 2 locations -- the compiler should be able to just replace the indirect function calls with direct function calls. You're demanding that it doesn't do that, and that it simply call whatever function or non-function that pointer happened to point to. I mean -- you're welcome to ask for that, and maybe your compiler should have a flag to make it behave that way (or maybe it does already? do you use it if so?), but to me and many other people, the compiler should obviously be permitted to see right through that.
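Something like this is the shape of what I mean (names invented; whether a particular compiler actually does the rewrite is up to that compiler):

    #include <stdio.h>

    static void greet(void) { puts("hello"); }

    /* Only ever assigned greet, and only called through dispatch(). */
    static void (*handler)(void);

    void init(void) { handler = greet; }

    void dispatch(void) {
        /* Calling through a null or otherwise unassigned function pointer is
           UB, so a compiler that can see every store to handler is allowed to
           turn this indirect call into a direct call to greet() instead of
           jumping to whatever the pointer happens to hold. */
        handler();
    }

    int main(void) {
        init();
        dispatch();
        return 0;
    }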


> You're demanding that it doesn't do that

Rather than telling someone what they are demanding, it's often better to ask them -- especially if you are sure that what they are demanding is stupid. Personally, I'd guess that 'jschwartzi' might be fine with a compiler that makes the optimization you refer to, and is instead objecting to a compiler that deletes essential safety checks in other parts of the program on the assumption that all bets are off once "undefined behavior" can be proven to occur. If he's like me, he'd probably also prefer that the compiler issue a warning about the undefined behavior rather than silently making changes to the program. But better to ask him than to guess.


I don't think it's a stupid demand at all -- like I said, I have nothing against implementations behaving more nicely if they wish. I'm just saying that this is an extra demand, not an interpretation of the standard, and that it would have performance repercussions which many C users would rather avoid.

In the case of your safety check example, it'd be nice if you could mention something concrete so we know exactly what situation you're talking about. But I mean, I can't rule out that maybe you'll find a couple situations here and there where the standard shouldn't leave things undefined. But the argument I'm rebutting here is that all instances of UB must behave "like the hardware", not that this particular instance is good but another one is bad, so I'm not sure you two would agree. I agree warnings would be nice too (some of which already exist), and I think despite their current efforts compilers still have some way to go (e.g. a macro expanding to 0 should probably not behave the same as the literal 0 when you multiply, say, by a constant), but again, that's already assuming you're fine with UB...


> In the case of your safety check example, it'd be nice if you could mention something concrete so we know exactly what situation you're talking about.

There are some well-publicised cases of compilers removing NULL-checks[0] on the assumption that the NULL value can't occur as it would be UB.

As another example, the Linux kernel assumes in several places that signed integer overflow wraps, so it has been compiled with -fno-strict-overflow / -fwrapv ever since GCC started optimizing based on this piece of UB (apparently the developers noticed the change in compiler behavior and added the flags before any faulty kernels were released).
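The generic shape of that assumption looks like this (my illustration, not an actual kernel snippet):

    #include <limits.h>

    /* Overflow check written assuming wraparound. Since signed overflow is UB,
       a compiler may simplify "a + b < a" to "b < 0", which defeats the check
       whenever b is known to be positive; -fwrapv / -fno-strict-overflow keep
       the two's-complement meaning. */
    int add_would_wrap(int a, int b) {
        return a + b < a;
    }

    /* A UB-free way to ask the same question: */
    int add_would_overflow(int a, int b) {
        return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
    }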

[0] https://lwn.net/Articles/342330/


I know about that NULL check example, but the thing is, the compiler is being pretty reasonable there. The fact that the pointer is dereferenced means that the NULL check was useless: if the address is NULL, then the NULL check can never be reached because you'd have already segfaulted (your beloved hardware behavior!) on the dereference earlier. [1] So that code is dead, and needs to be removed. The fact that the dereferenced variable is unused means that the compiler needs to eliminate that too.
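For reference, the pattern boils down to something like this (simplified, names invented):

    struct device { int flags; };

    int get_flags(struct device *dev) {
        int flags = dev->flags;  /* dereference happens first... */
        if (dev == NULL)         /* ...so the compiler may infer dev != NULL
                                    and delete this check as dead code... */
            return -1;
        return flags;            /* ...and if flags were never used, it could
                                    drop the load as well. */
    }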

People want these optimizations individually. They don't want to keep dead code taking up cycles and they also don't want dead variables taking up registers. So you can't really find much support arguing that those optimizations should be removed entirely. The only real possibilities you can propose here are that the compiler should have magically re-inserted the pruned check during the second optimization, or that it should have performed them in the opposite order. But are you sure these are actually possible and if so, practical? I mean, maybe they are, but they are far from obvious to me. I can easily see the compiler thrashing and failing to reach a fixed point if it re-inserts code that a previous optimization pass pruned. Similarly, I don't see how the compiler can just magically detect an optimization order that ensures "surprising" situations like this don't occur. My guesstimation is that it would carry severe downsides people wouldn't want. Now maybe I'm just not smart enough to see a good solution to this that doesn't carry significant downsides, and there's already one out there. If there is, I'm curious to hear about it, and I hope someone implements it under some flag, but I have yet to hear of one.

For signed integer overflow -- that might be one place where I think it would make sense to just define it to either wrap with 2's complement just like unsigned integers do, or to be unspecified behavior that falls back to the implementation's representation. Though in the latter case... you already have an implementation-specific solution: your compiler flags. But again, we might agree on a couple optimizations here and there, but that's a far cry from saying UB should just fall back to hardware behavior. And honestly, I'm not even here supporting C; I hate it. If you want wrapping, I would suggest it's a sign you might want to use C++ already. Then you can define an integer type that will play Beethoven when you overflow, and people who want their UB on overflow can have that too.
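And if you want wrapping in plain C today without flags, you can already get it by doing the arithmetic in unsigned and converting back (the conversion back to a signed type is formally implementation-defined, but wraps as expected on mainstream compilers):

    #include <stdint.h>

    /* Wrapping 32-bit signed add with no signed-overflow UB: unsigned
       arithmetic is defined to wrap modulo 2^32. */
    static int32_t wrapping_add_i32(int32_t a, int32_t b) {
        return (int32_t)((uint32_t)a + (uint32_t)b);
    }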

[1] I'm ignoring the validity of address 0 in kernel-mode here; there are also some subtleties about what's a null pointer and what's address zero that are rather beside my point.


I think that everyone agrees that NULL checks should be elided when the compiler can prove that they are useless. However, I think most people assume that the way for the compiler to prove that would be "when it sees that the variable has a non-NULL value" - e.g. `int v = 0; int *p = &v;`.
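i.e., in a sketch like this the check can go away purely from facts the compiler has established going forward:

    int forward_example(void) {
        int v = 0;
        int *p = &v;     /* p is derived from &v, so it is provably non-NULL */

        if (p != NULL)   /* safe to elide from the fact above, with no
                            backward reasoning from UB involved */
            v = 1;

        return v;
    }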

That's the sort of thing I would suggest - don't work back from UB (and I agree, I wouldn't expect the optimizer to backtrack optimizations as new facts come up), work forward from actually known facts.

NULL checks are probably a pretty bad example, since the NULL access would surely SEGFAULT if allowed to execute (though the particular case of address 0 vs NULL, and of code catching segfaults like on Windows, throws a wrench in this assumption even here), but other types of UB are much worse.

If you accidentally issue a read from past the end of an array, but only later check that the index was within bounds (e.g. `int x = a[i]; if (i < len) return x; else return -1;`), the compiler eliding the bounds check by the same logic would turn a program that might have been safe in practice into a program that is certainly not safe. Note that I don't know if compilers perform this type of optimization, so this may be hypothetical.
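Spelled out (again, hypothetical -- I don't know that any compiler does this today):

    #include <stddef.h>

    /* The read precedes the bounds check, so a compiler reasoning backward
       from "a[i] must be in bounds, or it's UB" could treat i < len as always
       true and drop the check entirely. */
    int lookup(const int *a, size_t len, size_t i) {
        int x = a[i];    /* out of bounds here whenever i >= len */
        if (i < len)
            return x;
        return -1;
    }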

In general though, I think that the tension here comes from C being used in two very different ways: one is C as portable assembly, where you expect the compiler to keep a pretty 1:1 mapping with your code; the other is C as the ultimate performance language, where you drop to C when you can't optimize further in anything else. I think most of the complaints about "exploiting UB" come from the first camp, whereas the second camp is pretty happy with the current status quo.


I think you're focusing so much on particular examples that you're missing the larger point. To repeat what I've already said: I'm not defending every single instance of UB in the standard. And I can't keep going back and forth with you to debate every single one (yes, int overflow = no UB, NULL deref = yes UB, out-of-bounds = maybe UB, etc.), which is what we're ending up doing rather pointlessly right now. The point to take away here is that UB itself as a notion is something people in both camps desire in many scenarios, so you can't just get rid of it in its entirety and say "map it to hardware" or "always reason forward". Because, again, if you do, in many cases those would inhibit optimizations that people want. The only real solution is for people to stop seeing C as a portable assembly language, which it is simply not. It's defined in terms of an abstract machine, so either people need to switch languages, or switch their mental models.


I don't think emitting a warning is feasible in most cases. The compiler (generally) doesn't know at compile time that UB definitely occurs, only that some UB exists on some path that may or may not be reachable in theory or practice.

Usually these paths are not reachable in practice, so warning that they exist would cause a tidal wave of pointless diagnostics. Instead, the compiler simply prunes those paths, which can lead to better code generation for the paths that are taken.
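For instance, a pattern like this would trip a naive "possible UB" warning on nearly every pointer parameter:

    /* The dereference is UB only if a caller passes ok != 0 together with an
       invalid pointer -- something the compiler usually cannot see. Warning on
       every such path would drown out the useful diagnostics. */
    int value_or_default(const int *p, int ok) {
        if (ok)
            return *p;
        return 0;
    }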



