
tom_mellior · on May 20, 2021

> The C standards have the perfectly fine name "implementation dependent" to describe those things.

That term is not used by the C standards. Do you mean "implementation-defined"? asm is not among the explicitly specified implementation-defined behaviors, it's listed under "Common extensions". I don't see any mention at all of syscalls in C99. (I'm working with http://www.dragonwins.com/courses/ECE1021/STATIC/REFERENCES/... here.)

formerly_proven · on May 20, 2021

I'm not sure why syscalls would be UB; it's just not something defined by the C standard.

Edit: To clarify, I meant UB in the sense it is typically used in these discussions, where the standard more-or-less explicitly says "If you do X, the behavior is undefined." Not in the literal sense of "ISO C does not say anything about write(2), hence using write(2) is undefined behavior according to the C standard", which seems like a rather tautological and useless statement to me.

hvdijk · on May 20, 2021

What do you think UB is if not something where the behaviour is not defined?

hvdijk · on May 20, 2021

About your edit:

> "ISO C does not say anything about write(2), hence using write(2) is undefined behavior according to the C standard", which seems like a rather tautological and useless statement to me.

That is actually not so useless at all: if you try to compile and link a program that declares and calls a function but does not define it, you will typically get a linker error about an unresolved reference. If the name matches a non-ISO C library function, however, the implementation cannot know whether your program is in error or whether you want to use that library function, and will usually accept it. For this reason, the C standard does make it explicit that using write(2) is UB, so that implementations are not required to diagnose it as an error.

tsimionescu · on May 20, 2021

UB, in this context, is very explicitly used in the standard: it is undefined behavior related to a construct that the standard describes.

hvdijk · on May 20, 2021

> behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

That is literally the definition of UB from the C standard. It is explicitly also about constructs that the standard does not describe. That makes sense: the standard does not and cannot define the behaviour for any construct not in the standard, so cannot impose any requirements for such constructs, and that is all UB is: something where the standard imposes no requirements.

tsimionescu · on May 21, 2021

The relevant discussion about UB is restricted to constructs that the standard describes. For example, writing past the end of an object is UB: the construct is described in the standard, but is given no semantics by the standard.

The standard does not describe pattern matching, so using pattern matching is also undefined behavior, but there is nothing to be talked about here.

hvdijk · on May 21, 2021

The comment I replied to did talk about something not described by the standard though, namely syscalls. If you want to argue that we should not be talking about syscalls here, your issue should be with the original comment that brought them up (https://news.ycombinator.com/item?id=27222325), not with my reply, I think. However, that comment looks perfectly fine to me. Also, depending on how the syscalls are made, it actually may be explicitly described as UB by the standard, see my comment https://news.ycombinator.com/item?id=27228701 too.

tsimionescu · on May 21, 2021

Syscalls are not any more UB than any other function call, though. Whether talking about write(2) or my_foo(), the call has the semantics given by the function signature visible in the current translation unit. Sure, the C standard doesn't define what write(2)'s effects will be, but that does not mean that calling it is UB according to the standard.

If the function has not been declared by the time it is first used, even then calling it is not UB - it is defined to be a compilation error (in versions earlier than C99 it was actually valid, but UB if the call did not match the actual function definition).

hvdijk · on May 21, 2021

> Sure, the C standard doesn't define what write(2)'s effects will be, but that does not mean that calling it is UB according to the standard.

Yes, it does. I already explained exactly why it needs to be UB, but let me quote where the standard says so:

C99 6.9 External definitions:

> Semantics:

> An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression (other than as part of the operand of a sizeof operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the identifier; otherwise, there shall be no more than one.

If your program provides a declaration of write() and uses it without also providing a definition, the program does not have "exactly one external definition for the identifier", it has zero definitions for the identifier. This violates a "shall" that appears outside of a constraint, for which we turn to:

C99 4 Conformance:

> If a "shall" or "shall not" requirement that appears outside of a constraint is violated, the behavior is undefined.

aw1621107 · on May 21, 2021

> let me quote where the standard says so:

Wouldn't this hinge on what precisely "entire program" means? A definition for write(2) may not appear in the source code you wrote, but if "entire program" includes e.g., libraries dynamically linked in then it's quite feasible for the end result to be fully defined.

For example, 5.2.2 Paragraph 2 starts with (emphasis added):

> In the set of translation units and libraries that constitutes an entire program

hvdijk · on May 21, 2021

Sure, but in the situation we were talking about, the user never wrote a definition for write(), and the user did not specify any library to include that provided a definition of write(). From the standard's perspective, that means there is no definition for it in the entire program.

Keep in mind that the standard's perspective is somewhat different from how things work in practice. We know that on Unix-like systems, there is also the concept of libraries, somewhat different from how the standard describes it, and write() will be provided by the "c" library. But consider the following strictly conforming program:

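  #include <stdio.h>
  /* "write" is not reserved by ISO C, so the program may define it. */
  void write(void) {
    puts("Hello, world!");
  }
  int main(void) {
    write();  /* calls the definition above */
  }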

A conforming C implementation is not allowed to reject this for a duplicate definition of write(): the name "write" is reserved for use by the programmer, it is not reserved to the implementation. This program must be considered not to violate the "there shall be exactly one external definition for the identifier", so the only way to consider this valid is to say that the implementation does not implicitly provide an external definition of the write() function as far as the C standard is concerned.

Yet at the same time, from the perspective of the implementation, the C library is considered to provide a definition of the write() function, but it is a definition that is only used if the program does not override it with another definition that should be used instead. This concept of multiple definitions for the same name, with rules specifying which of the multiple definitions gets picked, is very useful but is also beyond the scope of the C standard. When we say that a function is defined, we need to be clear on whether we use "define" in the ISO C sense or in some other sense. As your comment shows, things get very confusing if we are not careful with that.

aw1621107 · on May 22, 2021

> and the user did not specify any library to include that provided a definition of write()

Ah. I had assumed that that was implicit in "using write(2)", but seems that was a bad assumption.

> there is also the concept of libraries, somewhat different from how the standard describes it

In what way?

You make an interesting point with the example. It's not something I had considered before. Would weak linkage (or a similar mechanism that allows for a provide-unless-the-user-already-did-so type of behavior) fall under an implementation extension, then?

hvdijk · on May 22, 2021

> In what way?

For the most part the standard does not address the existence of libraries other than the standard library, but 5.1.1.1 contains "Previously translated translation units may be preserved individually or in libraries." This, to me, suggests that from the standard's perspective, when you link in a library, you simply get that library. On Unix systems, by contrast, when you link in a static library you get only those object files from the library that are needed to resolve not-yet-defined references, and when you link in a shared library you get a setup in which duplicate definitions can coexist, with rules deciding which definition ends up being used.

> You make an interesting point with the example. It's not something I had considered before. Would weak linkage (or a similar mechanism that allows for a provide-unless-the-user-already-did-so type of behavior) fall under an implementation extension, then?

Yes, I think so. Shared libraries implicitly have some sort of weak linkage already aside from the explicit weak linkage that you can get with e.g. GCC's __attribute__((weak)), but both forms count as extensions, I would say.

Google234 · on May 20, 2021

So is everything UB since all hardware isn’t perfect?

faho · on May 20, 2021

So if it is behavior that is not defined by the C standard, would that not make it undefined behavior?

steveklabnik · on May 20, 2021

There is a difference between jargon in context and the use of those words in a general sense. It can be "undefined behavior" in a general sense, but not necessarily "undefined behavior" in the jargon sense.

After all, if I were to use the words "undefined behavior" in a sentence unrelated to the standards, the definition in the standard of "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements." would be nonsense. Same goes in the other direction.

colejohnson66 · on May 20, 2021

While technically correct, “undefined behavior” in terms of C and C++ refers to what the standard calls out explicitly as undefined, and not a simple “it’s not referenced, therefore it’s undefined.”

For example, signed integer overflow is explicitly undefined by the standard, but as @formerly_proven said, just because write(2) isn’t mentioned doesn’t mean usage of it is undefined.

_kst_ · on May 20, 2021

Actually, that's exactly what it means:

> If a "shall" or "shall not" requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words "undefined behavior" or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe "behavior that is undefined".

write() is a function, and a call to it behaves like a function call, but the C standard says nothing about what that function does. You could have a function named "write" that writes 0xdeadbeef over the caller's stack frame. Of course if "write" is the function defined by POSIX, then POSIX defines how it behaves.

aw1621107 · on May 21, 2021

> but the C standard says nothing about what that function does

I'm pretty sure I'm just bad at searching through the standards document, but does the Standard actually define the precise semantics of function calls? 6.5.2.2 covers function calls and their results, but doesn't seem to be quite as precise about the semantics as I might expect.

cperciva · on May 20, 2021

No. "Implementation defined" says "the standard doesn't specify what happens here but the compiler must document what it does". That's a step removed from "the compiler may define what this does".