I think this was silly; the author clearly does know C, but they are complaining about optimizing compilers which do things "behind your back" and are becoming an increasing nuisance. It's sort of a passive-aggressive "I think this should be an error, but it isn't an error because of the twisted logic the compiler applies to undefined behavior."
That people can teach themselves what to expect the compiler to do isn't all that surprising, and it also isn't surprising that a "modern" compiler does stuff an "old" C programmer might think is ridiculous.
I've engaged in this particular argument a few times only to throw up my hands in frustration over some exquisitely twisted line of reasoning that gave the compiler hacker person a fraction of a percent improvement[1] by exploiting this kind of situation. As long as I have a compiler flag that turns it all off, it's tolerable. But sheesh, sometimes I think these are C programmers who don't have the guts to become Rust programmers. You want to start fresh dudes, clean slate. Embrace it.
[1] "But Chuck, over the millions of machines out there its like an entire computer's worth of CPU cycles you can use for something else!"
>I think this was silly, the author clearly does know C but they are complaining about optimizing compilers which do things "behind your back" and are becoming an increasing nuisance
I think the compiler behaviour is meant to be illustrative of the main point, not the main point itself, which I think is simply that the actual C language is more complicated than people appreciate. Most of the examples have nothing to do with optimization whatsoever; they're purely focused on correctness.
The article has nothing to do with optimizing compilers, nor does it require knowledge of them. It merely uses optimizing compilers to illustrate what is and what isn't legal C. When there's undefined behaviour, the compiler has almost free rein, so it's important for a C developer to know what is and what isn't undefined behaviour.
My boss, who is a C developer, got only two answers wrong: he didn't remember that you could tentatively define a global var, and he was fooled by the comma operator inside the array index.
My point of view is that compilers should be optimising at the level of machine instructions, not by attempting to second-guess the programmer and remove code that it thinks invokes UB. I've looked at tons of compiler output over the years, and there's plenty of opportunity for optimisation in instruction selection and register allocation... C should be a "do what I say, not what I mean" type of language.
I have a set of incomplete notes headed "typesafe object orientated macro assembler?" which is my thinking on this question. I think that captures what people want: optimal direction of the machine, with all modern conveniences and automated error prevention.
Meh, the problem is the preprocessor. A lot of really silly things are produced by macro expansion, so the "obviously silly" optimizations really do end up mattering.
That sounds interesting, but I'm suspicious. Do you have any examples of such macros and maybe some statistics or ideas how often they are actually seen in the wild?
> they are complaining about optimizing compilers which do things "behind your back" and are becoming an increasing nuisance.
Actually no. I think there was maybe one case where you could argue it was the optimizer producing different results than you'd expect from unoptimized output, but in most cases you have a problem that exists because of poor assumptions on the part of programmers and error-prone language definitions.
That's only the one example, and the author could have thrown his hands up with the usual explanation that undefined behaviour entitles the compiler to launch missiles at you or whatever.
But in fact it's quite a reasonable (the only reasonable?) approach to optimisation, given a function that might invoke undefined behaviour on certain arguments, to emit code that is optimised for work on arguments that don't.
That doesn't seem to me to be tortured logic. The compiler ought to make that optimisation, always. It might be perfectly clear to the programmer that the function in question can never be sent a null pointer, but only by reasoning about the program on a level the compiler can't. It's only a minor side benefit that this can allow a certain amount of reasoning about the code paths that might be taken when you DO invoke undefined behaviour. That usually won't be much use, but might help one identify the kind of error one has made.
I expect most of us here have puzzled over some confusing output from a C program and tried to work out, from the output, whether we made an allocation error or overflowed a buffer or were off-by-one on some bounds. It's a wonderful language in some ways, but the pitfalls are there. Which is the point the author is trying to make.
>But sheesh, sometimes I think these are C programmers who don't have the guts to become Rust programmers.
I suppose another perspective is that there is no shortage of programmers suffering from Stockholm Syndrome, and having them make excuses for existing languages' shortcomings is one reason it is harder to get critical mass behind less borked languages ;-)
...anyway, I don't see the problem with exposing people to potential pitfalls. It's almost like arguing that people shouldn't read "Expert C Programming: Deep C Secrets":
Well, apparently I do know C. If the author wanted to be as contrived as possible there are certainly more devious edge cases which could have been trotted out. The fact that e.g. the compiler may optimize out a NULL check after you've already dereferenced the darn thing shouldn't be surprising. Just fix your silly bug.
The problem is that (correct me if I'm wrong) an expression like "&foo->bar" counts as a dereference of foo, even though the result of the expression is simple pointer arithmetic involving foo (adding the offset of the bar member).
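For concreteness, here's a minimal sketch of that pattern (the struct and names are invented for illustration, not taken from the article):

    struct foo_t { int other; int bar; };

    int *get_bar(struct foo_t *foo) {
        int *p = &foo->bar;   /* only computes an offset from foo, nothing is loaded,  */
        if (foo == NULL)      /* but if foo is null the -> above is already undefined, */
            return NULL;      /* so a compiler may treat this check as dead and drop it */
        return p;
    }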
I've worked on a system where NULL mapped to valid memory. (It was an embedded system, so the memory map was custom and bizarre). Of course, insanity ensued when new programmers worked on it.
Is it the kind of system for which memset(&somestruct, 0, sizeof(somestruct)) doesn't work if the struct has pointers (NULL isn't represented as 0, so null checks will fail if one inits such a struct like this)?
This reminds me of the quiz books that were popular years ago. They'd show some code that inadvertently tripped some obscure corner of the language.
Rarely did the quizzes provide great insight. Rather, they confirmed the benefits of keeping your code idiomatic.
It's pretty much what pops into my head every time I see things like these. I got most of those correct, but my universal reaction was why the hell would I write something like that in the first place?
I figure a lot of the time, you don't write it like that, but you get weird behavior in a much more complicated situation without obvious defects that eventually can be reduced to an example that would fit in with those quizzes.
I don't remember any of that, either. The worst I've ever had was a silly bug due to operator precedence. Barring some truly uninspired things (like signed integer overflow being undefined), I really think most of those are cases one shouldn't run into, not even in a much more complicated situation.
Many of the complicated situations arise from macros and templates that use arguments in contexts the coder doesn't know, e.g. the STL. Also, the macro writer doesn't know the context in which the macro will be expanded. You can end up with issues of precedence, correct statement construction, expression evaluation order, etc.
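The stock example of the precedence trap (toy macros of my own, not anything from a real header):

    #include <stdio.h>

    #define DOUBLE_BAD(x)  x + x          /* missing parentheses            */
    #define DOUBLE_OK(x)   ((x) + (x))    /* idiomatic, fully parenthesized */

    int main(void) {
        printf("%d\n", 3 * DOUBLE_BAD(2));  /* expands to 3 * 2 + 2       ->  8 */
        printf("%d\n", 3 * DOUBLE_OK(2));   /* expands to 3 * ((2) + (2)) -> 12 */
        return 0;
    }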
STL is not a problem in C land and fishy macros don't make it past code review in my book :-).
I don't disagree on the usefulness of teasing your brain with these things once in a while. However, I think the best way to ensure you don't hit bugs caused by such things is to avoid the situation altogether.
> Do you remember to always define your expression macros with parentheses?
Yes - that's a necessary part of being idiomatic. Any macro that's defined without brackets sticks out like a sore thumb, and won't pass code review, even if it's (initially) used in a place where they wouldn't be necessary.
> Do you remember to always define your expression macros with parentheses?
Oh, yes! No parentheses around elements of a macro definition is an obvious sign of trouble and looks very wrong on my retina.
I do agree, though, that if I somehow forgot to do that (tired? nervous?), that would be a bug that's difficult to spot. Point taken :). That's what code reviews are for, but there isn't always time or availability for one, sadly.
Most of these rules are unfortunately ignored as obscure information. The worst offender I see in code in the wild is #5.
The second most ignored is not checking values before computation: 10, 11, 12.
No. 7 is very interesting: rarely violated, but most programmers don't even know it is a thing, or just assume the processor won't trap on an unaligned read.
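For context (going by the comment above rather than the article's exact wording for #7), the usual contrast is between a cast-and-dereference and a memcpy:

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper: pull a 32-bit value out of a byte buffer.
       The cast version, *(const uint32_t *)(buf + off), is undefined if
       buf + off is not suitably aligned for uint32_t, and some CPUs will
       trap on it; memcpy expresses the same load portably, and compilers
       typically turn it into a single instruction where that is safe. */
    uint32_t read_u32(const unsigned char *buf, size_t off) {
        uint32_t v;
        memcpy(&v, buf + off, sizeof v);
        return v;
    }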
Although most comments use these examples to argue that C is a bad language, I would argue the opposite: these examples show how C is an extremely useful language. C has occupied a niche position as the lowest commonly-used language that is both human-readable (at the level of expressing algorithms, data structures, functionality, and control flow) and not specific to any one instruction set architecture (and thus can compile to any architecture). Each C construct or statement maps efficiently into machine instructions, while not being an assembly language. I have trouble imagining how to make something like C any lower without becoming architecture-specific or cumbered with details that are more economically-suited for a compiler. But if it were much higher-level, the programmer would become detached from this close relationship to the computer. The programmer can focus on implementing important speed & space performance details of an algorithm while not getting bogged down by more mundane details (such as register allocation, matching jump statements to their target labels, keeping track of the return stack of a function, etc.) that are better suited for a compiler to handle. (That being said, I think Rust or something like it is a strong successor and can additionally express concurrency.)
These examples illustrate well-intended and useful features of C, not flaws. I will explain why for each in a comment below:
> I have trouble imagining how to make something like C any lower without becoming architecture-specific or cumbered with details that are more economically-suited for a compiler.
You mean Algol, Mesa, and a few others from the same vintage or even older?
Or rather Macro Assemblers like MASM, TASM that provided higher level macros for structured programming?
What I mean in my paragraph is that C has all these (nasty) details that are necessary due to C's position as being almost assembly language (but not quite) AND being cross-platform. Macro assemblers like MASM and TASM use ISA-specific assembly languages, so you can't write cross-platform code. I suppose one could imagine a sort of cross-platform, LLVM-IR-like structured macro assembler as an example of something lower than C that is still architecture-independent, which you would then pass to a machine-specific optimizing compiler.
My understanding of history is compiled Algol wasn't nearly as fast as compiled C, which was needed for operating systems and performance-critical code.
C fans like to re-invent history, just google for operating systems implemented in Algol variants and check their implementation dates.
Edit: Forgot to mention that up to the early 90's, C compilers generated pretty crappy code vs what any average Assembly coder could write. And it was only relevant for those fortunate enough to have UNIX at their company or university.
What do you think was the reason C took off while Algol use diminished? Was the growth of Unix a significant reason? Do you think C's adoption was misguided?
I think C took off because it hit the sweet spot of producing fast code while still being cross-platform and human-readable.
C took off because a few startups in the 80s used UNIX as the foundation of the workstations they were bringing into the market, like Sun for example.
As those workstations became a success in the US market, their use spread outside the US and the need for developers who could write software for them increased. This meant knowing C.
Most of the other operating systems at the time didn't offer C compilers. On the few that did, it was just another language to choose from, and most of the time only a subset of K&R C.
This is how the distinction between libc and POSIX APIs came to be. The original libc is mostly what could be implemented in other OSs without depending directly on the UNIX API semantics.
If the likes of Sun and SGI hadn't succeeded, most probably C would be a footnote just like Algol.
I have been writing software since 1986, and 1992 was the first time I cared to learn C, only to quickly ditch it for C++ the year after.
#1. Tentative definitions are historical baggage from Fortran COMMON blocks (https://blogs.oracle.com/ali/entry/what_are_tentative_symbol...), which may have helped adoption of C by Fortran users. Although it remains in the C language specification, this feature can be ignored.
#2. Treating dereferencing a NULL pointer as undefined behavior means that the compiler is not required to generate additional instructions such as asserts or crashes to guard against potentially dangerous side effects. C compiler assumes that the programmer is in control of his/her code. In this example, it can be assumed that a careful C programmer has already guaranteed that the pointer will not be null when it is dereferenced. This C feature is an optimization to avoid generating redundant or unnecessary asserts or handling code.
3. C allows the programmer to handle raw pointers directly, allowing for low-level optimizations that may not be possible in higher-level languages. A careful C programmer may have taken steps outside the function to handle the situation where yp==zp, or may otherwise be unconcerned about that case, for performance reasons (see the aliasing sketch below this list).
4. A correct implementation of IEEE 754.
5. Since C is designed to efficiently compile to any computer architecture, it needs to be aware of the distinction between the arithmetic width of an instruction set and the width of addresses. Ints are optimized to default to the natural arithmetic width of an instruction set (so that compilation doesn't produce unnecessary packing/unpacking instructions whenever they are accessed) but are guaranteed to be at least 16 bits wide. However, since data structures can be as large as addressable memory, it is necessary for size_t to be the width of addresses.
6. Allowing size_t to be unsigned allows all bits of a size_t variable to be utilized for expressing size.
7. Undefined behavior is, again, an optimization feature, allowing each compiler to implement as it sees fit.
8. The comma operator is useful when the first operand has desirable side effects, such as compactly representing parallel assignment or side effects in for loops.
9. C allows unsigned integers to wrap around 0 and UINT_MAX. This feature can be utilized as an optimization, for example as a free (no additional instruction) deliberate modulus operation. This is usually how unsigned integers behave in assembly.
10 & 11 & 12. Some ISA's, like MIPS, treat overflow of signed numbers as an exception. Others simply treat the result as a valid two's-compliment value. Since C is machine independent, C's official specification for overlow of signed numbers must be compatible for all ISA's. Simply treating the result as undefined does the trick, and means the compiler doesn't have to make guarantees or version for each ISA.
5. size_t doesn't necessarily have to be the width of an address. An address (say, a value of type void* or char*) has to be able to refer to any byte of any object. A size_t only has to be able to represent the size of any single object. The limit on the size of a single object and the limit on the total size of memory are often the same on modern systems, but C allows them to be different (think segments).
6. size_t is required to be unsigned.
9. C requires wraparound behavior for unsigned integers.
10, 11, 12. It's not just the result of an overflowing signed integer arithmetic operation that's undefined, it's the behavior. `INT_MAX + 1` can yield `INT_MIN`, or it can yield 42, or it can crash your program and reformat your hard drive (at least in principle).
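To make the contrast between #9 and #10-12 concrete, here's a small sketch (what a given compiler actually emits will of course vary):

    #include <limits.h>

    /* Signed overflow is undefined behaviour, so a compiler is entitled to
       assume x + 1 never wraps and fold this whole function to "return 1". */
    int always_greater(int x) {
        return x + 1 > x;       /* not reliably 0 even when x == INT_MAX */
    }

    /* Unsigned arithmetic is required to wrap modulo 2^N, so this is
       well defined: UINT_MAX + 1u is guaranteed to be 0. */
    unsigned next_wrapping(unsigned x) {
        return x + 1u;
    }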
Good comment, but I'd like to add that int is not really the "natural arithmetic width of an instruction set" any more. We will never see 64-bit ints. The sizes of int and long seem to be "whatever works, and is compatible with what it used to be".
One disadvantage of making int 64 bits, even if that's the natural size, is that if char is 8 bits, then short has to be either 16 or 32 bits (or 64) -- which means that you can't have predefined types covering all the common sizes (8, 16, 32, 64).
That's not quite true, since C99 introduced extended integer types -- but I don't know of any C compiler that has provided them.
(The intN_t and uintN_t types in <stdint.h> don't solve this; they still have to be defined in terms of existing types.)
Author needs to stop his anti-intellectual everyone-is-as-ignorant-as-me bullshit. I see that a lot re programming to justify a lot of silly positions. If you only know Javascript, that's great, I rather like having shiny things in my browser (I like it too much, even). That doesn't mean that C programming is obsolete; some of us know C. For example, I got #5 and #9 wrong, and the rest I got right including the general idea of the justifications. (10/12 is pretty good for someone who grew up in the Java era, but I want to get better.)
I don't believe the author is claiming that C is obsolete; nor do I believe he is claiming that no one knows C.
What I do think he's implying, if not outright claiming, is that there are many people who grossly overestimate their knowledge of the language. That doesn't mean there aren't a lot of very talented and knowledgeable C programmers— but there are a lot of people out there who think they're hot shit because they've done all the exercises in K&R, but who have no real familiarity with the formal semantics of standard C.
I think the opinions "c is too complicated and buggy and, therefore, obsolete" and "javascript is a 'shiny' language for amateurs" are just as narrow-minded as each other.
> the rest I got right including the general idea of the justifications.
The majority of them can be answered correctly by someone who understands computer architecture and programming languages. E.g. #3 is about pointers, which are not specific to C. Even JS developers implicitly deal with pointers, e.g. when a variable `v` is assigned an object literal `{}` and passed around.
As compiled by the JIT, `v` is a pointer. That is how it is implemented: `v` is not a copy of `{}`, it points to a location in memory that contains `{}`. A pointer is not defined as something you are "allowed to do arithmetic on"; it is defined as something that "points to a location in memory," no more, no less.
> In computer science, a pointer is a programming language object, whose value refers to (or "points to") another value stored elsewhere in the computer memory using its address. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer.[1]
We are not being sloppy, in fact we are being extremely correct and precise. Pointers and dereferencing pointers might be implicit and automatic in Javascript but that does not change that a pointer to a value is being used, instead of the value itself.
Pointer arithmetic is not the same thing as pointers, it is merely something you might be able to do with pointers if the language you are using supports it. Rust calls its "symbols that reference objects" (what on earth?) pointers, even though you are unable to do pointer arithmetic on them (unless you drop to unsafe). C++ Smart Pointers are called as such even though you can't do arithmetic on them.
So from now on we should call every structure that is implemented inside a library, compiler, or JIT using a pointer a "pointer"?
> In computer science, a pointer is a programming language object, whose value refers to (or "points to") another value stored elsewhere in the computer memory using its address.
You have your answer here. In JavaScript ("programming language"), v ("object") has as its value a JavaScript structure called an object. The internal representation of v surely does use a pointer, but from the JavaScript perspective, it is NOT a pointer. If you get the value of v in JavaScript, you don't get a memory address ("value [that] refers to (or "points to") another value stored elsewhere in the computer memory using its address") - you get the object itself. That's what references do, not pointers.
I'll readily admit that there are corners of C where I would not be able to tell you just what breaks. But I know about where they are and enough to stay away from them. If I'm not sure, the next guy won't be either.
A favourite exercise of mine is to show people these two functions:

    bool is_zero(int x) {
        return x == -x;
    }

    bool is_zero(float x) {
        return x == -x;
    }
and ask them which is wrong and for what value. Most of the time the instinctive response is that it must be the float code (because floats are evil, duh).
This works even in languages with defined overflow for integers.
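For anyone puzzling over it, here's the failing case for the int version spelled out (renamed here to avoid the redefinition issue; it assumes two's-complement wrapping to show the concrete misbehaviour, even though in C the overflow itself is undefined):

    #include <limits.h>
    #include <stdbool.h>
    #include <stdio.h>

    static bool is_zero_int(int x) {
        return x == -x;   /* undefined when x == INT_MIN: -INT_MIN overflows int */
    }

    int main(void) {
        /* With wrapping semantics (e.g. gcc/clang -fwrapv), -INT_MIN == INT_MIN,
           so this typically prints "true" even though INT_MIN is not zero. */
        printf("%s\n", is_zero_int(INT_MIN) ? "true" : "false");
        return 0;
    }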
But you should give the functions different names, since it is not valid to define the same symbol more than once (and there's no name mangling in C that enables function overloading).
A lot of these are cases where I've seen actual bugs. Popular ones are >= 0 checks, dumb assumptions about overflows and null derefs, and the always popular "use an int when it should be size_t".
Compilers are increasingly optimizing based on UB; I recently fixed a problem where a `std::unique_lock` was optimized away because it was only referenced from a `std::thread` defined at the top of the class. The future is now!
It's only useless if you never make a mistake, ever. This shows what you should be vigilant for (alternatively, it's an argument for using a language with more checks).
As someone on HN once said (in jest), it's easy to write bug-free C, you just need to never make a mistake ever and spend a million hours auditing it.
That's not actually true in this case. The author is correct about this.
When "int i = 10;" is encountered, the tentative definition behaves effectively as a declaration. If the compiler were to reach the end of the translation unit and the variable i was never defined elsewhere, however, "int i;" serves as a definition.
Why the heck was the name of this post changed? It rather conveniently puts the author in a better light by downplaying the anti-intellectual nature of the article I commented about above. I thought the general rule was that posts should be titled with the title of the linked article, which was the case before but now is not the case.
EDIT: To answer my own question, the submitter is clearly the author based on his submission history. So yes, this was an act of self-censorship to try to hide the author's disgusting attitudes.
Not exactly true. I publish articles, which we translate into English (mainly from Russian). The original author is Dmitri Gribenko, and it seems he is not a member of the HN community.
The title was changed by one of the admins, "dang", and he posted in this thread that the reason for the change was that the original title was "controversial".
5 is IMO nothing more than nitpicking. 2 and 4 (and maybe 6) might have some importance in real life, while the others are easy and kind of expected (although the comma operator in 8 might not be known to less experienced programmers). I can't really see the point of this article other than "hey, do you remember that there is a concept called undefined behavior in C?".
You'd be surprised how many C or C++ programmers don't know about UB. Some (I've worked with one of them) even went so far as to say »I know what assembly the compiler generates from that, even if it's UB I know how it behaves.« And those people then wonder that their code does something differently when moving to a new compiler version, or when switching to a different compiler.
The original article makes it clear that this is how things work with C globals, not locals. If you compile and run this program with gcc -Wall -pedantic x.c && ./a.out there will be no errors and it will emit i=10 as expected:
    int i;
    int i = 10;

    #include <stdio.h>

    int main(int argc, char *argv[]) {
        printf("i=%d\n", i);
        return 0;
    }
I just got the first one wrong and haven't written C code in a decade. I have to admit I haven't seen that construct used in any programs. I guess, no harm no foul, but still seems like an odd construct to allow.
The first one would appear when using global variables which are shared across files, you just wouldn't see it as the pieces would usually be in separate files.
Something like:
    /* file.h */
    int global_i;
and you have:
    /* file.c */
    int global_i = 0;
Then if file.c includes file.h both will be in the same file during compilation.
I realize this has probably been asked, but anyone recommend good resources for learning C? And not necessarily just the legendary textbooks, but any clever tutorials or fun learning resources? Thanks!
He's discussing subtle points of the C language and yet there is no mention of which compiler he's using, whether the results might differ across compilers and hardware platforms, or how they correlate with the multiple C standards (C89, C99, C11/C1X, etc.).
He succeeded in convincing me that he does not know C!
There is no mention of which compiler he's using or whether the results might be different for different compilers and/or hardware platforms precisely because he's discussing the C language— not the idiosyncrasies of particular implementations.
>and how they correlate with multiple C standards
There's nothing going on here specific to any particular revision of ISO C (that I can see).
Did he make sure the behavior he mentioned is uniform across all ISO standards? What is his source for coming up with a certain answer to a certain piece of code? Any of the standard documents? The K&R book? gcc output?
As an example:
In the answer to point 2, he claims:
> ... the compiler thinks that x cannot be a null pointer ...
First of all, this gives a strong indication that he's analyzing a compiler output, a compiler that he didn't reveal in the article.
But even if we ignore that, and he truly is going by a rule that is uniform across all of K&R, C89, C99, and so on, could you or him point me to any page in the C99 standard document where it is explicitly stated along the lines that the compiler "should assume" the pointer to be not NULL after an undefined dereferencing in line (1) and hence ignore (2) and (3)? (Based on my experience, I have a very strong hunch that a standard would not enforce assumptions as a result of an undefined operation.)
If you could, you/he "may" have a point ("may", because I still have 11 other points to critique). If not, he and you clearly have no idea what you guys are talking about!
The questions are about the C language, not any particular implementation thereof. Question 2 requires you to know that different compilers may at times legally do different things with the same piece of code. Answering question 2 does not require you to know which compiler is in use. Instead it requires you to think in the mindset of someone trying to write portable code that will work as intended with any compliant compiler.
In this case, the code is invalid because it invokes undefined behavior, and the compiler is allowed to do literally anything. The author of a portable C program is not allowed to rely on any particular behavior. I don't have any of the ISO C standards docs handy, but I'm quite certain that they all agree here. The (probably hypothetical) compiler in use here is apparently trying to apply several heuristics that are useful in other situations but fail here because there is no right answer.
In general, the C standards avoid requiring a specific behavior where choosing to require a specific behavior could hurt portability. Most C compilers make use of their leeway to interpret things in a manner that helps make bad code run and many compilers make promises beyond those required by the standard to aid in writing non-portable code.
> ... the compiler is allowed to do literally anything ...
and yet he makes the claim that "bar() is invoked" which is not only incorrect (bar may or may not be invoked), but also misleading for C newbies who are actually trying to learn something by reading this article. Hence my original comment.
I think my point is that dereferencing a pointer to address zero is 'undefined', but here is the rub: what happens is entirely out of the optimizer's and compiler's control because it depends on the context the program is run under. One can't determine at compile time what will happen.
In fact the compiler can't make any assumptions about a pointer whose value has been hard coded.
    volatile uint8_t *b = (volatile uint8_t *) 0x30; // 0x30 is the address of PORT D on an AVR ATmega8
"bar() is invoked" is not incorrect. It's perfectly legal for a compiler to produce this result. It's just not mandatory that all compilers do so. There's nothing wrong with the question postulating that a particular compiler behaves this way; the point of the question is to remind the programmer that they have to expect and be prepared for variance between compilers when it's permitted by the standard.
You're misunderstanding point 2. The question is to be read as: suppose an optimizing compiler does this. Has that compiler violated the standard? You don't need to know which compiler (or even whether it's a real compiler or just one that's been made up for the question) to answer that.
>Did he make sure the behavior he mentioned is uniform across all ISO standards?
I don't know.
>What is his source for coming up with a certain answer to a certain piece of code?
His conclusions are consistent with my understanding of the C standard. His justifications for his conclusions refer to specific rules regarding program behaviour, which suggests he's using the standard(s).
>First of all, this gives a strong indication that he's analyzing a compiler output, a compiler that he didn't reveal in the article.
He is using the output of some compiler to illustrate the potential consequences of making the given mistake. It doesn't really matter which; the point is that it is not legal to perform lvalue-to-rvalue conversion on the result of indirecting through a null pointer.
>could you or him point me to any page in the C99 standard document where it is explicitly stated along the lines that the compiler "should assume" the pointer to be not NULL after an undefined dereferencing in line (1) and hence ignore (2) and (3)?
No, because the C99 standard document does not say that. What it does say is effectively that the compiler MAY assume the pointer not to be null; more generally, the compiler is allowed to assume that the program never exhibits undefined behaviour.
> ... Turns out, bar() is invoked even when x is the null pointer ...
is incorrect (bar may or may not be invoked) and misleading for C newbies, without any mention of the subtleties of standards and implementations. Hence my original comment.
> could you or him point me to any page in the C99 standard document where it is explicitly stated along the lines that the compiler "should assume" the pointer to be not NULL after an undefined dereferencing in line
You won't find a line that states it explicitly, but the standard does allow it.
If the pointer is null, de-referencing it invokes undefined behavior, so the program is allowed to do literally anything. That includes doing whatever it would have done if the pointer wasn't null. So the compiler is allowed to assume that the pointer is not null.
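As a sketch of the shape of question 2 (my reconstruction, not the article's exact code), the chain of reasoning looks like this:

    void bar(int);

    void foo(int *x) {
        int y = *x;       /* (1) undefined behaviour if x is NULL               */
        if (x == NULL)    /* (2) compiler may therefore assume this is false... */
            return;       /* (3) ...and delete the early return entirely,       */
        bar(y);           /*     so bar() can be reached even when x was NULL   */
    }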
Since he's talking about undefined behavior in a lot of the cases, the answer to all your questions is, "Yes." That's pretty obvious, so maybe he thought it went without saying.
That's correct; what he wanted to say is probably that there exist compilers where `bar()` is invoked. Don't be so snarky; try to understand what the author meant.
Already the first example is wrong. So I stopped reading.
    % cc -std=c11 -o dingens dingens.c
    dingens.c:6:9: error: redefinition of 'i'
        int i = 10;
            ^
    dingens.c:5:9: note: previous definition is here
        int i;
            ^
    1 error generated.
    % cat dingens.c
That's because you put it inside a function rather than at file scope. Put it at file scope and it'll compile, even on clang with -std=c11. I just tried it now and it worked.
From his webpage: "Reminding you that it’s a separate source file and not a part of the function body or compound statement"