There's a compiler attribute in GCC for promising that a function is pure, i.e. free from side effects and using only its inputs.
This is useful for parallel computations, optimizations and readability, e.g.
sum += f(2);
sum += f(2);
can be optimized to
x = f(2);
sum += x;
sum += x;
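For concreteness, here is roughly what that looks like with GCC's existing, unchecked attributes (a sketch; note that GCC's "pure" still permits reading global memory, while "const" is the stricter promise that only the arguments are used, which is closer to the description above):

__attribute__((const)) int f(int n)
{
    return n * n;    /* no side effects, no reads of global state */
}

int g(void)
{
    int sum = 0;
    sum += f(2);
    sum += f(2);     /* the compiler may reuse the result of the first call */
    return sum;
}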
Would the current motto of the consortium forbid adding a feature such as marking a function as pure, that would not just promise, but also enforce that no side effects are caused (only local reads/writes, only pure functions may be called), and no inputs except for the function arguments are used?
No enforcing! This is useful even when it's, strictly speaking, a lie.
Suppose I want to add some debug tracing into f():
f.c: 42: f entered
f.c: 43: returning 2
that's a side effect, right? But now the pure attribute tells a lie. Never mind though; I don't care that some calls to f are "wrongly" optimized away; I want the tracing for the ones that aren't.
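For example (a sketch assuming a GCC-style attribute; the file name and line numbers in the trace are made up):

#include <stdio.h>

__attribute__((pure)) int f(int n)
{
    fprintf(stderr, "f.c: 42: f entered\n");
    fprintf(stderr, "f.c: 43: returning %d\n", n);
    return n;    /* the attribute is now a small lie, and that's fine */
}

Calls that the compiler folds away simply lose their trace lines; the ones that survive still print.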
In C++ there are similar situations involving temporary objects: there is a freedom to elide temporary objects even if the constructors and destructors have effects.
Even a perfectly pure function can have a side effect, namely this one: triggering a debugger to stop on a breakpoint set in that function!
If a call to f(2) is elided from some code, then that code will no longer hit the breakpoint set on f.
Side effects are all a matter of point of view: to declare something effect-free on a conventional digital machine, you first have to categorize certain effects as not counting.
Such attributes would be most useful if the semantics were that any time after a program receives inputs that would cause a "pure" function to be called with certain arguments, a compiler may at its leisure call the function with those arguments as many or as few times as it sees fit.
The notion that "Undefined Behavior" is good for optimization is misguided and dangerous. What is good for optimization is having semantics that are loose enough to give the compiler flexibility in how it processes things, but tight enough to meet application requirements.
Instead of saying that compilers can do anything they want when their assumptions are violated, it would be far more useful to recognize what they are allowed to do on the basis of certain assumptions. For example, given a piece of code:
long long test1(long long x)
{
    while (x)
        x = slow_function_no_side_effects(x);
    return x;
}

void test2(long long x, int mode)
{
    x = test1(x);
    if (!mode)
        x = 0;
    doSomething(x);
}
It would generally be useful and safe to allow a compiler that determines that no individual action performed by "test1()" could have any side effects to omit the call to "test1()" if its value never ends up being used, without having to prove that the slow function with no side effects will eventually return zero. It is likewise useful and safe to say that if the generated code observes either that the loop exits or that "mode" is zero, it may replace the call "doSomething(x)" with "doSomething(0)". The fact that both optimizations would be safe and useful individually, however, does not imply that it would be safe and useful to allow compilers to transform "test2()" so that it calls "doSomething(0)", or otherwise lets code observe that the value of "x" is zero when "mode" is non-zero, without regard for whether "test1()" would ever complete.
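Concretely (a sketch reusing test1() and doSomething() from the snippet above), the combination being argued against looks like:

void test2_unsafe(long long x, int mode)
{
    (void)x; (void)mode;
    doSomething(0);    /* reached even in cases where the original would never
                          reach doSomething at all, because test1() never returns */
}

whereas an individually-safe rewrite still keeps the call in the branch where its value is actually used:

void test2_safe(long long x, int mode)
{
    if (!mode)
        doSomething(0);           /* test1()'s value is never used on this path */
    else
        doSomething(test1(x));    /* must still let the loop run to completion */
}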
If you wrote down your proposal, which the C committee member Robert Seacord is encouraging you to do here: https://news.ycombinator.com/item?id=22870210 , you would have to think carefully about functions that are pure according to your definition (free from side effects and only uses its inputs) but do not terminate for some inputs.
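A classic illustration (a sketch): this function is pure by that definition, yet whether it terminates for every positive input is precisely the Collatz conjecture, which nobody has proved.

unsigned collatz_steps(unsigned long long n)
{
    /* n == 0 loops forever; for n > 0, termination is an open problem */
    unsigned steps = 0;
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps++;
    }
    return steps;
}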
There is at least one incorrect optimization present in Clang because of this (function that has no side-effects detected as pure, and call to that function omitted from a caller on this basis, when in fact the function may not terminate).
I thought the compiler was free to pretend that loops without side effects always terminate, and in that sense this is already a "correct" optimization? Or is that only for C++? I'm not sure.
That may be the case in C++, but in C infinite loops are allowed as long as the controlling condition is a constant expression (making it clear that the developer intends an infinite loop). These infinite loops without side-effects are even useful from time to time in embedded software, so it was natural for the committee to allow them: https://port70.net/~nsz/c/c11/n1570.html#6.8.5p6
And you now have all the details of the Clang bug, by the way: write an infinite loop without side-effects in a C function, then call the function from another C function, without using its result.
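Something along these lines should do it (a sketch of the shape described, not the exact reproducer from the bug report):

static int spin(void)
{
    while (1)    /* constant controlling expression, no side effects:
                    C11 6.8.5p6 does not allow assuming this terminates */
        ;
    return 0;    /* unreachable */
}

void caller(void)
{
    spin();      /* result unused; a conforming compiler must still hang here */
}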
If you were enforcing this with the compiler, you would also need something to suppress the enforcement, because the millions of pre-existing functions would probably never get an attribute marking them as pure. And once you do that, the compiler can't really trust anything such a function does, because it may actually be calling a non-pure function.
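For comparison, today's GCC attribute is already exactly that kind of unchecked assertion: you can put it on a declaration whose body the compiler never sees, and it is taken entirely on faith (a sketch):

__attribute__((pure)) int legacy_hash(const char *s);   /* defined in some old library */

int twice(const char *s)
{
    /* the compiler may fold these into one call, even though it cannot
       verify that legacy_hash() really is pure */
    return legacy_hash(s) + legacy_hash(s);
}

A checked "pure" would need a similar trust-me escape hatch for exactly the legacy-code reason above, and anything reached through that hatch is back to being unverifiable.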