
> either TCO is a defining feature of the language, or a best-effort optimization that nobody should rely on.

I think a third option is that tail calls are only guaranteed if you add a special attribute (eg. [[musttail]]) and that attribute fails to compile if your code is written in such a way that a true tail call cannot be guaranteed.

The LLVM backend supports a "musttail" marker on function calls (https://llvm.org/docs/LangRef.html#call-instruction), but it specifies a set of constraints that must be met for "musttail" to be valid. I have been experimenting with a Clang change that would plumb a C++ [[musttail]] attribute through the C++ frontend to LLVM, but it requires some care to make sure we satisfy LLVM's constraints.



I nominate the syntax

    goto function(parameter1, parameter2);
Also, may I suggest that it runs destructors prior to the jump rather than throwing up its hands and saying you can't TCO because you have destructors?


I don't think it will be a new syntax, just an attribute on existing syntax, eg.

    [[clang::musttail]] f(a, b);
or:

    [[clang::musttail]] return f(a, b);
> Also, may I suggest that it runs destructors prior to the jump rather than throwing up it's hands and saying you can't TCO because you have destructors?

I don't think that is feasible; it would change the semantics of the language. You can always put your destructors in an inner scope if you want them to run first:

    int Func() {
      int ret;
      {
        TypeWithDestructor a;  // destroyed at the closing brace
        ret = a.get();
      }
      [[clang::musttail]] return Next(ret);  // 'a' is already gone here
    }


What about the destructors for the temporaries used in the computation or the function call parameters, or the parameters themselves?


Handle it the same way as a return I think. Evaluate the temporaries, move / copy into the right place, run destructors, jump? Copy elision might be hard to pull off I suppose.

Edit: Hmm there is an issue in handling the stack space I suppose, because there are live old parameter objects already in the place you want to put your new parameters. An ugly but possibly workable way of handling it is to have two areas for parameters. One used for odd recursion depths and another used for even ones.


Then you've changed the semantics of the code. Take a program that prints a message in each frame's destructor. Today it gives this output:

  Message 3
  Message 2
  Message 1
After the transformation it will give this output:

  Message 1
  Message 2
  Message 3
Your odd/even fix also won't work, because a function may push an argument to a list, and pass the list in another argument, making the original temporary accessible from any recursion level.

A call to a destructor is a normal call. If you need to call destructors, then your tail call is really a call to a destructor, not what you're seeing in the code. It's a fundamental semantic problem. You can't have "make the last call in this function be an implicit call to some function" and "make the last call be the explicit call to this other function" at the same time.


If it's a different syntax, then you can expect people to know that it's semantically different. That's the whole point of using a different syntax, I think.


I thought that the point of this syntax was to make the compiler warn you when what you think is a tail call in fact isn't one, and to demand that the compiler optimize it; not to introduce a completely new control flow construct.


> push an argument to a list, and pass the list in another argument, making the original temporary accessible from any recursion level.

Either you push a copy of the object to the list, in which case it is fine. Or you push a pointer/reference to the object, in which case it becomes dangling.


Exactly. It wouldn't be dangling without the transformation you proposed.


Parent comment only proposed a syntax. From grandparent comment:

> fails to compile if your code is written in such a way that a true tail call cannot be guaranteed


Make it an error unless all parameters passed are destructor-free (because they have no destructor, like int/pointer, or because they were passed in by reference)


That doesn't help; the parameter might be invalidated by a destructor (e.g. passing foo.c_str(), where foo is a std::string on the stack)


Then it isn't destructor-free (and not passed by reference if it's on the stack).


No. Arguments that are trivial types can still depend on other objects not being destroyed.

  #include <cstdio>
  #include <string>

  void foo(const char* str) {
    if (*str == '\0') return;
    printf("%c", str[0]);
    std::string x = str + 1;
    return foo(x.c_str());  // x must outlive the call, or its buffer dangles
  }


This syntax is used by Perl; perversely, if you use it you have to dereference the function.


Speaking of TCO constraints, all of the common C/C++ calling conventions[0] have fixed-size stack cleanup. Some are caller-cleanup and some are callee-cleanup, but they all have amounts of stack cleanup that are constant (cdecl, stdcall, thiscall, etc.) or at least fixed at call time (varargs).

This means that TCO can't be done across calls where the amount of stack space used for arguments in the callee is larger than that of the caller. (In cases where the stack space used by the callee is less than the caller's, the caller just needs to leave "wasted" stack space as if the amount of stack space used by arguments were the same.) This wouldn't matter much on architectures where the calling convention passes enough arguments in registers to cover the majority of functions, except that some of these ABIs (notably the Windows x64 calling convention, but not the Linux x86_64 SysV ABI) require the caller to allocate shadow space on the stack for all register-passed arguments. (Edit: I was wrong, the Windows shadow space on the stack is a fixed 32 bytes, regardless of the number of register-passed arguments.)

This motivates a couple of ABI questions:

1. Why does the Windows x64 ABI require the caller to pre-allocate "shadow space" to potentially spill register-passed arguments? It's wasteful if it's not needed (especially in ABIs with a redzone), and it reduces opportunities for TCO. (Edit: ahh, unlike Linux, the Windows x64 calling convention has no redzone. I guess this then becomes "Why doesn't Windows x64 provide a redzone?")

2. Why not define a calling convention that is callee-cleanup where the post-cleanup stack pointer is passed in a designated register (or at the top of the stack) to the callee? I understand that it might not make sense to pay the cost (fewer arguments passed in registers, and often an extra stack spill) for the majority of functions, but it seems an oversight that there's not a calling convention that (in the absence of destructors) always allows tail calls to be optimized.

I guess the answer to both questions is that most TCO opportunities are within a single DLL, so the compiler is free to create non-exported versions of functions with custom calling conventions. Is this right?

[0] https://en.wikipedia.org/wiki/X86_calling_conventions plus their variants on other architectures


> I guess the answer to both questions is that most TCO opportunities are within a single DLL, so the compiler is free to create non-exported versions of functions with custom calling conventions. Is this right?

LLVM mostly follows this path: it allows full TCO only when using the “fastcc” (i.e. “whatever LLVM feels like generating”) and some calling conventions from functional languages (like GHC) that have a specific carve out for it.


Correct me if I'm saying something stupid, but:

Shouldn't TCE be trivial with callee cleanup?

When I (the function) get called, I get X bytes of stack in arguments, then start pushing variables and whatnot. When I return, I simply put my return value at the start of my stack space, pop everything else, then jump back.

In a tail call, all I need to do is put the next function's arguments at the start of my stack space, pop everything else, then jump to its address.

Obviously in IRL CPUs some of this data would be kept in registers instead of the stack, but the mechanism should be the same.


You're correct, I did overgeneralize in the case of callee-cleanup calling conventions. Though I'm still not aware of any callee-cleanup variadic calling conventions; printf, execl, etc. don't actually get passed anything that directly indicates the number/size of arguments passed.

In the article's printf example, the compiler is able to perform TCO only because it uses at most 6 pointer/integer arguments and at most 8 fp arguments (with the number of fp arguments passed in al), so it's caller-cleanup, but with no cleanup to perform.

Under the SysV (Linux, etc.) x86_64 ABI, for varargs calls the number of fp arguments passed in XMM registers is passed in al, but nothing directly indicates how much stack space is used by arguments (and it's also obviously caller-cleanup). The printf example from the article works because all of the arguments fit in registers.


This is what OCaml does: you can annotate function calls with [@tailcall] to result in compiler diagnostics if the call in fact isn't in tail call position.


Scala too with @tailrec


tailrec is more restricted: it only supports simple tail recursion (i.e. calling the same function in tail position) that can trivially be rewritten into a loop. Scala isn’t really in a position to optimize mutual tail calls, since those don’t in general collapse to loops.


Alternatively, compilers could generate warnings when they find something like `return some_func(args...)`, and can't eliminate the tail-call because of something that happens elsewhere in the code.



