[LLVMdev] Proposal: Add a guaranteed tail call marker

Tue Apr 1 11:03:19 PDT 2014

On 31 March 2014 22:35, Reid Kleckner <rnk at google.com> wrote:
> Some frontends for LLVM require that LLVM perform tail call optimization
> (TCO) for correctness.  Internally, LLVM refers to TCO of a non-recursive
> tail call as sibling call optimization, but I'm going to refer to that
> generically as TCO.  Often, functional languages like Scheme have a
> language-level requirement that TCO occurs for any call in the tail
> position, and this is usually why users of LLVM have asked for guaranteed
> TCO functionality.
>
> In my case, to implement vtable thunks and virtual member pointers in the
> IA32 Microsoft C++ ABI, I cannot simply forward the arguments to the callee
> without calling the copy constructor.  If I can use a guaranteed tail call,
> I don't have to emit copy constructor calls, and things are much easier.
>
> Currently, in order to get guaranteed TCO, frontends have to enable the
> GuaranteedTailCall codegen option and obey a narrow set of rules, which
> includes always using fastcc.  This is fairly awkward and doesn't solve my
> use case, since the ABI requires a particular convention.
>
> Instead, I propose that we add a new tail call marker, 'musttail', that
> guarantees that TCO will occur.  I'm open to other naming suggestions.  Some
> strawmen are 'tailonly' or 'guaranteedtail'.  Along with it, I propose a set
> conservative of verifier enforced rules to follow to ensure that most
> reasonable backends will be able to perform TCO.  It also ensures that
> optimizations, like tail merging, don't accidentally move a call out of tail
> position.
>
> First, the prototype of the caller and the callee must be "congruent".  Two
> prototypes are congruent if all return and parameter types match except for
> the pointer types.  The pointer types can have different pointee types, but
> they must have the same address space.  In addition, all the ABI impacting
> attributes on the parameters must be the same, meaning byval, inalloca,
> inreg, etc, must all match up.
>
> Second, the call must be in tail position.  The call must be immediately
> followed by a bitcast or a ret, both of which must use the result of the
> call.  If there is a bitcast, it must be followed by a ret which uses the
> bitcast.
>
> Importantly, LLVM must perform TCO regardless of the computation necessary
> to compute the arguments for the callee, which is not the case today.
>
> I sent a patch to llvm-commits, but I'd like to hear high-level feedback on
> llvmdev:
> http://llvm-reviews.chandlerc.com/D3240

It would even be invalid to introduce calls to the copy constructor,
no? I don't think the thunks exist at the c++ standard level, so we
cannot create extra copy constructor calls for them. In other words,
we need something like what you are proposing.

Having this also puts us on the path to improve itanium codegen for

struct A {
  virtual void f(int x, ...);
};
struct B {
  virtual void f(int x, ...);
};
struct C : A, B {
  virtual void c();
  virtual void f(int x, ...);
};
void C::f(int x, ...) {  }

Currently we have clang duplicate the body of C::f.  We cannot create
a normal copy since we cannot represent the copy of the varargs, even
when we want llvm to eliminate those copies. We could use a inalloca
argument for the rest of the frame instead of a "...". Since we have a
guarantee that no copy will be done, we don't have to know the size.

Cheers,
Rafael