[LLVMdev] Proposal: Add a guaranteed tail call marker

Mon Mar 31 19:35:46 PDT 2014

Some frontends for LLVM require that LLVM perform tail call optimization
(TCO) for correctness.  Internally, LLVM refers to TCO of a non-recursive
tail call as sibling call optimization, but I'm going to refer to that
generically as TCO.  Often, functional languages like Scheme have a
language-level requirement that TCO occurs for any call in the tail
position, and this is usually why users of LLVM have asked for guaranteed
TCO functionality.

In my case, to implement vtable thunks and virtual member pointers in the
IA32 Microsoft C++ ABI, I cannot simply forward the arguments to the callee
without calling the copy constructor.  If I can use a guaranteed tail call,
I don't have to emit copy constructor calls, and things are much easier.

Currently, in order to get guaranteed TCO, frontends have to enable the
GuaranteedTailCall codegen option and obey a narrow set of rules, which
includes always using fastcc.  This is fairly awkward and doesn't solve my
use case, since the ABI requires a particular convention.

Instead, I propose that we add a new tail call marker, 'musttail', that
guarantees that TCO will occur.  I'm open to other naming suggestions.
 Some strawmen are 'tailonly' or 'guaranteedtail'.  Along with it, I
propose a set conservative of verifier enforced rules to follow to ensure
that most reasonable backends will be able to perform TCO.  It also ensures
that optimizations, like tail merging, don't accidentally move a call out
of tail position.

First, the prototype of the caller and the callee must be "congruent".  Two
prototypes are congruent if all return and parameter types match except for
the pointer types.  The pointer types can have different pointee types, but
they must have the same address space.  In addition, all the ABI impacting
attributes on the parameters must be the same, meaning byval, inalloca,
inreg, etc, must all match up.

Second, the call must be in tail position.  The call must be immediately
followed by a bitcast or a ret, both of which must use the result of the
call.  If there is a bitcast, it must be followed by a ret which uses the
bitcast.

Importantly, LLVM must perform TCO regardless of the computation necessary
to compute the arguments for the callee, which is not the case today.

I sent a patch to llvm-commits, but I'd like to hear high-level feedback on
llvmdev:
http://llvm-reviews.chandlerc.com/D3240

Thanks!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140331/71db259e/attachment.html>