<div dir="ltr">Some frontends for LLVM require that LLVM perform tail call optimization (TCO) for correctness.  Internally, LLVM refers to TCO of a non-recursive tail call as sibling call optimization, but I'm going to refer to that generically as TCO.  Often, functional languages like Scheme have a language-level requirement that TCO occurs for any call in the tail position, and this is usually why users of LLVM have asked for guaranteed TCO functionality.<div>

<br></div><div>In my case, to implement vtable thunks and virtual member pointers in the IA32 Microsoft C++ ABI, I cannot simply forward the arguments to the callee without calling the copy constructor.  If I can use a guaranteed tail call, I don't have to emit copy constructor calls, and things are much easier.<div>

<br></div><div>Currently, in order to get guaranteed TCO, frontends have to enable the GuaranteedTailCall codegen option and obey a narrow set of rules, which includes always using fastcc.  This is fairly awkward and doesn't solve my use case, since the ABI requires a particular convention.<div>

<br></div><div>Instead, I propose that we add a new tail call marker, 'musttail', that guarantees that TCO will occur.  I'm open to other naming suggestions.  Some strawmen are 'tailonly' or 'guaranteedtail'.  Along with it, I propose a set conservative of verifier enforced rules to follow to ensure that most reasonable backends will be able to perform TCO.  It also ensures that optimizations, like tail merging, don't accidentally move a call out of tail position.</div>

<div><br></div><div>First, the prototype of the caller and the callee must be "congruent".  Two prototypes are congruent if all return and parameter types match except for the pointer types.  The pointer types can have different pointee types, but they must have the same address space.  In addition, all the ABI impacting attributes on the parameters must be the same, meaning byval, inalloca, inreg, etc, must all match up.</div>

</div></div><div><br></div><div>Second, the call must be in tail position.  The call must be immediately followed by a bitcast or a ret, both of which must use the result of the call.  If there is a bitcast, it must be followed by a ret which uses the bitcast.</div>

<div><br></div><div>Importantly, LLVM must perform TCO regardless of the computation necessary to compute the arguments for the callee, which is not the case today.</div><div><br></div><div>I sent a patch to llvm-commits, but I'd like to hear high-level feedback on llvmdev:</div>

<div><a href="http://llvm-reviews.chandlerc.com/D3240">http://llvm-reviews.chandlerc.com/D3240</a><br></div><div><br></div><div>Thanks!</div></div>