[llvm-dev] Noncapture use of locals disabling TailRecursionElimination

Xun Li via llvm-dev llvm-dev at lists.llvm.org
Fri May 8 14:58:59 PDT 2020


Eli,
Yes I was referring to AllCallsAreTailCalls. I will take a look at how
to improve this.

Nick,
Thanks. I agree that's the proper constrain to mark a call as
tailcall, however not being able to mark a call as tailcall shouldn't
completely kill TCE. (i.e. AllCallsAreTailCalls seems overly
limiting).

Specifically, I have been looking into an issue where Clang cannot TCE
the following code:

class Foo {
public:
  __attribute__((noinline))
  ~Foo() {}
};

void callee() {
  Foo f;
}

void funcRecurse() {
  callee();
  funcRecurse();
}

int main() {
  funcRecurse();
}

In the above code, callee will be inlined into funcRecurse, generating
calls to ~Foo() along with lifetime management intrinsics, all using
locals, disabling TCE. Clang can successfully TCE if we force callee
to not inline.

On Fri, May 8, 2020 at 1:53 PM Nick Lewycky <nicholas at mxc.ca> wrote:
>
> On 2020-05-08 1:34 p.m., Xun Li wrote:
> > Hi,
> >
> > I was looking into the implementation of TailRecursionElimination, and
> > noticed that we have the constrain that if any call uses a local, even
> > though it doesn't capture the local, it would still prohibit TCE. This
> > contain seems unnecessary and overly limiting?
>
> I think it's a necessary limitation. The idea is that something marked
> 'tail' could be emitted as a jump instruction instead of a call
> instruction. In order to do that, the jumpee has to 'ret' to the same
> place our function would have returned to. Since the return value is the
> current top-of-the-stack value when returning[1], the jumper must remove
> any local variables -- in LLVM, allocas -- from the stack before
> jumping. This implies that the jumpee may not access our locals at all,
> it is not sufficient for the callee to merely not capture them.
>
> Further to that point, we have an optimization in BasicAA where we
> conclude that a call marked "tail" does not alias any allocas of the
> calling function.
>
> There's a missed optimization here regarding "notail". Because "tail"
> can be used as an optimization for BasicAA, we should actually permit
> functions to be marked "tail" (not accessing any of the callers stack)
> and still be marked "notail" (the machine code must be a call, not a
> jump). These aren't actually mutually exclusive, even though they sound
> like they are.
>
> Nick
>
> [1] - not true on Windows, but even there the number of bytes deep the
> return value is, is a fixed constant that is determined entirely by the
> callee. The jumper can't have extra bytes on the stack at the moment of
> the jump.
>
> >   Relevant code is here:
> > https://github.com/llvm/llvm-project/blob/cbe77ca9bd05393b1df1abf016b01f44d1d10a49/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp#L275
> >
> > Looking through the tests that cover this scenario
> > (https://github.com/llvm/llvm-project/blob/e29874eaa04d24b8c67776bf5729474d671a58f6/llvm/test/Transforms/TailCallElim/basic.ll#L66),
> > I found it referring to rdar://14324281 and PR962. What are they?
> > These haven't been updated since 2014, so I wonder what is the latest
> > state and have them been resolved?
> >
> > cc nicholas who authored the original code.
> >
> > Thanks!
> >
>


-- 
Xun


More information about the llvm-dev mailing list