[llvm-dev] Noncapture use of locals disabling TailRecursionElimination

Nick Lewycky via llvm-dev llvm-dev at lists.llvm.org
Fri May 8 16:38:22 PDT 2020


On 2020-05-08 2:58 p.m., Xun Li wrote:
> Eli,
> Yes I was referring to AllCallsAreTailCalls. I will take a look at how
> to improve this.
>
> Nick,
> Thanks. I agree that's the proper constrain to mark a call as
> tailcall, however not being able to mark a call as tailcall shouldn't
> completely kill TCE. (i.e. AllCallsAreTailCalls seems overly
> limiting).

I get it now. TailRecursionElimination.cpp does two optimizations, 
marking of tail and converting recursion to loops. I thought you were 
proposing a change to the marking of tail. Thanks for the example!

"PR962" refers to "problem report 962" or https://llvm.org/PR962 . 
"rdar" is Apple's "radar" bug tracking system, these are generally 
internal to their company but sometimes they're available in whole or in 
part on https://openradar.appspot.com/ .

Nick

> Specifically, I have been looking into an issue where Clang cannot TCE
> the following code:
>
> class Foo {
> public:
>    __attribute__((noinline))
>    ~Foo() {}
> };
>
> void callee() {
>    Foo f;
> }
>
> void funcRecurse() {
>    callee();
>    funcRecurse();
> }
>
> int main() {
>    funcRecurse();
> }
>
> In the above code, callee will be inlined into funcRecurse, generating
> calls to ~Foo() along with lifetime management intrinsics, all using
> locals, disabling TCE. Clang can successfully TCE if we force callee
> to not inline.
>
> On Fri, May 8, 2020 at 1:53 PM Nick Lewycky <nicholas at mxc.ca> wrote:
>> On 2020-05-08 1:34 p.m., Xun Li wrote:
>>> Hi,
>>>
>>> I was looking into the implementation of TailRecursionElimination, and
>>> noticed that we have the constrain that if any call uses a local, even
>>> though it doesn't capture the local, it would still prohibit TCE. This
>>> contain seems unnecessary and overly limiting?
>> I think it's a necessary limitation. The idea is that something marked
>> 'tail' could be emitted as a jump instruction instead of a call
>> instruction. In order to do that, the jumpee has to 'ret' to the same
>> place our function would have returned to. Since the return value is the
>> current top-of-the-stack value when returning[1], the jumper must remove
>> any local variables -- in LLVM, allocas -- from the stack before
>> jumping. This implies that the jumpee may not access our locals at all,
>> it is not sufficient for the callee to merely not capture them.
>>
>> Further to that point, we have an optimization in BasicAA where we
>> conclude that a call marked "tail" does not alias any allocas of the
>> calling function.
>>
>> There's a missed optimization here regarding "notail". Because "tail"
>> can be used as an optimization for BasicAA, we should actually permit
>> functions to be marked "tail" (not accessing any of the callers stack)
>> and still be marked "notail" (the machine code must be a call, not a
>> jump). These aren't actually mutually exclusive, even though they sound
>> like they are.
>>
>> Nick
>>
>> [1] - not true on Windows, but even there the number of bytes deep the
>> return value is, is a fixed constant that is determined entirely by the
>> callee. The jumper can't have extra bytes on the stack at the moment of
>> the jump.
>>
>>>    Relevant code is here:
>>> https://github.com/llvm/llvm-project/blob/cbe77ca9bd05393b1df1abf016b01f44d1d10a49/llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp#L275
>>>
>>> Looking through the tests that cover this scenario
>>> (https://github.com/llvm/llvm-project/blob/e29874eaa04d24b8c67776bf5729474d671a58f6/llvm/test/Transforms/TailCallElim/basic.ll#L66),
>>> I found it referring to rdar://14324281 and PR962. What are they?
>>> These haven't been updated since 2014, so I wonder what is the latest
>>> state and have them been resolved?
>>>
>>> cc nicholas who authored the original code.
>>>
>>> Thanks!
>>>
>



More information about the llvm-dev mailing list