[LLVMdev] Small problem with the tail call elimination pass

Mon May 12 13:22:49 PDT 2014

Nicola Gigante wrote:
>
> Il giorno 09/mag/2014, alle ore 01:00, Nick Lewycky <nlewycky at google.com
> <mailto:nlewycky at google.com>> ha scritto:
>
>> At the moment TRE runs, the function reads:
>>
>> ; Function Attrs: nounwind readnone
>> define i32 @fib(i32 %n) #0 {
>> entry:
>> %cmp = icmp ult i32 %n, 2
>> br i1 %cmp, label %cond.end, label %cond.false
>>
>> cond.false: ; preds = %entry
>> %sub = add i32 %n, -1
>> %call = tail call i32 @fib(i32 %sub)
>> %sub1 = add i32 %n, -2
>> %call2 = tail call i32 @fib(i32 %sub1)
>> %add = add i32 %call, %call2
>> br label %cond.end
>>
>> cond.end: ; preds = %entry, %cond.false
>> %cond = phi i32 [ %add, %cond.false ], [ %n, %entry ]
>> ret i32 %cond
>> }
>>
>> TRE starts at the 'ret' and looks upwards and checks all of that
>> block's predecessors to find any that end in unconditional branches.
>> Then it walks upwards in such blocks to find if there's a "TRE
>> candidate" which is a call to itself (but aborts if there are
>> instructions is can't handle between the call and the end of the
>> block), and manages to find %call2 -- and calls
>> EliminateRecursiveTailCall on it. Good job! At this stage, it
>> transforms the "br label %cond.end" into a "ret i32 %add" and updates
>> the phi in %cond.end. There are now *two* returns, the one in
>> %cond.end is "ret i32 %n".
>>
>> EliminateRecursiveTailCall detects that TRE'ing this requires
>> performing accumulator recursion elimination. To do this, it checks
>> that "the function containing the specified tail call consistently
>> returns the same runtime-constant value" (at all exit points except
>> for our ret at the end of %cond.false). Now that we have a ret i32 %n,
>> this is false; the safety check finds that the argument for %n can
>> change because we pass in %sub and %sub1 into ourselves instead of
>> always passing %n to ourselves. That's what stops the transformation.
>>
>> Your working example makes two modifications, you change "n < 2" to "n
>> <= 2" and you change "? n" to "? 1". It's the second of these that
>> will make the difference, because it changes %cond to:
>>
>> %cond = phi i32 [ %add, %cond.false ], [ 1, %entry ]
>>
>> Everything proceeds the same way as above, except that when the branch
>> in %cond.false is replaced with a return, the return in %cond.end
>> folds to "ret i32 1", a constant. Being the same value each time, this
>> is simple enough for accumulator recursion elimination.
>>
>
> Thank you very much for the explanation! It all makes sense now.
> However, the point remains that gcc (compiling down from both C or
> fortran) can indeed optimize this code despite the non-constant
> base case. Do you or someone know what reasoning does it apply?

The short answer is that their accumulator recursion elimination is more 
capable.

GCC keeps two accumulators, one for multiplicative factors and one for 
additive factors. They handle TRE where they produce "add + mult * f()" 
where f() is the tail call of itself. I haven't put thought into it to 
figure out whether that's what we should do or if we can do it better.

> Should I file a bug report?

Sure.

Nick

>> PS. In case you aren't aware, I recently bemoaned the state of our TRE
>> abilities here:
>> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140505/216092.html
>
> Very interesting!
>
>> Nick
>
> Thanks,
> Nicola
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev