[LLVMdev] Small problem with the tail call elimination pass

Nicola Gigante nicola.gigante at gmail.com
Fri May 9 01:21:38 PDT 2014


Il giorno 09/mag/2014, alle ore 01:00, Nick Lewycky <nlewycky at google.com> ha scritto:

> At the moment TRE runs, the function reads:
> 
> ; Function Attrs: nounwind readnone
> define i32 @fib(i32 %n) #0 {
> entry:
>   %cmp = icmp ult i32 %n, 2
>   br i1 %cmp, label %cond.end, label %cond.false
> 
> cond.false:                                       ; preds = %entry
>   %sub = add i32 %n, -1
>   %call = tail call i32 @fib(i32 %sub)
>   %sub1 = add i32 %n, -2
>   %call2 = tail call i32 @fib(i32 %sub1)
>   %add = add i32 %call, %call2
>   br label %cond.end
> 
> cond.end:                                         ; preds = %entry, %cond.false
>   %cond = phi i32 [ %add, %cond.false ], [ %n, %entry ]
>   ret i32 %cond
> }
> 
> TRE starts at the 'ret' and looks upwards and checks all of that block's predecessors to find any that end in unconditional branches. Then it walks upwards in such blocks to find if there's a "TRE candidate" which is a call to itself (but aborts if there are instructions is can't handle between the call and the end of the block), and manages to find %call2 -- and calls EliminateRecursiveTailCall on it. Good job! At this stage, it transforms the "br label %cond.end" into a "ret i32 %add" and updates the phi in %cond.end. There are now *two* returns, the one in %cond.end is "ret i32 %n".
> 
> EliminateRecursiveTailCall detects that TRE'ing this requires performing accumulator recursion elimination. To do this, it checks that "the function containing the specified tail call consistently returns the same runtime-constant value" (at all exit points except for our ret at the end of %cond.false). Now that we have a ret i32 %n, this is false; the safety check finds that the argument for %n can change because we pass in %sub and %sub1 into ourselves instead of always passing %n to ourselves. That's what stops the transformation.
> 
> Your working example makes two modifications, you change "n < 2" to "n <= 2" and you change "? n" to "? 1". It's the second of these that will make the difference, because it changes %cond to:
> 
>   %cond = phi i32 [ %add, %cond.false ], [ 1, %entry ]
> 
> Everything proceeds the same way as above, except that when the branch in %cond.false is replaced with a return, the return in %cond.end folds to "ret i32 1", a constant. Being the same value each time, this is simple enough for accumulator recursion elimination.
> 

Thank you very much for the explanation! It all makes sense now.
However, the point remains that gcc (compiling down from both C or fortran) can indeed optimize this code despite the non-constant
base case. Do you or someone know what reasoning does it apply?
Should I file a bug report?

> 
> PS. In case you aren't aware, I recently bemoaned the state of our TRE abilities here: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140505/216092.html


Very interesting!

> Nick

Thanks,
Nicola

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140509/e56ec15d/attachment.html>


More information about the llvm-dev mailing list