[LLVMdev] Small problem with the tail call elimination pass

Thu May 8 16:00:39 PDT 2014

On 8 May 2014 08:57, Nicola Gigante <nicola.gigante at gmail.com> wrote:

> Hello everybody,
>
> On the documentation page for the tailcallelim pass you can read:
>
> "This pass transforms functions that are prevented from being tail
> recursive by an
> associative expression to use an accumulator variable, thus compiling the
> typical
> naive factorial or fib implementation into efficient code”
>
> However, I don’t see this behavior when trying to compile this variant of
> the mentioned “typical naive fib implementation”:
>
> unsigned int fib(unsigned int n) {
>     return n < 2 ? n : fib(n-1) + fib(n-2);
> }
>
> The IR with clang -O3 (version 3.4) is this:
> define i32 @_Z9fibj(i32 %n) #0 {
>   %1 = icmp ult i32 %n, 2
>   br i1 %1, label %8, label %2
>
> ; <label>:2                                       ; preds = %0
>   %3 = add i32 %n, -1
>   %4 = tail call i32 @_Z9fibj(i32 %3)
>   %5 = add i32 %n, -2
>   %6 = tail call i32 @_Z9fibj(i32 %5)
>   %7 = add i32 %6, %4
>   ret i32 %7
>
> ; <label>:8                                       ; preds = %0
>   ret i32 %n
> }
>
> You see that the tail call is not eliminated. However, it does get
> eliminated
> if I change to this code:
>
> unsigned int fib(unsigned int n) {
>     return n <= 2 ? 1 : fib(n-1) + fib(n-2);
> }
>
> IR:
> define i32 @_Z9fibj(i32 %n) #0 {
>   %1 = icmp ult i32 %n, 3
>   br i1 %1, label %tailrecurse._crit_edge, label %tailrecurse
>
> tailrecurse:                                      ; preds = %0,
> %tailrecurse
>   %n.tr2 = phi i32 [ %4, %tailrecurse ], [ %n, %0 ]
>   %accumulator.tr1 = phi i32 [ %5, %tailrecurse ], [ 1, %0 ]
>   %2 = add i32 %n.tr2, -1
>   %3 = tail call i32 @_Z9fibj(i32 %2)
>   %4 = add i32 %n.tr2, -2
>   %5 = add i32 %3, %accumulator.tr1
>   %6 = icmp ult i32 %4, 3
>   br i1 %6, label %tailrecurse._crit_edge, label %tailrecurse
>
> tailrecurse._crit_edge:                           ; preds = %tailrecurse,
> %0
>   %accumulator.tr.lcssa = phi i32 [ 1, %0 ], [ %5, %tailrecurse ]
>   ret i32 %accumulator.tr.lcssa
> }
>
> Now, note that gcc 4.8 compiles well both the examples, and the
> resulting loop gets also unrolled a dozen times.
>
> I’m compiling with -O3 with both compilers.
>
> So what is stopping the tail call elimination pass to doing the same
> thing?
>

At the moment TRE runs, the function reads:

; Function Attrs: nounwind readnone
define i32 @fib(i32 %n) #0 {
entry:
  %cmp = icmp ult i32 %n, 2
  br i1 %cmp, label %cond.end, label %cond.false

cond.false:                                       ; preds = %entry
  %sub = add i32 %n, -1
  %call = tail call i32 @fib(i32 %sub)
  %sub1 = add i32 %n, -2
  %call2 = tail call i32 @fib(i32 %sub1)
  %add = add i32 %call, %call2
  br label %cond.end

cond.end:                                         ; preds = %entry,
%cond.false
  %cond = phi i32 [ %add, %cond.false ], [ %n, %entry ]
  ret i32 %cond
}

TRE starts at the 'ret' and looks upwards and checks all of that block's
predecessors to find any that end in unconditional branches. Then it walks
upwards in such blocks to find if there's a "TRE candidate" which is a call
to itself (but aborts if there are instructions is can't handle between the
call and the end of the block), and manages to find %call2 -- and calls
EliminateRecursiveTailCall on it. Good job! At this stage, it transforms
the "br label %cond.end" into a "ret i32 %add" and updates the phi in
%cond.end. There are now *two* returns, the one in %cond.end is "ret i32
%n".

EliminateRecursiveTailCall detects that TRE'ing this requires performing
accumulator recursion elimination. To do this, it checks that "the function
containing the specified tail call consistently returns the same
runtime-constant value" (at all exit points except for our ret at the end
of %cond.false). Now that we have a ret i32 %n, this is false; the safety
check finds that the argument for %n can change because we pass in %sub and
%sub1 into ourselves instead of always passing %n to ourselves. That's what
stops the transformation.

Your working example makes two modifications, you change "n < 2" to "n <=
2" and you change "? n" to "? 1". It's the second of these that will make
the difference, because it changes %cond to:

  %cond = phi i32 [ %add, %cond.false ], [ 1, %entry ]

Everything proceeds the same way as above, except that when the branch in
%cond.false is replaced with a return, the return in %cond.end folds to
"ret i32 1", a constant. Being the same value each time, this is simple
enough for accumulator recursion elimination.

Nick

PS. In case you aren't aware, I recently bemoaned the state of our TRE
abilities here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140505/216092.html

Note that I don’t write this email because I need a performing
> fibonacci function… What I’m trying to do is to track down the
> reason why fortran code is performing better than C on the
> source code used as benchmark in the main page of the
> Julia language in [1]. I wasn’t expecting this huge difference
> between fortran and C code on an example as simple as this, so I
> tried to investigate, even if I don’t know anything about fortran.
> Their source code use int instead of unsigned as in my example.
> I don’t know quite why, but this slows down everything both in gcc
> and clang, because of something related to how the two different
> languages handle overflow in integer arithmetic, I think. When I use
> unsigned int, gcc performance matches gfortran’s, and the generated
> code is nearly identical, while clang stays way behind because of the
> missing tail call elimination.
>
> Note that even with the modified example where the tail call does
> get eliminated, a simple benchmark shows that clang’s code
> is still 2-3x times slower than gcc's, I suppose because of the missing
> unrolling.
>
> Note that in that benchmark, Julia seems winning too, and they use
> LLVM as the backend so something’s missing...
>
> Can you help me understand what’s happening?
>
> Thank you,
> Nicola
>
> [1] http://julialang.org
>
>
>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140508/990f8606/attachment.html>