[llvm-commits] Patch to improve loop-simplify

Thu Mar 3 14:59:33 PST 2011

That is a bit strange. LoopSimplify runs a second time in the backend, ahead of LoopStrengthReduce. Perhaps the problem is reintroduced there and never cleaned up afterwards.

Cameron

On Mar 3, 2011, at 8:10 AM, Andrew Clinton wrote:

> I'm not sure why the code generator is synthesizing an additional
> unconditional branch.  The IR is certainly simpler with this patch.
> 
> --------------------------------
> With patch:
> $ opt -O3 not-nested-loop.ll -S
> 
> ; ModuleID = 'not-nested-loop.ll'
> 
> define i32 @_Z4testv() {
> ; <label>:0
>  %1 = tail call zeroext i1 @_Z9test_condv()
>  br i1 %1, label %.lr.ph, label %._crit_edge
> 
> .lr.ph:                                           ; preds = %0, %.backedge
>  %val.02 = phi i32 [ %val.0.be, %.backedge ], [ 0, %0 ]
>  %i.01 = phi i32 [ %tmp, %.backedge ], [ 0, %0 ]
>  %tmp = add i32 %i.01, 1
>  %2 = mul nsw i32 %tmp, %tmp
>  %3 = icmp sgt i32 %2, 6
>  br i1 %3, label %5, label %.backedge
> 
> .backedge:                                        ; preds = %.lr.ph, %5
>  %val.0.be = phi i32 [ %8, %5 ], [ %val.02, %.lr.ph ]
>  %4 = tail call zeroext i1 @_Z9test_condv()
>  br i1 %4, label %.lr.ph, label %._crit_edge
> 
> ; <label>:5                                       ; preds = %.lr.ph
>  %6 = tail call zeroext i1 @_Z9test_condv()
>  %7 = zext i1 %6 to i32
>  %8 = add nsw i32 %7, %val.02
>  br label %.backedge
> 
> ._crit_edge:                                      ; preds = %.backedge, %0
>  %val.0.lcssa = phi i32 [ 0, %0 ], [ %val.0.be, %.backedge ]
>  ret i32 %val.0.lcssa
> }
> 
> declare zeroext i1 @_Z9test_condv()
> 
> --------------------------------
> Without patch:
> 
> $ opt -O3 not-nested-loop.ll -S
> 
> ; ModuleID = 'not-nested-loop.ll'
> 
> define i32 @_Z4testv() {
> ; <label>:0
>  br label %.outer
> 
> .outer:                                           ; preds = %6, %0
>  %i.0.ph = phi i32 [ 1, %0 ], [ %phitmp, %6 ]
>  %val.0.ph = phi i32 [ 0, %0 ], [ %9, %6 ]
>  br label %1
> 
> ; <label>:1                                       ; preds = %.outer, %3
>  %indvar = phi i32 [ 0, %.outer ], [ %indvar.next, %3 ]
>  %tmp1 = add i32 %i.0.ph, %indvar
>  %2 = tail call zeroext i1 @_Z9test_condv()
>  br i1 %2, label %3, label %10
> 
> ; <label>:3                                       ; preds = %1
>  %4 = mul nsw i32 %tmp1, %tmp1
>  %5 = icmp sgt i32 %4, 6
>  %indvar.next = add i32 %indvar, 1
>  br i1 %5, label %6, label %1
> 
> ; <label>:6                                       ; preds = %3
>  %7 = tail call zeroext i1 @_Z9test_condv()
>  %8 = zext i1 %7 to i32
>  %9 = add nsw i32 %8, %val.0.ph
>  %phitmp = add i32 %tmp1, 1
>  br label %.outer
> 
> ; <label>:10                                      ; preds = %1
>  ret i32 %val.0.ph
> }
> 
> declare zeroext i1 @_Z9test_condv()
> 
> On Thu, Mar 3, 2011 at 3:43 AM, Cameron Zwarich <zwarich at apple.com> wrote:
>> When I compile your first test case without this patch for X86, I see 2 conditional branches in the loop body and no unconditional branches. With this patch, there is an extra unconditional branch inside the loop. Unfortunately, I don't think this is cleaned up as well as you assume.
>> 
>> Cameron
>> 
>> On 2011-03-02, at 11:56 PM, Andrew Clinton wrote:
>> 
>>> The role of the loop-simplify pass, as described in the header, is to
>>> transform natural loops into a simpler form.  The goal of this patch
>>> is not to produce the minimum number of instructions but to ensure
>>> that additional (unnecessary) complexity is not introduced by the
>>> simplification.  I think this is a good change since running the
>>> natural loops analysis after this pass will then produce the expected
>>> loop nesting.  Without this patch, I have found it necessary to write
>>> a new pass that merges the nested loops.
>>> 
>>> I believe that other passes will clean up extra branching (CFG
>>> simplification), and PHI nodes will usually add runtime complexity
>>> only for the incoming blocks.  So I wouldn't expect runtime
>>> performance to be impacted by this change.
>>> 
>>> Attached is the revised diff - thank you for the feedback!
>>> 
>>> Andrew
>>> 
>>> On Thu, Mar 3, 2011 at 2:10 AM, Cameron Zwarich <zwarich at apple.com> wrote:
>>>> Is splitting an edge to create a unique loop backedge better than creating two loops? If the inner loop's backedge is more frequently taken, then you are trading an extra unconditional branch (with phis) on loop entry for an extra unconditional branch (with phis) along the inner loop's backedge. When is this a good trade?
>>>> 
>>>> Cameron
>>> <nestedloops.diff>