[llvm-commits] Patch to improve loop-simplify
Cameron Zwarich
zwarich at apple.com
Tue Mar 8 01:52:41 PST 2011
Upon closer inspection, the difference in codegen is reflected in the input that is fed to llc. With your patch both the inner and outer loops have a split backedge. Without your patch only the outer loop has a split backedge. It's not a coincidence that the block placement is better with the version that doesn't split the inner loop's backedge, and as far as I can tell there is no pass that is generally capable of repairing such a thing.
What benefit does your patch bring? Is there some case where it improves the resulting code?
Cameron
On Mar 7, 2011, at 6:28 PM, Andrew Clinton wrote:
> I've run a few more tests based on the IR listed below. It appears
> that regardless of whether my patch is present, llc will generate code
> without the extra "jmp" for the old .ll and with the extra "jmp" with
> the new .ll. So it appears that the patch is not responsible for this
> difference in the generated x86 assembly code. I'm guessing that the
> blocks happen to fall in a nicer pattern for codegen in this
> particular case, when an outer loop is split out. Could you commit
> the patch?
>
> Andrew
>
> On Thu, Mar 3, 2011 at 5:59 PM, Cameron Zwarich <zwarich at apple.com> wrote:
>> That is a bit strange. There is another run of LoopSimplify in the backend for LoopStrengthReduce. Perhaps a problem is reintroduced there that is not cleaned up.
>>
>> Cameron
>>
>> On Mar 3, 2011, at 8:10 AM, Andrew Clinton wrote:
>>
>>> I'm not sure why the code generator is synthesizing an additional
>>> unconditional branch. The IR is certainly simpler with this patch.
>>>
>>> --------------------------------
>>> With patch:
>>>> opt -O3 not-nested-loop.ll -S
>>>
>>> ; ModuleID = 'not-nested-loop.ll'
>>>
>>> define i32 @_Z4testv() {
>>> ; <label>:0
>>> %1 = tail call zeroext i1 @_Z9test_condv()
>>> br i1 %1, label %.lr.ph, label %._crit_edge
>>>
>>> .lr.ph: ; preds = %0, %.backedge
>>> %val.02 = phi i32 [ %val.0.be, %.backedge ], [ 0, %0 ]
>>> %i.01 = phi i32 [ %tmp, %.backedge ], [ 0, %0 ]
>>> %tmp = add i32 %i.01, 1
>>> %2 = mul nsw i32 %tmp, %tmp
>>> %3 = icmp sgt i32 %2, 6
>>> br i1 %3, label %5, label %.backedge
>>>
>>> .backedge: ; preds = %.lr.ph, %5
>>> %val.0.be = phi i32 [ %8, %5 ], [ %val.02, %.lr.ph ]
>>> %4 = tail call zeroext i1 @_Z9test_condv()
>>> br i1 %4, label %.lr.ph, label %._crit_edge
>>>
>>> ; <label>:5 ; preds = %.lr.ph
>>> %6 = tail call zeroext i1 @_Z9test_condv()
>>> %7 = zext i1 %6 to i32
>>> %8 = add nsw i32 %7, %val.02
>>> br label %.backedge
>>>
>>> ._crit_edge: ; preds = %.backedge, %0
>>> %val.0.lcssa = phi i32 [ 0, %0 ], [ %val.0.be, %.backedge ]
>>> ret i32 %val.0.lcssa
>>> }
>>>
>>> declare zeroext i1 @_Z9test_condv()
>>>
>>> --------------------------------
>>> Without patch:
>>>
>>>> opt -O3 not-nested-loop.ll -S
>>>
>>> ; ModuleID = 'not-nested-loop.ll'
>>>
>>> define i32 @_Z4testv() {
>>> ; <label>:0
>>> br label %.outer
>>>
>>> .outer: ; preds = %6, %0
>>> %i.0.ph = phi i32 [ 1, %0 ], [ %phitmp, %6 ]
>>> %val.0.ph = phi i32 [ 0, %0 ], [ %9, %6 ]
>>> br label %1
>>>
>>> ; <label>:1 ; preds = %.outer, %3
>>> %indvar = phi i32 [ 0, %.outer ], [ %indvar.next, %3 ]
>>> %tmp1 = add i32 %i.0.ph, %indvar
>>> %2 = tail call zeroext i1 @_Z9test_condv()
>>> br i1 %2, label %3, label %10
>>>
>>> ; <label>:3 ; preds = %1
>>> %4 = mul nsw i32 %tmp1, %tmp1
>>> %5 = icmp sgt i32 %4, 6
>>> %indvar.next = add i32 %indvar, 1
>>> br i1 %5, label %6, label %1
>>>
>>> ; <label>:6 ; preds = %3
>>> %7 = tail call zeroext i1 @_Z9test_condv()
>>> %8 = zext i1 %7 to i32
>>> %9 = add nsw i32 %8, %val.0.ph
>>> %phitmp = add i32 %tmp1, 1
>>> br label %.outer
>>>
>>> ; <label>:10 ; preds = %1
>>> ret i32 %val.0.ph
>>> }
>>>
>>> declare zeroext i1 @_Z9test_condv()
>>>
>>> On Thu, Mar 3, 2011 at 3:43 AM, Cameron Zwarich <zwarich at apple.com> wrote:
>>>> When I compile your first test case without this patch for X86, I see 2 conditional branches in the loop body and no unconditional branches. With this patch, there is an extra unconditional branch inside the loop. Unfortunately, I don't think this is cleaned up as well as you assume.
>>>>
>>>> Cameron
>>>>
>>>> On 2011-03-02, at 11:56 PM, Andrew Clinton wrote:
>>>>
>>>>> The role of the loop-simplify pass, as described in the header, is to
>>>>> transform natural loops into a simpler form. The goal of this patch
>>>>> is not to produce the minimum number of instructions but to ensure
>>>>> that additional (unnecessary) complexity is not introduced by the
>>>>> simplification. I think this is a good change since running the
>>>>> natural loops analysis after this pass will then produce the expected
>>>>> loop nesting. Without this patch, I have found it necessary to write
>>>>> a new pass that merges the nested loops.
>>>>>
>>>>> I believe that other passes will clean up extra branching (CFG
>>>>> simplification), and PHI nodes will usually add runtime complexity
>>>>> only for the incoming blocks. So I wouldn't expect runtime
>>>>> performance to be impacted by this change.
>>>>>
>>>>> Attached is the revised diff - thank you for the feedback!
>>>>>
>>>>> Andrew
>>>>>
>>>>> On Thu, Mar 3, 2011 at 2:10 AM, Cameron Zwarich <zwarich at apple.com> wrote:
>>>>>> Is splitting an edge to create a unique loop backedge better than creating two loops? If the inner loop's backedge is more frequently taken, then you are trading an extra unconditional branch (with phis) on loop entry for an extra unconditional branch (with phis) along the inner loop's backedge. When is this a good trade?
>>>>>>
>>>>>> Cameron
>>>>> <nestedloops.diff>
>>>>
>>>>
>>
>>
More information about the llvm-commits
mailing list