[llvm-commits] Patch to improve loop-simplify

Tue Oct 30 18:56:35 PDT 2012

I've updated the patch to work against the latest tree (see attached).
 There was also a related discussion on the llvm-dev list "SimplifyCFG
vs loops" where I've put some background on what this is needed for.

On Fri, Jul 15, 2011 at 3:02 PM, Andrew Clinton <ajclinto at gmail.com> wrote:
> Do you think this patch could be included in the current head?
>
> Andrew
>
> On Sun, Mar 13, 2011 at 8:50 PM, Andrew Clinton <ajclinto at gmail.com> wrote:
>> The updated patch (attached) eliminates the extra jmp in the generated
>> machine code.  It does this by inserting the backedge block directly
>> before the loop header rather than after the last backedge block.
>> This will result in basically the same block placement as if an outer
>> loop were extracted, but will not introduce a new loop.
>>
>> Andrew
>>
>> On Wed, Mar 9, 2011 at 9:33 AM, Andrew Clinton <ajclinto at gmail.com> wrote:
>>> Here's a test case to show the unexpected introduction of nested loops:
>>>
>>> bool test_cond();
>>> int test()
>>> {
>>>  int val = 0;
>>>  while (test_cond())
>>>  {
>>>    if (test_cond())
>>>    {
>>>      val += test_cond();
>>>    }
>>>  }
>>>  return val;
>>> }
>>>
>>> 1. compile with clang: clang -S -emit-llvm test.C
>>> 2. opt -O3 -S test.s
>>>
>>> The optimized assembly has two loops, where the input program just had
>>> a single loop with a conditional inside it.
>>>
>>> On Tue, Mar 8, 2011 at 11:16 AM, Andrew Clinton <ajclinto at gmail.com> wrote:
>>>> The goal wasn't to improve the resulting code (although I didn't want
>>>> it to degrade it either).  The purpose was to improve analysis, so
>>>> that when there is plain ordinary control flow inside a loop (that
>>>> happens to be represented as multiple backedges), it's not converted
>>>> to nested loops.  This conversion makes it very awkward to analyse the
>>>> looping structure in my application.  I've worked around this by
>>>> writing my own "loop-merging" pass to undo the nesting transformation
>>>> for loops that have no exit (that is not contained in another loop).
>>>> Perhaps, if this codegen problem with this patch can't be surmounted,
>>>> I could submit the "loop-merging" pass in a separate patch.
>>>>
>>>> Looking at the IR, I see the following:
>>>>
>>>> With patch:
>>>> - "Inner loop" has 2 conditional branches
>>>> - "Outer loop" has 2 conditional branches and 1 unconditional branch
>>>>
>>>> Without patch:
>>>> - Inner loop has 2 conditional branches
>>>> - Outer loop has 2 conditional branches and 2 unconditional branches
>>>>
>>>> How is the IR more complex with the patch?
>>>>
>>>> On Tue, Mar 8, 2011 at 4:52 AM, Cameron Zwarich <zwarich at apple.com> wrote:
>>>>> Upon closer inspection, the difference in codegen is reflected in the input that is fed to llc. With your patch both the inner and outer loops have a split backedge. Without your patch only the outer loop has a split backedge. It's not a coincidence that the block placement is better with the version that doesn't split the inner loop's backedge, and as far as I can tell there is no pass that is generally capable of repairing such a thing.
>>>>>
>>>>> What benefit does your patch bring? Is there some case where it improves the resulting code?
>>>>>
>>>>> Cameron
>>>>>
>>>>> On Mar 7, 2011, at 6:28 PM, Andrew Clinton wrote:
>>>>>
>>>>>> I've run a few more tests based on the IR listed below.  It appears
>>>>>> that regardless of whether my patch is present, llc will generate code
>>>>>> without the extra "jmp" for the old .ll and with the extra "jmp" with
>>>>>> the new .ll.  So it appears that the patch is not responsible for this
>>>>>> difference in the generated x86 assembly code.  I'm guessing that the
>>>>>> blocks happen to fall in a nicer pattern for codegen in this
>>>>>> particular case, when an outer loop is split out.  Could you commit
>>>>>> the patch?
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>> On Thu, Mar 3, 2011 at 5:59 PM, Cameron Zwarich <zwarich at apple.com> wrote:
>>>>>>> That is a bit strange. There is another run of LoopSimplify in the backend for LoopStrengthReduce. Perhaps a problem is reintroduced there that is not cleaned up.
>>>>>>>
>>>>>>> Cameron
>>>>>>>
>>>>>>> On Mar 3, 2011, at 8:10 AM, Andrew Clinton wrote:
>>>>>>>
>>>>>>>> I'm not sure why the code generator is synthesizing an additional
>>>>>>>> unconditional branch.  The IR is certainly simpler with this patch.
>>>>>>>>
>>>>>>>> --------------------------------
>>>>>>>> With patch:
>>>>>>>>> opt -O3 not-nested-loop.ll -S
>>>>>>>>
>>>>>>>> ; ModuleID = 'not-nested-loop.ll'
>>>>>>>>
>>>>>>>> define i32 @_Z4testv() {
>>>>>>>> ; <label>:0
>>>>>>>>  %1 = tail call zeroext i1 @_Z9test_condv()
>>>>>>>>  br i1 %1, label %.lr.ph, label %._crit_edge
>>>>>>>>
>>>>>>>> .lr.ph:                                           ; preds = %0, %.backedge
>>>>>>>>  %val.02 = phi i32 [ %val.0.be, %.backedge ], [ 0, %0 ]
>>>>>>>>  %i.01 = phi i32 [ %tmp, %.backedge ], [ 0, %0 ]
>>>>>>>>  %tmp = add i32 %i.01, 1
>>>>>>>>  %2 = mul nsw i32 %tmp, %tmp
>>>>>>>>  %3 = icmp sgt i32 %2, 6
>>>>>>>>  br i1 %3, label %5, label %.backedge
>>>>>>>>
>>>>>>>> .backedge:                                        ; preds = %.lr.ph, %5
>>>>>>>>  %val.0.be = phi i32 [ %8, %5 ], [ %val.02, %.lr.ph ]
>>>>>>>>  %4 = tail call zeroext i1 @_Z9test_condv()
>>>>>>>>  br i1 %4, label %.lr.ph, label %._crit_edge
>>>>>>>>
>>>>>>>> ; <label>:5                                       ; preds = %.lr.ph
>>>>>>>>  %6 = tail call zeroext i1 @_Z9test_condv()
>>>>>>>>  %7 = zext i1 %6 to i32
>>>>>>>>  %8 = add nsw i32 %7, %val.02
>>>>>>>>  br label %.backedge
>>>>>>>>
>>>>>>>> ._crit_edge:                                      ; preds = %.backedge, %0
>>>>>>>>  %val.0.lcssa = phi i32 [ 0, %0 ], [ %val.0.be, %.backedge ]
>>>>>>>>  ret i32 %val.0.lcssa
>>>>>>>> }
>>>>>>>>
>>>>>>>> declare zeroext i1 @_Z9test_condv()
>>>>>>>>
>>>>>>>> --------------------------------
>>>>>>>> Without patch:
>>>>>>>>
>>>>>>>>> opt -O3 not-nested-loop.ll -S
>>>>>>>>
>>>>>>>> ; ModuleID = 'not-nested-loop.ll'
>>>>>>>>
>>>>>>>> define i32 @_Z4testv() {
>>>>>>>> ; <label>:0
>>>>>>>>  br label %.outer
>>>>>>>>
>>>>>>>> .outer:                                           ; preds = %6, %0
>>>>>>>>  %i.0.ph = phi i32 [ 1, %0 ], [ %phitmp, %6 ]
>>>>>>>>  %val.0.ph = phi i32 [ 0, %0 ], [ %9, %6 ]
>>>>>>>>  br label %1
>>>>>>>>
>>>>>>>> ; <label>:1                                       ; preds = %.outer, %3
>>>>>>>>  %indvar = phi i32 [ 0, %.outer ], [ %indvar.next, %3 ]
>>>>>>>>  %tmp1 = add i32 %i.0.ph, %indvar
>>>>>>>>  %2 = tail call zeroext i1 @_Z9test_condv()
>>>>>>>>  br i1 %2, label %3, label %10
>>>>>>>>
>>>>>>>> ; <label>:3                                       ; preds = %1
>>>>>>>>  %4 = mul nsw i32 %tmp1, %tmp1
>>>>>>>>  %5 = icmp sgt i32 %4, 6
>>>>>>>>  %indvar.next = add i32 %indvar, 1
>>>>>>>>  br i1 %5, label %6, label %1
>>>>>>>>
>>>>>>>> ; <label>:6                                       ; preds = %3
>>>>>>>>  %7 = tail call zeroext i1 @_Z9test_condv()
>>>>>>>>  %8 = zext i1 %7 to i32
>>>>>>>>  %9 = add nsw i32 %8, %val.0.ph
>>>>>>>>  %phitmp = add i32 %tmp1, 1
>>>>>>>>  br label %.outer
>>>>>>>>
>>>>>>>> ; <label>:10                                      ; preds = %1
>>>>>>>>  ret i32 %val.0.ph
>>>>>>>> }
>>>>>>>>
>>>>>>>> declare zeroext i1 @_Z9test_condv()
>>>>>>>>
>>>>>>>> On Thu, Mar 3, 2011 at 3:43 AM, Cameron Zwarich <zwarich at apple.com> wrote:
>>>>>>>>> When I compile your first test case without this patch for X86, I see 2 conditional branches in the loop body and no unconditional branches. With this patch, there is an extra unconditional branch inside the loop. Unfortunately, I don't think this is cleaned up as well as you assume.
>>>>>>>>>
>>>>>>>>> Cameron
>>>>>>>>>
>>>>>>>>> On 2011-03-02, at 11:56 PM, Andrew Clinton wrote:
>>>>>>>>>
>>>>>>>>>> The role of the loop-simplify pass, as described in the header, is to
>>>>>>>>>> transform natural loops into a simpler form.  The goal of this patch
>>>>>>>>>> is not to produce the minimum number of instructions but to ensure
>>>>>>>>>> that additional (unnecessary) complexity is not introduced by the
>>>>>>>>>> simplification.  I think this is a good change since running the
>>>>>>>>>> natural loops analysis after this pass will then produce the expected
>>>>>>>>>> loop nesting.  Without this patch, I have found it necessary to write
>>>>>>>>>> a new pass that merges the nested loops.
>>>>>>>>>>
>>>>>>>>>> I believe that other passes will clean up extra branching (CFG
>>>>>>>>>> simplification), and PHI nodes will usually add runtime complexity
>>>>>>>>>> only for the incoming blocks.  So I wouldn't expect runtime
>>>>>>>>>> performance to be impacted by this change.
>>>>>>>>>>
>>>>>>>>>> Attached is the revised diff - thank you for the feedback!
>>>>>>>>>>
>>>>>>>>>> Andrew
>>>>>>>>>>
>>>>>>>>>> On Thu, Mar 3, 2011 at 2:10 AM, Cameron Zwarich <zwarich at apple.com> wrote:
>>>>>>>>>>> Is splitting an edge to create a unique loop backedge better than creating two loops? If the inner loop's backedge is more frequently taken, then you are trading an extra unconditional branch (with phis) on loop entry for an extra unconditional branch (with phis) along the inner loop's backedge. When is this a good trade?
>>>>>>>>>>>
>>>>>>>>>>> Cameron
>>>>>>>>>> <nestedloops.diff>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nestedloops-v2.diff
Type: application/octet-stream
Size: 6693 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20121030/795ce825/attachment.obj>