[llvm-commits] Patch to improve loop-simplify

Sat Nov 3 10:13:40 PDT 2012

It's not clear to me how LICM would provide any benefit in hoisting
redundancies out of conditional control flow - isn't the point of LICM
to extract expressions from a loop so they can be executed once
outside the loop?  If the contents of the loop are already expressed
with conditionals, there wouldn't be any benefit to this optimization.

If there was an invariant in the original loop that could be extracted
by LICM, I can't see that splitting into an outer/inner loop would
make this any easier to find.

Do you have a test case that I could use to verify?

Andrew

On Fri, Nov 2, 2012 at 4:53 PM, Dan Gohman <dan433584 at gmail.com> wrote:
> It's possible that I've missed something, but it's still not clear to me
> that this patch is desirable. You proposed that conditionals are better than
> loops, but that's not obviously true. As I mentioned before, consider
> something like LICM; if the compiler converts multiple backedges into
> multiple loops, there may be opportunities for hoisting invariants from the
> inner loop. If the compiler leaves it as a conditional inside of a loop,
> there's no easy way to eliminate similar redundancies. This is a case where
> loops are demonstrably easier to analyze.
>
> Dan
>
> On Wed, Oct 31, 2012 at 3:42 PM, Andrew Clinton <ajclinto at gmail.com> wrote:
>>
>> Since I don't seem to be getting any responses, I wonder if some
>> developers could chime in and let me know if there is something else I
>> need to do do have my patch accepted?
>>
>> I believe I have followed the submission guidelines - used the correct
>> coding style, added regression tests, and explained a few times the
>> benefit of the patch, all of which have taken a few days of work.  It
>> would be nice if this work could be rewarded with the patch being
>> added to the software :)
>>
>> Anyone?
>>
>> On Tue, Oct 30, 2012 at 9:56 PM, Andrew Clinton <ajclinto at gmail.com>
>> wrote:
>> > I've updated the patch to work against the latest tree (see attached).
>> >  There was also a related discussion on the llvm-dev list "SimplifyCFG
>> > vs loops" where I've put some background on what this is needed for.
>> >
>> > On Fri, Jul 15, 2011 at 3:02 PM, Andrew Clinton <ajclinto at gmail.com>
>> > wrote:
>> >> Do you think this patch could be included in the current head?
>> >>
>> >> Andrew
>> >>
>> >> On Sun, Mar 13, 2011 at 8:50 PM, Andrew Clinton <ajclinto at gmail.com>
>> >> wrote:
>> >>> The updated patch (attached) eliminates the extra jmp in the generated
>> >>> machine code.  It does this by inserting the backedge block directly
>> >>> before the loop header rather than after the last backedge block.
>> >>> This will result in basically the same block placement as if an outer
>> >>> loop were extracted, but will not introduce a new loop.
>> >>>
>> >>> Andrew
>> >>>
>> >>> On Wed, Mar 9, 2011 at 9:33 AM, Andrew Clinton <ajclinto at gmail.com>
>> >>> wrote:
>> >>>> Here's a test case to show the unexpected introduction of nested
>> >>>> loops:
>> >>>>
>> >>>> bool test_cond();
>> >>>> int test()
>> >>>> {
>> >>>>  int val = 0;
>> >>>>  while (test_cond())
>> >>>>  {
>> >>>>    if (test_cond())
>> >>>>    {
>> >>>>      val += test_cond();
>> >>>>    }
>> >>>>  }
>> >>>>  return val;
>> >>>> }
>> >>>>
>> >>>> 1. compile with clang: clang -S -emit-llvm test.C
>> >>>> 2. opt -O3 -S test.s
>> >>>>
>> >>>> The optimized assembly has two loops, where the input program just
>> >>>> had
>> >>>> a single loop with a conditional inside it.
>> >>>>
>> >>>> On Tue, Mar 8, 2011 at 11:16 AM, Andrew Clinton <ajclinto at gmail.com>
>> >>>> wrote:
>> >>>>> The goal wasn't to improve the resulting code (although I didn't
>> >>>>> want
>> >>>>> it to degrade it either).  The purpose was to improve analysis, so
>> >>>>> that when there is plain ordinary control flow inside a loop (that
>> >>>>> happens to be represented as multiple backedges), it's not converted
>> >>>>> to nested loops.  This conversion makes it very awkward to analyse
>> >>>>> the
>> >>>>> looping structure in my application.  I've worked around this by
>> >>>>> writing my own "loop-merging" pass to undo the nesting
>> >>>>> transformation
>> >>>>> for loops that have no exit (that is not contained in another loop).
>> >>>>> Perhaps, if this codegen problem with this patch can't be
>> >>>>> surmounted,
>> >>>>> I could submit the "loop-merging" pass in a separate patch.
>> >>>>>
>> >>>>> Looking at the IR, I see the following:
>> >>>>>
>> >>>>> With patch:
>> >>>>> - "Inner loop" has 2 conditional branches
>> >>>>> - "Outer loop" has 2 conditional branches and 1 unconditional branch
>> >>>>>
>> >>>>> Without patch:
>> >>>>> - Inner loop has 2 conditional branches
>> >>>>> - Outer loop has 2 conditional branches and 2 unconditional branches
>> >>>>>
>> >>>>> How is the IR more complex with the patch?
>> >>>>>
>> >>>>> On Tue, Mar 8, 2011 at 4:52 AM, Cameron Zwarich <zwarich at apple.com>
>> >>>>> wrote:
>> >>>>>> Upon closer inspection, the difference in codegen is reflected in
>> >>>>>> the input that is fed to llc. With your patch both the inner and outer loops
>> >>>>>> have a split backedge. Without your patch only the outer loop has a split
>> >>>>>> backedge. It's not a coincidence that the block placement is better with the
>> >>>>>> version that doesn't split the inner loop's backedge, and as far as I can
>> >>>>>> tell there is no pass that is generally capable of repairing such a thing.
>> >>>>>>
>> >>>>>> What benefit does your patch bring? Is there some case where it
>> >>>>>> improves the resulting code?
>> >>>>>>
>> >>>>>> Cameron
>> >>>>>>
>> >>>>>> On Mar 7, 2011, at 6:28 PM, Andrew Clinton wrote:
>> >>>>>>
>> >>>>>>> I've run a few more tests based on the IR listed below.  It
>> >>>>>>> appears
>> >>>>>>> that regardless of whether my patch is present, llc will generate
>> >>>>>>> code
>> >>>>>>> without the extra "jmp" for the old .ll and with the extra "jmp"
>> >>>>>>> with
>> >>>>>>> the new .ll.  So it appears that the patch is not responsible for
>> >>>>>>> this
>> >>>>>>> difference in the generated x86 assembly code.  I'm guessing that
>> >>>>>>> the
>> >>>>>>> blocks happen to fall in a nicer pattern for codegen in this
>> >>>>>>> particular case, when an outer loop is split out.  Could you
>> >>>>>>> commit
>> >>>>>>> the patch?
>> >>>>>>>
>> >>>>>>> Andrew
>> >>>>>>>
>> >>>>>>> On Thu, Mar 3, 2011 at 5:59 PM, Cameron Zwarich
>> >>>>>>> <zwarich at apple.com> wrote:
>> >>>>>>>> That is a bit strange. There is another run of LoopSimplify in
>> >>>>>>>> the backend for LoopStrengthReduce. Perhaps a problem is reintroduced there
>> >>>>>>>> that is not cleaned up.
>> >>>>>>>>
>> >>>>>>>> Cameron
>> >>>>>>>>
>> >>>>>>>> On Mar 3, 2011, at 8:10 AM, Andrew Clinton wrote:
>> >>>>>>>>
>> >>>>>>>>> I'm not sure why the code generator is synthesizing an
>> >>>>>>>>> additional
>> >>>>>>>>> unconditional branch.  The IR is certainly simpler with this
>> >>>>>>>>> patch.
>> >>>>>>>>>
>> >>>>>>>>> --------------------------------
>> >>>>>>>>> With patch:
>> >>>>>>>>>> opt -O3 not-nested-loop.ll -S
>> >>>>>>>>>
>> >>>>>>>>> ; ModuleID = 'not-nested-loop.ll'
>> >>>>>>>>>
>> >>>>>>>>> define i32 @_Z4testv() {
>> >>>>>>>>> ; <label>:0
>> >>>>>>>>>  %1 = tail call zeroext i1 @_Z9test_condv()
>> >>>>>>>>>  br i1 %1, label %.lr.ph, label %._crit_edge
>> >>>>>>>>>
>> >>>>>>>>> .lr.ph:                                           ; preds = %0,
>> >>>>>>>>> %.backedge
>> >>>>>>>>>  %val.02 = phi i32 [ %val.0.be, %.backedge ], [ 0, %0 ]
>> >>>>>>>>>  %i.01 = phi i32 [ %tmp, %.backedge ], [ 0, %0 ]
>> >>>>>>>>>  %tmp = add i32 %i.01, 1
>> >>>>>>>>>  %2 = mul nsw i32 %tmp, %tmp
>> >>>>>>>>>  %3 = icmp sgt i32 %2, 6
>> >>>>>>>>>  br i1 %3, label %5, label %.backedge
>> >>>>>>>>>
>> >>>>>>>>> .backedge:                                        ; preds =
>> >>>>>>>>> %.lr.ph, %5
>> >>>>>>>>>  %val.0.be = phi i32 [ %8, %5 ], [ %val.02, %.lr.ph ]
>> >>>>>>>>>  %4 = tail call zeroext i1 @_Z9test_condv()
>> >>>>>>>>>  br i1 %4, label %.lr.ph, label %._crit_edge
>> >>>>>>>>>
>> >>>>>>>>> ; <label>:5                                       ; preds =
>> >>>>>>>>> %.lr.ph
>> >>>>>>>>>  %6 = tail call zeroext i1 @_Z9test_condv()
>> >>>>>>>>>  %7 = zext i1 %6 to i32
>> >>>>>>>>>  %8 = add nsw i32 %7, %val.02
>> >>>>>>>>>  br label %.backedge
>> >>>>>>>>>
>> >>>>>>>>> ._crit_edge:                                      ; preds =
>> >>>>>>>>> %.backedge, %0
>> >>>>>>>>>  %val.0.lcssa = phi i32 [ 0, %0 ], [ %val.0.be, %.backedge ]
>> >>>>>>>>>  ret i32 %val.0.lcssa
>> >>>>>>>>> }
>> >>>>>>>>>
>> >>>>>>>>> declare zeroext i1 @_Z9test_condv()
>> >>>>>>>>>
>> >>>>>>>>> --------------------------------
>> >>>>>>>>> Without patch:
>> >>>>>>>>>
>> >>>>>>>>>> opt -O3 not-nested-loop.ll -S
>> >>>>>>>>>
>> >>>>>>>>> ; ModuleID = 'not-nested-loop.ll'
>> >>>>>>>>>
>> >>>>>>>>> define i32 @_Z4testv() {
>> >>>>>>>>> ; <label>:0
>> >>>>>>>>>  br label %.outer
>> >>>>>>>>>
>> >>>>>>>>> .outer:                                           ; preds = %6,
>> >>>>>>>>> %0
>> >>>>>>>>>  %i.0.ph = phi i32 [ 1, %0 ], [ %phitmp, %6 ]
>> >>>>>>>>>  %val.0.ph = phi i32 [ 0, %0 ], [ %9, %6 ]
>> >>>>>>>>>  br label %1
>> >>>>>>>>>
>> >>>>>>>>> ; <label>:1                                       ; preds =
>> >>>>>>>>> %.outer, %3
>> >>>>>>>>>  %indvar = phi i32 [ 0, %.outer ], [ %indvar.next, %3 ]
>> >>>>>>>>>  %tmp1 = add i32 %i.0.ph, %indvar
>> >>>>>>>>>  %2 = tail call zeroext i1 @_Z9test_condv()
>> >>>>>>>>>  br i1 %2, label %3, label %10
>> >>>>>>>>>
>> >>>>>>>>> ; <label>:3                                       ; preds = %1
>> >>>>>>>>>  %4 = mul nsw i32 %tmp1, %tmp1
>> >>>>>>>>>  %5 = icmp sgt i32 %4, 6
>> >>>>>>>>>  %indvar.next = add i32 %indvar, 1
>> >>>>>>>>>  br i1 %5, label %6, label %1
>> >>>>>>>>>
>> >>>>>>>>> ; <label>:6                                       ; preds = %3
>> >>>>>>>>>  %7 = tail call zeroext i1 @_Z9test_condv()
>> >>>>>>>>>  %8 = zext i1 %7 to i32
>> >>>>>>>>>  %9 = add nsw i32 %8, %val.0.ph
>> >>>>>>>>>  %phitmp = add i32 %tmp1, 1
>> >>>>>>>>>  br label %.outer
>> >>>>>>>>>
>> >>>>>>>>> ; <label>:10                                      ; preds = %1
>> >>>>>>>>>  ret i32 %val.0.ph
>> >>>>>>>>> }
>> >>>>>>>>>
>> >>>>>>>>> declare zeroext i1 @_Z9test_condv()
>> >>>>>>>>>
>> >>>>>>>>> On Thu, Mar 3, 2011 at 3:43 AM, Cameron Zwarich
>> >>>>>>>>> <zwarich at apple.com> wrote:
>> >>>>>>>>>> When I compile your first test case without this patch for X86,
>> >>>>>>>>>> I see 2 conditional branches in the loop body and no unconditional branches.
>> >>>>>>>>>> With this patch, there is an extra unconditional branch inside the loop.
>> >>>>>>>>>> Unfortunately, I don't think this is cleaned up as well as you assume.
>> >>>>>>>>>>
>> >>>>>>>>>> Cameron
>> >>>>>>>>>>
>> >>>>>>>>>> On 2011-03-02, at 11:56 PM, Andrew Clinton wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> The role of the loop-simplify pass, as described in the
>> >>>>>>>>>>> header, is to
>> >>>>>>>>>>> transform natural loops into a simpler form.  The goal of this
>> >>>>>>>>>>> patch
>> >>>>>>>>>>> is not to produce the minimum number of instructions but to
>> >>>>>>>>>>> ensure
>> >>>>>>>>>>> that additional (unnecessary) complexity is not introduced by
>> >>>>>>>>>>> the
>> >>>>>>>>>>> simplification.  I think this is a good change since running
>> >>>>>>>>>>> the
>> >>>>>>>>>>> natural loops analysis after this pass will then produce the
>> >>>>>>>>>>> expected
>> >>>>>>>>>>> loop nesting.  Without this patch, I have found it necessary
>> >>>>>>>>>>> to write
>> >>>>>>>>>>> a new pass that merges the nested loops.
>> >>>>>>>>>>>
>> >>>>>>>>>>> I believe that other passes will clean up extra branching (CFG
>> >>>>>>>>>>> simplification), and PHI nodes will usually add runtime
>> >>>>>>>>>>> complexity
>> >>>>>>>>>>> only for the incoming blocks.  So I wouldn't expect runtime
>> >>>>>>>>>>> performance to be impacted by this change.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Attached is the revised diff - thank you for the feedback!
>> >>>>>>>>>>>
>> >>>>>>>>>>> Andrew
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Thu, Mar 3, 2011 at 2:10 AM, Cameron Zwarich
>> >>>>>>>>>>> <zwarich at apple.com> wrote:
>> >>>>>>>>>>>> Is splitting an edge to create a unique loop backedge better
>> >>>>>>>>>>>> than creating two loops? If the inner loop's backedge is more frequently
>> >>>>>>>>>>>> taken, then you are trading an extra unconditional branch (with phis) on
>> >>>>>>>>>>>> loop entry for an extra unconditional branch (with phis) along the inner
>> >>>>>>>>>>>> loop's backedge. When is this a good trade?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Cameron
>> >>>>>>>>>>> <nestedloops.diff>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>