[PATCH] MachineSink: Fix and tweak critical-edge breaking heuristic.

Mon Sep 16 23:12:08 PDT 2013

On Sep 14, 2013, at 2:53 PM, Will Dietz <w at wdtz.org> wrote:

> Managed to reduce the IR, attached to newly filed PR17245
> (http://llvm.org/PR17245 ) for discussion and investigation.

This looks great. Thanks Will.

I can't quite parse this comment.

+      // reason about blocking it.  Other iterations
+      // will look to sink it block and block until
+      // it resides in this MBB at which point
+      // this will evaluate to true.

Regarding the test failures. I think you should determine whether:
(1) MachineSinking is now doing the intended thing
(2) We're not breaking any codegen patterns (downstream passes still do something reasonable).
(3) The test is still testing something.

If all of the above, then just fixup the CHECK lines.

For example, in CodeGen/ARM/2011-04-11-MachineLICMBug.ll, you're now sinking multiple instructions into the loop's exit paths. I think that's intended behavior since we consider multiple instructions to be profitable for splitting. The final code looks better. And the test still checks that LICM doesn't generate extra copies. It looks like the CHECK labels could use some adjustment.

-Andy

> 
> On Tue, Sep 10, 2013 at 8:24 PM, Will Dietz <w at wdtz.org> wrote:
>> Apologies for not better motivating this in my original post,
>> I'd already spent more time on this than I probably should (dang
>> interesting compilers!)
>> and was hoping what I had would be sufficient.
>> 
>> Since no one's responded yet I'll try to better explain the issue
>> and why this is important.  Thoughts/comments welcome :).
>> 
>> Here's a before-and-after stats and timings for the regression I'm observing:
>> 
>> Before: https://pastee.org/3w6fa (28m47s)
>> After: https://pastee.org/6hx3u (3m49s)
>> 
>> The code in question is an instrumented copy of some image
>> manipulation code from ImageMagick.
>> I'd rather not post it as it was generated by an unpublished research
>> project, but would be happy
>> to provide it upon request to anyone interested in looking into this issue.
>> 
>> That said, the BC probably isn't required for understanding why the
>> existing code is problematic:
>> 
>> The timings above show something certainly isn't working properly
>> in terms of scalability (look at the MachineSink timings!)
>> and I walk through the issue in more detail below.
>> Regardless of my patch's details hopefully this will shed
>> some light on the situation and help get this resolved.
>> 
>> In short, the heuristic in isWorthBreakingCriticalEdge() is O(n) in the number
>> of instructions within the function that define registers used by the
>> instruction
>> being queried.  This leads to unscalable behavior when the number of candidate
>> cheap instructions for sinking is particularly high, potentially O(n^2) overall.
>> 
>> Looking deeper, it seems the hasOneNonDBGUse() is expensive when the register
>> has many definitions as it has to walk through all of the definitions
>> before reaching a use.
>> This happens with EFLAGS easily since many instructions touch it--
>> on my problematic IR this results in walking over 300k operands for each
>> call to hasOneNonDBGUse().
>> 
>> From this alone I'd suggest the code be changed to avoid this
>> unscalable behavior, especially since the function is already labelled
>> as using questionable heuristics.  Note that simply avoiding calling
>> hasOneNonDBGUse() on physical registers is an effective solution
>> since virtual registers will never have more than one definition
>> due to their SSA-like nature.  Furthermore, I argue that ignoring both
>> uses and definitions physical registers is acceptable (see below).
>> 
>> While investigating the scalability issue, it appears
>> the loop does not do what it was originally intended to do,
>> judging by the comment above it (reproduced below):
>> 
>> -------------------
>> // MI is cheap, we probably don't want to break the critical edge for it.
>> // However, if this would allow some definitions of its source operands
>> // to be sunk then it's probably worth it.
>> -------------------
>> 
>> Which doesn't seem to be what the loop presently does:
>> 
>> * It considers registers /defined/ by this instruction.  While sinking
>> an instruction that defines a register with a single use might
>> be helpful for other reasons (reducing the live range),
>> this isn't what the comment claims the purpose of the loop is.
>> * Physical register uses aren't interesting because MachineSink
>> refuses to sink instructions that define physical registers.  This means
>> sinking the use of a physical register will not unblock its definition
>> since the definition is incapable of being sunk.
>> * It makes no attempt to determine if the definition instruction
>> is suitable for sinking into the broken critical edge in the first place.
>> I improve this somewhat by only recognizing an instruction as a "blocker"
>> if the defining instruction is in the same MBB.  This is incomplete of course,
>> but helps avoid spuriously breaking critical edges for instructions we know
>> can't be sunk with this one.
>> 
>> Hope this helps clarify and explain the situation.  Honestly the right answer
>> might lie elsewhere (perhaps in a more thorough reworking of the pass),
>> but unfortunately that's more than I have time for presently :(.
>> 
>> Thoughts/comments welcome! :)
>> 
>> ~Will
>> 
>> 
>> On Sat, Sep 7, 2013 at 10:51 PM, Will Dietz <w at wdtz.org> wrote:
>>> See attached, introduction is below and a description from commit
>>> message is repeated at the end of this message for easier processing
>>> and discussion.
>>> 
>>> This fixes a serious compile-time performance issue for me and hopefully
>>> causes LLVM to generate slightly nicer code.  Unfortunately I'm not in
>>> a position
>>> to do appropriate performance or platform testing of this (perhaps
>>> minor) change, so please let me know how I can best proceed or if
>>> that's something
>>> to leave to nightly testing after the change is finished and passes review.
>>> 
>>> Unfortunately, this breaks a few lit tests in ways that aren't
>>> obviously problematic upon inspection, but it's unclear to me how to
>>> properly resolve them.  The tests in question are:
>>> 
>>>    LLVM :: CodeGen/ARM/2011-04-11-MachineLICMBug.ll
>>>    LLVM :: CodeGen/ARM/2011-08-25-ldmia_ret.ll
>>>    LLVM :: CodeGen/ARM/2012-08-30-select.ll
>>>    LLVM :: CodeGen/X86/2012-11-30-handlemove-dbg.ll
>>>    LLVM :: CodeGen/X86/fold-load.ll
>>>    LLVM :: CodeGen/X86/hoist-common.ll
>>> 
>>> Can someone more familiar with these help me work through them?
>>> 
>>> A self-hosted copy of LLVM/Clang with this change passes its tests
>>> (other than the ones above) and seems to build various applications
>>> properly, so
>>> I'm hopeful these just need to be updated to accommodate this change
>>> and aren't signs of regressions or correctness issues.
>>> 
>>> Please let me know if you have any questions/comments/feedback, and
>>> thank you for your time.
>>> 
>>> --------------------------------------------------------
>>> 
>>> Per original comment, the intention of this loop
>>> is to go ahead and break the critical edge
>>> (in order to sink this instruction) if there's
>>> reason to believe doing so might "unblock" the
>>> sinking of additional instructions that define
>>> registers used by this one.  The idea is that if
>>> we have a few instructions to sink "together"
>>> breaking the edge might be worthwhile.
>>> 
>>> This commit makes a few small changes
>>> to help better realize this goal:
>>> 
>>> First, modify the loop to ignore registers
>>> defined by this instruction.  We don't
>>> sink definitions of physical registers,
>>> and sinking an SSA definition isn't
>>> going to unblock an upstream instruction.
>>> 
>>> Second, ignore uses of physical registers.
>>> Instructions that define physical registers are
>>> rejected for sinking, and so moving this one
>>> won't enable moving any defining instructions.
>>> As an added bonus, while virtual register
>>> use-def chains are generally small due
>>> to SSA goodness, iteration over the uses
>>> and definitions (used by hasOneNonDBGUse)
>>> for physical registers like EFLAGS
>>> can be rather expensive in practice.
>>> (This is the original reason for looking at this)
>>> 
>>> Finally, to keep things simple continue
>>> to only consider this trick for registers that
>>> have a single use (via hasOneNonDBGUse),
>>> but to avoid spuriously breaking critical edges
>>> only do so if the definition resides
>>> in the same MBB and therefore this one directly
>>> blocks it from being sunk as well.
>>> If sinking them together is meant to be,
>>> let the iterative nature of this pass
>>> sink the definition into this block first.
>>> 
>>> ~Will
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits