[PATCH] D14526: Don't recompute LCSSA after loop-unrolling when possible.

Tue Nov 10 13:12:08 PST 2015

Michael Zolotukhin <mzolotukhin at apple.com> writes:
> Hi Duncan,
>
>> On Nov 10, 2015, at 9:45 AM, Duncan P. N. Exon Smith
>> <dexonsmith at apple.com> wrote:
>> 
>> 
>>> On 2015-Nov-09, at 20:54, Michael Zolotukhin <mzolotukhin at apple.com> wrote:
>>> 
>>> mzolotukhin created this revision.
>>> mzolotukhin added reviewers: chandlerc, hfinkel, dexonsmith,
>>> bogner, joker.eph.
>>> mzolotukhin added a subscriber: llvm-commits.
>>> 
>>> Currently we always recompute LCSSA for outer loops after unrolling an
>>> inner loop. That leads to compile time problem when we have big loop
>>> nests, and we can solve it by avoiding unnecessary work. For instance,
>>> if w eonly do partial unrolling, we don't break LCSSA, so we don't need
>>> to rebuild it. Also, if all exits from the inner loop are inside the
>>> enclosing loop, then complete unrolling won't break LCSSA either.
>>> 
>>> I replaced unconditional LCSSA recomputation with conditional recomputation +
>>> unconditional assert and added several tests, which were failing when I
>>> experimented with it.
>>> 
>>> Soon I plan to follow up with a similar patch for recalculation of dominators
>>> tree.
>>> 
>>> http://reviews.llvm.org/D14526
>>> 
>>> Files:
>>> lib/Transforms/Utils/LoopUnroll.cpp
>>> test/Transforms/LoopUnroll/rebuild_lcssa.ll
>>> 
>>> <D14526.39782.patch>
>> 
>> This makes sense to me, although I don't know the loop algorithms too well
>> so maybe someone else can confirm.  A few nitpicks below.
>> 
>>> Index: lib/Transforms/Utils/LoopUnroll.cpp
>>> ===================================================================
>>> --- lib/Transforms/Utils/LoopUnroll.cpp
>>> +++ lib/Transforms/Utils/LoopUnroll.cpp
>>> @@ -221,6 +221,16 @@
>>> 
>>>   // Are we eliminating the loop control altogether?
>>>   bool CompletelyUnroll = Count == TripCount;
>>> +  SmallVector<BasicBlock *, 4> ExitBlocks;
>>> +  L->getExitBlocks(ExitBlocks);
>>> +  Loop *ParentL = L->getParentLoop();
>>> +  bool AllExitsAreInsideParentLoop = true;
>>> +  for (auto BB : ExitBlocks) {
>> 
>> This would be clearer as `const BasicBlock *BB`, but should at least be
>> `const auto *BB` to make clear to readers that there are no copies going
>> on.
> Makes sense.
>> 
>>> +    if (LI->getLoopFor(BB) != ParentL) {
>>> +      AllExitsAreInsideParentLoop = false;
>>> +      break;
>>> +    }
>>> +  }
>>> 
>>>   // We assume a run-time trip count if the compiler cannot
>>>   // figure out the loop trip count and the unroll-runtime
>>> @@ -554,7 +564,11 @@
>>>         while (OuterL->getParentLoop() != LatchLoop)
>>>           OuterL = OuterL->getParentLoop();
>>> 
>>> -      formLCSSARecursively(*OuterL, *DT, LI, SE);
>>> +      if (CompletelyUnroll && !AllExitsAreInsideParentLoop)
>>> +        formLCSSARecursively(*OuterL, *DT, LI, SE);
>>> +
>>> +      assert(OuterL->isLCSSAForm(*DT) &&
>> 
>> Is this check cheap?  If not, it seems wasteful to do it when we've just
>> called `formLCSSARecursively()`.  Can you put this in an `else`, or do
>> an early continue inside the `if`?
> I thought it was fine to have even expensive checks under assert - isn’t it?

It's "fine", but if it's easy to avoid it we might as well. No sense
making the asserts build slower for no reason.

>> 
>>> +             "Must have loops in LCSSA form to after loop-unroll.");
>> 
>> s/to after/after/
> Thanks!
>> 
>>>     }
>>>   }
>>> 
>>> 
>>