[PATCH] D16838: [LoopUnrolling] Try harder to avoid rebuilding LCSSA when possible.
Michael Zolotukhin via llvm-commits
llvm-commits at lists.llvm.org
Wed Feb 3 13:32:29 PST 2016
Hi Justin,
Thanks for taking a look, please find my comments below. I’ll update the patch shortly!
Michael
> On Feb 3, 2016, at 11:56 AM, Justin Bogner <mail at justinbogner.com> wrote:
>
> Michael Zolotukhin <mzolotukhin at apple.com <mailto:mzolotukhin at apple.com>> writes:
>> mzolotukhin created this revision.
>> mzolotukhin added reviewers: chandlerc, hfinkel, dexonsmith, bogner, joker.eph.
>> mzolotukhin added a subscriber: llvm-commits.
>>
>> In r255133 (reapplied r253126) we started to avoid redundant
>> recomputation of LCSSA after loop-unrolling. This patch moves one step
>> further in this direction - now we can avoid it for much wider range
>> of loops, as we start to look at IR and try to figure out if the
>> transformation actually breaks LCSSA phis or makes it necessary to
>> insert new ones.
>>
>> In future we might go even further and try to fix LCSSA in-place
>> rather than rebuilding it, but I'm not quite sure yet that it's always
>> possible (and computationally cheaper). Anyway, this patch seems to be
>> aligned with that direction.
>>
>> One of the most important use-cases that previous implementation
>> didn't handle is loops with calls that might throw an exception. Such
>> loops have exits from entire loop nest, but we still don't need to
>> recompute LCSSA after unrolling, as such exits usually don't contain
>> LCSSA phis.
>>
>> The patch was tested on LNT testsuite + SPECS, neither failures nor
>> significant compile time changes were spotted.
>>
>> http://reviews.llvm.org/D16838
>>
>> Files:
>> lib/Transforms/Utils/LoopUnroll.cpp
>>
>> Index: lib/Transforms/Utils/LoopUnroll.cpp
>> ===================================================================
>> --- lib/Transforms/Utils/LoopUnroll.cpp
>> +++ lib/Transforms/Utils/LoopUnroll.cpp
>> @@ -218,10 +218,16 @@
>> bool CompletelyUnroll = Count == TripCount;
>> SmallVector<BasicBlock *, 4> ExitBlocks;
>> L->getExitBlocks(ExitBlocks);
>> - Loop *ParentL = L->getParentLoop();
>> - bool AllExitsAreInsideParentLoop = !ParentL ||
>> - std::all_of(ExitBlocks.begin(), ExitBlocks.end(),
>> - [&](BasicBlock *BB) { return ParentL->contains(BB); });
>> +
>> + // Go through all exits of L and see if there are any phi-nodes there. We just
>> + // conservatively assume that they're inserted to preserve LCSSA form, which
>> + // means that complete unrolling might break this form. We need to either fix
>> + // it in-place after the transformation, or entirely rebuild LCSSA. TODO: For
>> + // now we just recompute LCSSA for the outer loop, but it should be possible
>> + // to fix it in-place.
>> + bool NeedToFixLCSSA = CompletelyUnroll &&
>> + std::any_of(ExitBlocks.begin(), ExitBlocks.end(),
>> + [&](BasicBlock *BB) { return isa<PHINode>(BB->begin()); });
>>
>> // We assume a run-time trip count if the compiler cannot
>> // figure out the loop trip count and the unroll-runtime
>> @@ -308,6 +314,7 @@
>> LoopBlocksDFS::RPOIterator BlockBegin = DFS.beginRPO();
>> LoopBlocksDFS::RPOIterator BlockEnd = DFS.endRPO();
>>
>> + std::vector<BasicBlock*> UnrolledLoopBlocks = L->getBlocks();
>> for (unsigned It = 1; It != Count; ++It) {
>> std::vector<BasicBlock*> NewBlocks;
>> SmallDenseMap<const Loop *, Loop *, 4> NewLoops;
>> @@ -387,6 +394,7 @@
>> Latches.push_back(New);
>>
>> NewBlocks.push_back(New);
>> + UnrolledLoopBlocks.push_back(New);
>> }
>>
>> // Remap all instructions in the most recent iteration
>> @@ -476,8 +484,12 @@
>> if (Term->isUnconditional()) {
>> BasicBlock *Dest = Term->getSuccessor(0);
>> if (BasicBlock *Fold = FoldBlockIntoPredecessor(Dest, LI, SE,
>> - ForgottenLoops))
>> + ForgottenLoops)) {
>> std::replace(Latches.begin(), Latches.end(), Dest, Fold);
>> + UnrolledLoopBlocks.erase(std::remove(UnrolledLoopBlocks.begin(),
>> + UnrolledLoopBlocks.end(), Dest),
>> + UnrolledLoopBlocks.end());
>> + }
>> }
>> }
>>
>> @@ -530,15 +542,38 @@
>> if (CompletelyUnroll)
>> LI->markAsRemoved(L);
>>
>> + // After complete unrolling most of the blocks should be contained in OuterL.
>> + // However, some of them might happen to be out of OuterL (e.g. if they
>> + // precede a loop exit). In this case we might need to insert PHI nodes in
>> + // order to preserve LCSSA form.
>> + // We don't need to check this if we already know that we need to fix LCSSA
>> + // form.
>> + // TODO: For now we just recompute LCSSA for the outer loop in this case, but
>> + // it should be possible to fix it in-place.
>> + if (OuterL && CompletelyUnroll && !NeedToFixLCSSA) {
>> + for (unsigned i = 0; i < UnrolledLoopBlocks.size(); ++i) {
>> + BasicBlock *BB = UnrolledLoopBlocks[i];
>> + if (LI->getLoopFor(BB) == OuterL)
>> + continue;
>> + for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I) {
>> + for (unsigned OpI = 0, OpE = I->getNumOperands(); OpI < OpE; ++OpI) {
>> + if (auto Def = dyn_cast<Instruction>(I->getOperand(OpI)))
>
> I think you can use ranged for for these loops:
>
> for (Instruction &Inst : BB) {
> for (Use &U : BB.operands()) {
> if (auto Def = dyn_cast<Instruction>(U))
> …
Good point. Probably, even all_of/any_of would work here, I’ll fix this.
>
>> + if (LI->getLoopFor(Def->getParent()) == OuterL)
>> + NeedToFixLCSSA = true;
>
> We should be able to break out early here, no?
Definitely, thanks!
>
>> + }
>> + }
>> + }
>> + }
>> +
>> // If we have a pass and a DominatorTree we should re-simplify impacted loops
>> // to ensure subsequent analyses can rely on this form. We want to simplify
>> // at least one layer outside of the loop that was unrolled so that any
>> // changes to the parent loop exposed by the unrolling are considered.
>> if (DT) {
>> if (!OuterL && !CompletelyUnroll)
>> OuterL = L;
>> if (OuterL) {
>> - bool Simplified = simplifyLoop(OuterL, DT, LI, SE, AC, PreserveLCSSA);
>> + simplifyLoop(OuterL, DT, LI, SE, AC, PreserveLCSSA);
>
> Can simplifyLoop return false when NeedToFixLCSSA = true? If so, do we
> still need to fix LCSSA?
Previously I, for some reason, thought that simplification might break LCSSA, so we have to rebuild it after it, but this time I haven’t found such a case. I’ll examine what we can do there and update the patch accordingly if needed.
>
>>
>> // LCSSA must be performed on the outermost affected loop. The unrolled
>> // loop's last loop latch is guaranteed to be in the outermost loop after
>> @@ -548,7 +583,7 @@
>> while (OuterL->getParentLoop() != LatchLoop)
>> OuterL = OuterL->getParentLoop();
>>
>> - if (CompletelyUnroll && (!AllExitsAreInsideParentLoop || Simplified))
>> + if (CompletelyUnroll && NeedToFixLCSSA)
>
> CompletelyUnroll is kind of redundant here, since there's no path where
> we set NeedToFixLCSSA = true where we didn't check CompletelyUnroll
> already.
True, will fix!
>
>> formLCSSARecursively(*OuterL, *DT, LI, SE);
>> else
>> assert(OuterL->isLCSSAForm(*DT) &&
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160203/ef4bedc3/attachment-0001.html>
More information about the llvm-commits
mailing list