[PATCH] D22630: Loop rotation

Mon Aug 15 13:20:46 PDT 2016

hiraditya marked 14 inline comments as done.

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:54-60
@@ -45,5 +53,9 @@
 
-static cl::opt<unsigned> DefaultRotationThreshold(
+static cl::opt<unsigned> RotationMaxHeaderSize(
     "rotation-max-header-size", cl::init(16), cl::Hidden,
     cl::desc("The default maximum header size for automatic loop rotation"));
 
+static cl::opt<unsigned> RotationMaxSize(
+    "rotation-max-size", cl::init(100), cl::Hidden,
+    cl::desc("The default maximum loop size for automatic loop rotation"));
+
----------------
mzolotukhin wrote:
> Why do we need both of these options? How do they interact with each other? Isn't the first one just a leftover from the previous implementation and should be replaced with the second one in the new implementation?
You are right, the first one is leftover from the old implementation. I'll delete that.

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:62-64
@@ +61,5 @@
+
+static cl::opt<int> MaxLoopsRotated(
+    "max-loops-rotated", cl::init(-1), cl::Hidden,
+    cl::desc("The maximum loops to be rotated (-1 means no limit)"));
+
----------------
mzolotukhin wrote:
> How is this intended to be used?
For delta-debugging. This patch actually exposed a bug in MachineBlockPlacement (https://llvm.org/bugs/show_bug.cgi?id=28604). where we needed this flag. I'll remove this once the patch is approved.

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:69-71
@@ -50,2 +68,5 @@
+
+static int LoopsRotated = 0;
+
 STATISTIC(NumRotated, "Number of loops rotated");
 
----------------
mzolotukhin wrote:
> Why do we have both `LoopsRotated` and `NumRotated`?
I think STATISTICS don't appear in release builds.

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:73-88
@@ -51,2 +72,18 @@
 
-namespace {
+static void insertBetween(BasicBlock *NewBB, BasicBlock *PredBefore,
+                   BasicBlock *Succ) {
+  BranchInst *NewBI = BranchInst::Create(Succ, NewBB);
+  NewBI->setDebugLoc(PredBefore->getTerminator()->getDebugLoc());
+
+  BranchInst *BLI = dyn_cast<BranchInst>(PredBefore->getTerminator());
+  for (unsigned I = 0, E = BLI->getNumSuccessors(); I < E; ++I)
+    if (BLI->getSuccessor(I) == Succ) {
+      BLI->setSuccessor(I, NewBB);
+      break;
+    }
+  // Move NewBB physically from the end of the block list.
+  Function *F = Succ->getParent();
+  F->getBasicBlockList().splice(Succ->getIterator(), F->getBasicBlockList(),
+                                NewBB);
+}
+
----------------
mzolotukhin wrote:
> `llvm::SplitEdge` should be sufficient here. It'll create an empty block between the given two (and that's the only current usage of `insertBetween`).
I tried and first-hand it didn't work. So I'll look into it in more detail and update on this.

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:182-183
@@ -216,5 +181,4 @@
   // duplicate blocks inside it.
-  {
-    SmallPtrSet<const Value *, 32> EphValues;
-    CodeMetrics::collectEphemeralValues(L, AC, EphValues);
-
+  if (Exits.size() > MaxExits)
+    return false;
+  SmallPtrSet<const Value *, 32> EphValues;
----------------
mzolotukhin wrote:
> Why does the number of exits affect the decision whether we should rotate or not?
This is just providing an extra tuning for now. I can remove this if it does not appear useful.

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:231
@@ -241,5 +230,3 @@
 
-  // If the loop could not be converted to canonical form, it must have an
-  // indirectbr in it, just give up.
-  if (!OrigPreheader)
-    return false;
+      for (auto UI = Inst.use_begin(), E = Inst.use_end(); UI != E;) {
+        Use &U = *UI++;
----------------
mzolotukhin wrote:
> Range loop?
It will become tricky with range-loop because UI is incremented at the beginning.

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:263-264
@@ -331,56 +262,4 @@
 
-  // Along with all the other instructions, we just cloned OrigHeader's
-  // terminator into OrigPreHeader. Fix up the PHI nodes in each of OrigHeader's
-  // successors by duplicating their incoming values for OrigHeader.
-  TerminatorInst *TI = OrigHeader->getTerminator();
-  for (BasicBlock *SuccBB : TI->successors())
-    for (BasicBlock::iterator BI = SuccBB->begin();
-         PHINode *PN = dyn_cast<PHINode>(BI); ++BI)
-      PN->addIncoming(PN->getIncomingValueForBlock(OrigHeader), OrigPreheader);
-
-  // Now that OrigPreHeader has a clone of OrigHeader's terminator, remove
-  // OrigPreHeader's old terminator (the original branch into the loop), and
-  // remove the corresponding incoming values from the PHI nodes in OrigHeader.
-  LoopEntryBranch->eraseFromParent();
-
-  // If there were any uses of instructions in the duplicated block outside the
-  // loop, update them, inserting PHI nodes as required
-  RewriteUsesOfClonedInstructions(OrigHeader, OrigPreheader, ValueMap);
-
-  // NewHeader is now the header of the loop.
-  L->moveToHeader(NewHeader);
-  assert(L->getHeader() == NewHeader && "Latch block is our new header");
-
-  // At this point, we've finished our major CFG changes.  As part of cloning
-  // the loop into the preheader we've simplified instructions and the
-  // duplicated conditional branch may now be branching on a constant.  If it is
-  // branching on a constant and if that constant means that we enter the loop,
-  // then we fold away the cond branch to an uncond branch.  This simplifies the
-  // loop in cases important for nested loops, and it also means we don't have
-  // to split as many edges.
-  BranchInst *PHBI = cast<BranchInst>(OrigPreheader->getTerminator());
-  assert(PHBI->isConditional() && "Should be clone of BI condbr!");
-  if (!isa<ConstantInt>(PHBI->getCondition()) ||
-      PHBI->getSuccessor(cast<ConstantInt>(PHBI->getCondition())->isZero()) !=
-          NewHeader) {
-    // The conditional branch can't be folded, handle the general case.
-    // Update DominatorTree to reflect the CFG change we just made.  Then split
-    // edges as necessary to preserve LoopSimplify form.
-    if (DT) {
-      // Everything that was dominated by the old loop header is now dominated
-      // by the original loop preheader. Conceptually the header was merged
-      // into the preheader, even though we reuse the actual block as a new
-      // loop latch.
-      DomTreeNode *OrigHeaderNode = DT->getNode(OrigHeader);
-      SmallVector<DomTreeNode *, 8> HeaderChildren(OrigHeaderNode->begin(),
-                                                   OrigHeaderNode->end());
-      DomTreeNode *OrigPreheaderNode = DT->getNode(OrigPreheader);
-      for (unsigned I = 0, E = HeaderChildren.size(); I != E; ++I)
-        DT->changeImmediateDominator(HeaderChildren[I], OrigPreheaderNode);
-
-      assert(DT->getNode(Exit)->getIDom() == OrigPreheaderNode);
-      assert(DT->getNode(NewHeader)->getIDom() == OrigPreheaderNode);
-
-      // Update OrigHeader to be dominated by the new header block.
-      DT->changeImmediateDominator(OrigHeader, OrigLatch);
-    }
+// Add incoming values to the (already present) PHIs of NewH.
+void LoopRotate::adjustNewHeaderPhis(ValueToValueMapTy &VMap, BasicBlock *NewH,
+                                     BasicBlock *NewPH) {
----------------
mzolotukhin wrote:
> What's a reasoning behind separating this function from `addPhisToNewHeader`? They seem to accomplish very related tasks. And they are called one after the other.
One works on existing phi's and another on new-phis to be inserted. I have renamed the function.

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:312
@@ +311,3 @@
+/// Rotate loop L.
+/// TODO: arrange newly inserted bbs.
+void LoopRotate::rotateLoop(BasicBlock *NewH,
----------------
mzolotukhin wrote:
> Is this TODO done? If not, I think it's needed to be done before committing :-)
The BBs are aranged but it is still not clear what would be a good arrangement. Suggestions are welcome!.

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:378
@@ +377,3 @@
+      BasicBlock *NewBB = cast<BasicBlock>(VMap[BB]);
+      // VERIFY: NewIDom will be correct because this part of CFG is up-to-date.
+      BasicBlock *NewIDom = DT->findNearestCommonDominator(StaleIDom, NewBB);
----------------
mzolotukhin wrote:
> Is it a TODO item?
This was done already. Thanks

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:430-431
@@ +429,4 @@
+
+  if (!isProfitableToRotate(OrigH, Blocks, Exits))
+    return false;
+
----------------
mzolotukhin wrote:
> I'm actually not sure if loop-rotation should speculate about profitability. It's an enabler pass not an optimization. It should always rotate loops to an easily-digestible for other passes form, and do that whenever it's doable under the budget of allowed size increase.
You are right. I just wanted to have a mechanism to control loop-rotation. Can we moved this (profitability) to passes which might invoke loop-rotation. I still think loop-rotation should be called on demand. What do you think.

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:447
@@ +446,3 @@
+    DEBUG(dbgs() << "\nLoop in canonical form:"; L->dumpVerbose(););
+    // After splitting the blocks need to be recollected.
+    Blocks.clear();
----------------
mzolotukhin wrote:
> Can we check this before we collect the blocks first time?
We can, however, in this case we split-edge only when it is required. And without splittingthe edge in this case collecting the blocks is not useful.
I can try to optimize it further (by only collecting the new blocks in this case and inserting at the right position).

================
Comment at: llvm/test/Transforms/LoopRotate/loop-rotate-4.ll:56-64
@@ +55,11 @@
+    i32 26, label %if.then130
+    i32 24, label %if.then130
+    i32 23, label %if.then130
+    i32 22, label %if.then130
+    i32 20, label %if.then130
+    i32 19, label %if.then130
+    i32 18, label %if.then130
+    i32 12, label %if.then130
+    i32 6, label %if.then149
+    i32 30, label %if.then467
+  ]
----------------
mzolotukhin wrote:
> `bugpoint` has known limitations, and this test is definitely might be reduced further. E.g. it runs some form of  `simplifycfg` on the test to remove unneeded blocks and merge some other blocks, but if the test starts to pass after it, `bugpoint` undoes the simplification. But it's still doable to do similar simplifications manually (or even better - to teach `bugpoint` to do it), one step at a time.
> 
> If you do this, the tests would become much more comprehensible, it would be easier to understand which case they exercise, and it would be easier to spot duplicates among the tests.
> 
> Also, metadata is usually unnecessary in such tests, I prefer to remove it so that it doesn't clutter the test (it's not about this particular test, but I saw it in some other).
I reduced it further.


https://reviews.llvm.org/D22630