[PATCH] D22630: Loop rotation

Fri Aug 5 18:08:06 PDT 2016

mzolotukhin requested changes to this revision.
mzolotukhin added a reviewer: mzolotukhin.
mzolotukhin added a comment.
This revision now requires changes to proceed.

Hi Aditya,

Thanks for addressing my feedback, please more some comments below.

Michael


================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:17
@@ -11,2 +16,3 @@
 //
+// TODO: Const correctness.
 //===----------------------------------------------------------------------===//
----------------
Can you please expand this comment a little?

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:54-60
@@ -45,5 +53,9 @@
 
-static cl::opt<unsigned> DefaultRotationThreshold(
+static cl::opt<unsigned> RotationMaxHeaderSize(
     "rotation-max-header-size", cl::init(16), cl::Hidden,
     cl::desc("The default maximum header size for automatic loop rotation"));
 
+static cl::opt<unsigned> RotationMaxSize(
+    "rotation-max-size", cl::init(100), cl::Hidden,
+    cl::desc("The default maximum loop size for automatic loop rotation"));
+
----------------
Why do we need both of these options? How do they interact with each other? Isn't the first one just a leftover from the previous implementation and should be replaced with the second one in the new implementation?

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:62-64
@@ +61,5 @@
+
+static cl::opt<int> MaxLoopsRotated(
+    "max-loops-rotated", cl::init(-1), cl::Hidden,
+    cl::desc("The maximum loops to be rotated (-1 means no limit)"));
+
----------------
How is this intended to be used?

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:69-71
@@ -50,2 +68,5 @@
+
+static int LoopsRotated = 0;
+
 STATISTIC(NumRotated, "Number of loops rotated");
 
----------------
Why do we have both `LoopsRotated` and `NumRotated`?

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:73-88
@@ -51,2 +72,18 @@
 
-namespace {
+static void insertBetween(BasicBlock *NewBB, BasicBlock *PredBefore,
+                   BasicBlock *Succ) {
+  BranchInst *NewBI = BranchInst::Create(Succ, NewBB);
+  NewBI->setDebugLoc(PredBefore->getTerminator()->getDebugLoc());
+
+  BranchInst *BLI = dyn_cast<BranchInst>(PredBefore->getTerminator());
+  for (unsigned I = 0, E = BLI->getNumSuccessors(); I < E; ++I)
+    if (BLI->getSuccessor(I) == Succ) {
+      BLI->setSuccessor(I, NewBB);
+      break;
+    }
+  // Move NewBB physically from the end of the block list.
+  Function *F = Succ->getParent();
+  F->getBasicBlockList().splice(Succ->getIterator(), F->getBasicBlockList(),
+                                NewBB);
+}
+
----------------
`llvm::SplitEdge` should be sufficient here. It'll create an empty block between the given two (and that's the only current usage of `insertBetween`).

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:91
@@ +90,3 @@
+// Remove the arguments of all phi nodes in PhiBB coming from block From.
+
+static void discardIncomingValues(BasicBlock *PhiBB, BasicBlock *From) {
----------------
Unnecessary blank line.

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:93-94
@@ +92,4 @@
+static void discardIncomingValues(BasicBlock *PhiBB, BasicBlock *From) {
+  for (BasicBlock::iterator I = PhiBB->begin(),
+       E = PhiBB->end(); I != E; ++I) {
+    PHINode *PN = dyn_cast<PHINode>(I);
----------------
You could use a range loop here:  `for (Instruction &I : *PhiBB)`.

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:182-183
@@ -216,5 +181,4 @@
   // duplicate blocks inside it.
-  {
-    SmallPtrSet<const Value *, 32> EphValues;
-    CodeMetrics::collectEphemeralValues(L, AC, EphValues);
-
+  if (Exits.size() > MaxExits)
+    return false;
+  SmallPtrSet<const Value *, 32> EphValues;
----------------
Why does the number of exits affect the decision whether we should rotate or not?

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:187
@@ -221,1 +186,3 @@
+  unsigned LoopSize = MaxHeaderSize;
+  for(const BasicBlock *BB : Blocks) {
     CodeMetrics Metrics;
----------------
Please run clang-format on the patch. E.g. in this line a space before `(` is missing.

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:192
@@ +191,3 @@
+    // indirectbr, invoke in the loop but we can cut the cloning part at that point
+    // and that will be the last exit BB
+    if(Metrics.notDuplicatable) {
----------------
Please end comments with a dot.

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:231
@@ -241,5 +230,3 @@
 
-  // If the loop could not be converted to canonical form, it must have an
-  // indirectbr in it, just give up.
-  if (!OrigPreheader)
-    return false;
+      for (auto UI = Inst.use_begin(), E = Inst.use_end(); UI != E;) {
+        Use &U = *UI++;
----------------
Range loop?

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:263-264
@@ -331,56 +262,4 @@
 
-  // Along with all the other instructions, we just cloned OrigHeader's
-  // terminator into OrigPreHeader. Fix up the PHI nodes in each of OrigHeader's
-  // successors by duplicating their incoming values for OrigHeader.
-  TerminatorInst *TI = OrigHeader->getTerminator();
-  for (BasicBlock *SuccBB : TI->successors())
-    for (BasicBlock::iterator BI = SuccBB->begin();
-         PHINode *PN = dyn_cast<PHINode>(BI); ++BI)
-      PN->addIncoming(PN->getIncomingValueForBlock(OrigHeader), OrigPreheader);
-
-  // Now that OrigPreHeader has a clone of OrigHeader's terminator, remove
-  // OrigPreHeader's old terminator (the original branch into the loop), and
-  // remove the corresponding incoming values from the PHI nodes in OrigHeader.
-  LoopEntryBranch->eraseFromParent();
-
-  // If there were any uses of instructions in the duplicated block outside the
-  // loop, update them, inserting PHI nodes as required
-  RewriteUsesOfClonedInstructions(OrigHeader, OrigPreheader, ValueMap);
-
-  // NewHeader is now the header of the loop.
-  L->moveToHeader(NewHeader);
-  assert(L->getHeader() == NewHeader && "Latch block is our new header");
-
-  // At this point, we've finished our major CFG changes.  As part of cloning
-  // the loop into the preheader we've simplified instructions and the
-  // duplicated conditional branch may now be branching on a constant.  If it is
-  // branching on a constant and if that constant means that we enter the loop,
-  // then we fold away the cond branch to an uncond branch.  This simplifies the
-  // loop in cases important for nested loops, and it also means we don't have
-  // to split as many edges.
-  BranchInst *PHBI = cast<BranchInst>(OrigPreheader->getTerminator());
-  assert(PHBI->isConditional() && "Should be clone of BI condbr!");
-  if (!isa<ConstantInt>(PHBI->getCondition()) ||
-      PHBI->getSuccessor(cast<ConstantInt>(PHBI->getCondition())->isZero()) !=
-          NewHeader) {
-    // The conditional branch can't be folded, handle the general case.
-    // Update DominatorTree to reflect the CFG change we just made.  Then split
-    // edges as necessary to preserve LoopSimplify form.
-    if (DT) {
-      // Everything that was dominated by the old loop header is now dominated
-      // by the original loop preheader. Conceptually the header was merged
-      // into the preheader, even though we reuse the actual block as a new
-      // loop latch.
-      DomTreeNode *OrigHeaderNode = DT->getNode(OrigHeader);
-      SmallVector<DomTreeNode *, 8> HeaderChildren(OrigHeaderNode->begin(),
-                                                   OrigHeaderNode->end());
-      DomTreeNode *OrigPreheaderNode = DT->getNode(OrigPreheader);
-      for (unsigned I = 0, E = HeaderChildren.size(); I != E; ++I)
-        DT->changeImmediateDominator(HeaderChildren[I], OrigPreheaderNode);
-
-      assert(DT->getNode(Exit)->getIDom() == OrigPreheaderNode);
-      assert(DT->getNode(NewHeader)->getIDom() == OrigPreheaderNode);
-
-      // Update OrigHeader to be dominated by the new header block.
-      DT->changeImmediateDominator(OrigHeader, OrigLatch);
-    }
+// Add incoming values to the (already present) PHIs of NewH.
+void LoopRotate::adjustNewHeaderPhis(ValueToValueMapTy &VMap, BasicBlock *NewH,
+                                     BasicBlock *NewPH) {
----------------
What's a reasoning behind separating this function from `addPhisToNewHeader`? They seem to accomplish very related tasks. And they are called one after the other.

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:284
@@ +283,3 @@
+  BasicBlock *NewH = nullptr;
+  for (auto I = df_begin(OrigH), E = df_end(OrigH); I != E;) {
+    if (!L->contains(*I)) {
----------------
`I` is usually used for instruction. Can we use another name here?

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:311
@@ -476,12 +310,3 @@
 
-/// Determine whether the instructions in this range may be safely and cheaply
-/// speculated. This is not an important enough situation to develop complex
-/// heuristics. We handle a single arithmetic instruction along with any type
-/// conversions.
-static bool shouldSpeculateInstrs(BasicBlock::iterator Begin,
-                                  BasicBlock::iterator End, Loop *L) {
-  bool seenIncrement = false;
-  bool MultiExitLoop = false;
-
-  if (!L->getExitingBlock())
-    MultiExitLoop = true;
+/// Rotate loop L.
+/// TODO: arrange newly inserted bbs.
----------------
This comment is useless in the current form. Can we have an expanded version of it instead? Like, what loops can we rotate, what form we expect them to be in, which blocks are copied etc.

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:312
@@ +311,3 @@
+/// Rotate loop L.
+/// TODO: arrange newly inserted bbs.
+void LoopRotate::rotateLoop(BasicBlock *NewH,
----------------
Is this TODO done? If not, I think it's needed to be done before committing :-)

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:338
@@ -533,2 +337,3 @@
+      BeforeLoop = BB;
       break;
     }
----------------
I'm not sure I understood your points in a correct way (due to my poor English, sorry), but I believe that loop-rotation executed several times on the same loop should give the same result as it produced after the first run. That is, it should rotate a loop to some optimal form, and do nothing if a loop is already in such form.

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:365
@@ +364,3 @@
+  for (BasicBlock *BB : Blocks) {
+    typedef DomTreeNodeBase<BasicBlock> DTNode;
+    DTNode *IDom = DT->getNode(BB);
----------------
Do we really need a typedef for the sake of just two lines?

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:378
@@ +377,3 @@
+      BasicBlock *NewBB = cast<BasicBlock>(VMap[BB]);
+      // VERIFY: NewIDom will be correct because this part of CFG is up-to-date.
+      BasicBlock *NewIDom = DT->findNearestCommonDominator(StaleIDom, NewBB);
----------------
Is it a TODO item?

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:403-404
@@ -569,3 +402,4 @@
 
-  if (!shouldSpeculateInstrs(Latch->begin(), Jmp->getIterator(), L))
-    return false;
+  LI->verify();
+  DT->verifyDomTree();
+  assert(L->isLCSSAForm(*DT) && "Loop is not in LCSSA form.");
----------------
This should probably be under `#ifndef NDEBUG`.

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:405
@@ +404,3 @@
+  DT->verifyDomTree();
+  assert(L->isLCSSAForm(*DT) && "Loop is not in LCSSA form.");
+  DEBUG(verifyFunction(*F));
----------------
I suggest using `isRecursivelyLCSSAForm` instead.

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:406
@@ +405,3 @@
+  assert(L->isLCSSAForm(*DT) && "Loop is not in LCSSA form.");
+  DEBUG(verifyFunction(*F));
+  DEBUG(dbgs() << "\nLoopRotation: rotated "; L->dumpVerbose());
----------------
In general I like all sorts of verifications, but I think this is too much (and in case we need to verify this, we can schedule a `verify` pass after `loop-rotate`).

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:428
@@ +427,3 @@
+  BasicBlock *NewH = collectSEMEBlocks(OrigH, LoopLatch, Blocks, Exits);
+  assert(NewH);
+
----------------
A message in the assert is still missing (and in other asserts).

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:430-431
@@ +429,4 @@
+
+  if (!isProfitableToRotate(OrigH, Blocks, Exits))
+    return false;
+
----------------
I'm actually not sure if loop-rotation should speculate about profitability. It's an enabler pass not an optimization. It should always rotate loops to an easily-digestible for other passes form, and do that whenever it's doable under the budget of allowed size increase.

================
Comment at: llvm/lib/Transforms/Scalar/LoopRotation.cpp:447
@@ +446,3 @@
+    DEBUG(dbgs() << "\nLoop in canonical form:"; L->dumpVerbose(););
+    // After splitting the blocks need to be recollected.
+    Blocks.clear();
----------------
Can we check this before we collect the blocks first time?

================
Comment at: llvm/test/Transforms/LoopRotate/loop-rotate-4.ll:56-64
@@ +55,11 @@
+    i32 26, label %if.then130
+    i32 24, label %if.then130
+    i32 23, label %if.then130
+    i32 22, label %if.then130
+    i32 20, label %if.then130
+    i32 19, label %if.then130
+    i32 18, label %if.then130
+    i32 12, label %if.then130
+    i32 6, label %if.then149
+    i32 30, label %if.then467
+  ]
----------------
`bugpoint` has known limitations, and this test is definitely might be reduced further. E.g. it runs some form of  `simplifycfg` on the test to remove unneeded blocks and merge some other blocks, but if the test starts to pass after it, `bugpoint` undoes the simplification. But it's still doable to do similar simplifications manually (or even better - to teach `bugpoint` to do it), one step at a time.

If you do this, the tests would become much more comprehensible, it would be easier to understand which case they exercise, and it would be easier to spot duplicates among the tests.

Also, metadata is usually unnecessary in such tests, I prefer to remove it so that it doesn't clutter the test (it's not about this particular test, but I saw it in some other).


https://reviews.llvm.org/D22630