[PATCH] D34200: [PM/unswitch] Teach SimpleLoopUnswitch to do non-trivial unswitching, making it no longer even remotely simple.

Sun Jun 25 16:37:48 PDT 2017

sanjoy added a comment.

I did a quick scan and have a couple of low-level comments inline.

Do you think it is possible to split out the `LoopInfo` updating logic to a later patch, perhaps by using something more brute force initially?  That will make later rounds of review a lot easier.

Finally, maybe I missed something, but I did not see the code handle something like this:

  loop:
    br i1 %us_cond, label %left, label %right

  left:
    br i1 %c, label %right, label %be

  right:
    br i1 %c, label %left, label %be

  be:
    br label %loop

where unswitching on `%us_cond` will create two subloops "out of thin air" by destroying unstructured control flow.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:837
+  SmallVector<BasicBlock *, 16> Worklist;
+  SmallPtrSet<BasicBlock *, 16> ClonedLoopSet;
+  for (auto *Pred : predecessors(ClonedHeader)) {
----------------
I'd call this `ClonedLoopBlocksSet` to make it clear that this holds `BasicBlock *`s not `Loop *`s.  Otherwise later `ClonedLoopSet.count(ClonedBB)` reads weird.  Ditto for `PossibleClonedLoopSet`.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1110
+      ExitLoops.push_back(ExitL);
+      ExitsInLoops.push_back(ExitBB);
+      if (!ParentL || (ParentL != ExitL && ParentL->contains(ExitL)))
----------------
Can this be common'ed up with the logic in `buildClonedLoops`?

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1165
+    // reverse.
+    LoopSet.insert(Header);
+    while (!Worklist.empty()) {
----------------
I'd suggest calling this `LoopBlocksSet`.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1398
+///
+/// It also updates both dominator tree and loopinfo based on the unswitching.
+///
----------------
The commit message says you don't update the dom tree?

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1461
+  SmallPtrSet<BasicBlock *, 4> SkippedLoopAndExitBlocks;
+  if (ContinueSuccBB->getUniquePredecessor() ||
+      llvm::all_of(predecessors(ContinueSuccBB), [&](BasicBlock *PredBB) {
----------------
Again, I think you can just directly ask the domtree for this.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1606
+  // Update any PHI nodes in the successors of the skipped blocks to not have
+  // spurrious incoming values.
+  for (auto *LoopBB : L.blocks())
----------------
s/spurrious/spurious/

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1740
+/// quadratic.
+int computeDomSubtreeCost(DomTreeNode &N,
+                          const SmallDenseMap<BasicBlock *, int, 4> &BBCostMap,
----------------
`static`?

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1828
+  for (auto *BB : L.blocks()) {
+    int &Cost = BBCostMap[BB];
+    for (auto &I : *BB) {
----------------
I'd be a bit more comfortable if you did `int Const = 0`, and later `BBCostMap[BB] = Cost;`.  Right now it isn't obvious that `Cost += TTI.getUserCost(&I);` is modifying a densemap value.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1851
+  // 1) An unswitching cost below the threshold
+  // 2) The smallest number of duplicated unswitch candidates (to avoid
+  //    creating redundant subsequent unswitching)
----------------
I may be missing something, but I only see (1).  Also, a one liner on what "unswitching cost" and "cost after unswitching" mean will be helpful.

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1871
+      // Don't count successors more than once.
+      if (!Visited.insert(SuccBB).second)
+        continue;
----------------
Right now this is always true, right (since you're only unswitching `br` instructions that have different successors)?

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1879
+      if (SuccBB->getUniquePredecessor() ||
+          llvm::all_of(predecessors(SuccBB), [&](BasicBlock *PredBB) {
+            return PredBB == &BB || DT.dominates(SuccBB, PredBB);
----------------
Isn't this the same as edge domination?

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1901
+                 << " for unswitch candidate: " << *CandidateTI << "\n");
+    if (!UnswitchTI || CandidateCost < UnswitchCost) {
+      UnswitchTI = CandidateTI;
----------------
Aren't you reading uninitialized memory here (for `UnswitchCost`)?

I'd also name `UnswitchCost` in a way that indicates it is the "best so far".

================
Comment at: lib/Transforms/Scalar/SimpleLoopUnswitch.cpp:1907
+
+  if (UnswitchCost < UnswitchThreshold) {
+    DEBUG(dbgs() << "  Trying to unswitch non-trivial (cost = " << UnswitchCost
----------------
How about putting all of the logic up to this point into a `TerminatorInst *findNonTrivialUnswitchCandidate` function?

https://reviews.llvm.org/D34200