[PATCH] D29620: [JumpThreading] Thread through guards

Wed Feb 8 14:09:56 PST 2017

sanjoy requested changes to this revision.
sanjoy added a comment.
This revision now requires changes to proceed.

Looks almost good to go.  I have some minor comments.

================
Comment at: lib/Transforms/Scalar/JumpThreading.cpp:2060
+    return false;
+
+  BasicBlock *Pred1 = Predecessors[0];
----------------
mkazantsev wrote:
> sanjoy wrote:
> > I think you also need to check that these preds have only one successor.
> This is not required. Imagine
>          A
>        /    \
>       B      C
>     /  \    /  \
>   D      E       F
> 
> 
> Hoisting instructions before guards to split blocks between B-E, C-E looks valid. We don't break any look structures in this case. It is kinda equivalent to splitting E into 2 parts (E1 and E2) and then duplicating E1 to E1a and E1b. As result, we have something like
> 
>           A
>        /      \
>       B         C
>     /  \      /  \
>   D     E1a E1b   F
>           \ /
>            E2
> 
> That looks valid for me. Please let me know if you see a problem in doing so.
SGTM!

================
Comment at: lib/Transforms/Scalar/JumpThreading.cpp:46
 using namespace jumpthreading;
+using namespace PatternMatch;

----------------
The general idiom I've seen used in LLVM is to open the `PatternMatch` namespace in the functions that use it (so for here only `JumpThreadingPass::ProcessGuards`).

================
Comment at: lib/Transforms/Scalar/JumpThreading.cpp:719
+  // Look if we can propagate guards to predecessors.
+  if (ProcessGuards(BB))
+    return true;
----------------
As an optimization for frontends which do not emit guards, I think we should make this optimization conditional under the Module having a `llvm.experimental_guard` declaration that has uses (see `HasGuards` is ScalarEvolution, for instance).

================
Comment at: lib/Transforms/Scalar/JumpThreading.cpp:2064
+  // TODO: Look up deeper than to immediate predecessor?
+  for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I)
+    if (match(&*I, m_Intrinsic<Intrinsic::experimental_guard>()))
----------------
Use a range for here `for (auto &I : *BB)`.

================
Comment at: lib/Transforms/Scalar/JumpThreading.cpp:2066
+    if (match(&*I, m_Intrinsic<Intrinsic::experimental_guard>()))
+      if (auto Parent = Pred1->getSinglePredecessor())
+        if (Parent == Pred2->getSinglePredecessor())
----------------
Can we do these three checks on before we loop over the instructions in `BB`?  If they don't pass then there's no need to look for guards.

```
if (auto Parent = Pred1->getSinglePredecessor())
  if (Parent == Pred2->getSinglePredecessor())
    if (BranchInst *BI = dyn_cast<BranchInst>(Parent->getTerminator()))
```

================
Comment at: lib/Transforms/Scalar/JumpThreading.cpp:2095
+    // False dest is safe if !BranchCond => GuardCond.
+    Impl = isImpliedCondition(BranchCond, GuardCond, DL, true);
+    if (Impl && *Impl)
----------------
Add a `/*InvertAPred*/` marker here.

================
Comment at: lib/Transforms/Scalar/JumpThreading.cpp:2109
+  // Duplicate all instructions before the guard in the unguarded branch.
+  UnguardedBlock =
+      DuplicateInstructionsInSplitBetween(BB, UnguardedBlock, Guard,
----------------
Where is `BBDupThreshold` defined?  That needs to be a `cl::opt`.

================
Comment at: lib/Transforms/Scalar/JumpThreading.cpp:2118
+  GuardedBlock =
+      DuplicateInstructionsInSplitBetween(BB, GuardedBlock, AfterGuard,
+                                          BBDupThreshold, GuardedMapping);
----------------
Right now it is possible for this transform to bail out after cloning some instructions if the first `DuplicateInstructionsInSplitBetween` passes but the second one fails.  I'd restructure this a little differently:  I'd first clone `GuardedBlock` and then `UnguardedBlock`; with an assert checking that if cloning `GuardedBlock` was successful, so was `UnguardedBlock`, since the latter clones fewer instructions.

================
Comment at: lib/Transforms/Scalar/JumpThreading.cpp:2131
+  for (auto BI = BB->begin(); &*BI != AfterGuard; ++BI)
+    ToRemove.push_back(&*BI);
+
----------------
Can we filter out PHI nodes here?  That is:

```
for (auto BI = BB->begin(); &*BI != AfterGuard; ++BI)
  if (!isa<PHINode>(&*BI))
    ToRemove.push_back(&*BI);
```

================
Comment at: lib/Transforms/Scalar/JumpThreading.cpp:2134
+  // Substitute with Phis & remove.
+  for (auto I = ToRemove.rbegin(), E = ToRemove.rend(); I != E; ++I) {
+    Instruction *Inst = *I;
----------------
You can do:

```
for (auto *Inst : reverse(ToRemove))
```

================
Comment at: lib/Transforms/Scalar/JumpThreading.cpp:2140
+      NewPN->addIncoming(GuardedMapping[Inst], GuardedBlock);
+      NewPN->insertBefore(Inst);
+      Inst->replaceAllUsesWith(NewPN);
----------------
Stylistically, I'd have preferred `NewPN->insertBefore(BB->getFirstInsertionPt());` (with the call to `getFirstInsertionPt()` hoisted out of the loop to avoid wasted work).

What you have here is correct though, so I'll leave it to you to make a judgement call.

================
Comment at: lib/Transforms/Utils/CloneFunction.cpp:751
+
+/// \brief Duplicate non-Phi instructions from the beginning of block up to
+/// StopAt instruction into a split block between BB and its predecessor.
----------------
Add a note that (here or in the header) you're not simply dropping PHIs, but are specializing them as necessary.  IOW, this function will do the thing you expect with uses of PHI nodes in the cloned instructions.

================
Comment at: lib/Transforms/Utils/CloneFunction.cpp:754
+BasicBlock *llvm::DuplicateInstructionsInSplitBetween(
+    BasicBlock *BB, BasicBlock *PredBB, Instruction *StopAt,
+    unsigned MaxDuplicatedInstructions,
----------------
This needs a unit test in unittests/Transforms/Utils/Cloning.cpp

================
Comment at: lib/Transforms/Utils/CloneFunction.cpp:765
+    Instruction *Inst = &*BI;
+    if (dyn_cast<PHINode>(BI))
+      continue;
----------------
Use `isa<>` here.

================
Comment at: lib/Transforms/Utils/CloneFunction.cpp:767
+      continue;
+    if (auto CI = dyn_cast<CallInst>(Inst))
+      if (CI->cannotDuplicate())
----------------
Use `auto *CI`: http://llvm.org/docs/CodingStandards.html#beware-unnecessary-copies-with-auto

================
Comment at: lib/Transforms/Utils/CloneFunction.cpp:769
+      if (CI->cannotDuplicate())
+        return false;
+    if (++InstructionsToDuplicate > MaxDuplicatedInstructions)
----------------
I think you need to bail out here on an instruction with uses that returns a value of token type (otherwise you may have to insert PHIs of token type later).

https://reviews.llvm.org/D29620