[PATCH] D45299: API to update MemorySSA for cloned blocks.

Wed Aug 22 15:33:03 PDT 2018

george.burgess.iv added a comment.

Lots of nits, and a few functional concerns. For the functional concerns that revolve around dubious edge additions, I'm happy with fixes or "it's impossible for that to actually happen; added a comment explaining that."

(If we do take the "that's impossible; have a comment" route, I assume that we're gating the new loop+MSSA transforms behind some `loop-enable-mssa`-like flag for a while to gain confidence about the correctness of this all?)

I also think we should have unittests (simple ones are perfectly OK) for the new updater functionality, please.

Like said, this review gave me a few things to do. I'll ping this with revisions when all of that's landed...

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:429
+        Instruction *DefMUDI = DefMUD->getMemoryInst();
+        assert(DefMUDI && "Found MemoryUseOrDef with no Instruction.");
+        if (Instruction *NewDefMUDI =
----------------
It's not immediately clear to me that this can't be tripped in general. e.g.

```
void foo(int j, int *i) {
  if (j == 1 || j == 2) {
    *i = 0;
  } else {
    *i = 1;
  }
}
```

I realize this isn't a loopy case, but if we decided, in our infinite wisdom, that it was profitable to have a `j == 1` block, a `j == 2` block, and an `else` block, we'd end up with both the `j == 1` and `j == 2` block having `DefMUDI == MSSA->getDefiningAccess()`, no?

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:545
+  PhiToDefMap MPhiMap;
+  if (MemoryPhi *MPhi = MSSA->getMemoryAccess(BB)) {
+    MPhiMap[MPhi] = MPhi->getIncomingValueForBlock(P1);
----------------
nit: no braces, please

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:559
+    for (const ValueToValueMapTy *const VMap :
+         make_range(ValuesBegin, ValuesEnd))
+      if (BasicBlock *NewExit = cast_or_null<BasicBlock>(VMap->lookup(Exit))) {
----------------
nit: does this line need to be wrapped?

(also, please remove the `const` in `*const`; it's inconsistent with the remainder of this file)

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:577
+    ArrayRef<BasicBlock *> ExitBlocks,
+    const SmallVectorImpl<std::unique_ptr<ValueToValueMapTy>> &VMaps,
+    DominatorTree &DT) {
----------------
nit: can this be an ArrayRef?

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:601
+
+  if (RevDeleteUpdates.size()) {
+    // Update for inserted edges: use newDT and snapshot CFG as if deletes had
----------------
nit: `if (!Rev...empty())`

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:604
+    // not occured.
+    DominatorTree NewDT(DT, RevDeleteUpdates);
+    GraphDiff<BasicBlock *> GD(RevDeleteUpdates);
----------------
nit: Is it possible/easy to update DT in-place and undo the updates? The tool I'm using to grok this says that this line always lands in CalculateWithUpdates, which appears to always call CalculateFromScratch. It sometimes gets things wrong with templated code, though, so I could be assuming the wrong thing here.

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:625
+                                          DominatorTree &DT,
+                                          GraphDiff<BasicBlock *> *GD) {
+  // Get recursive last Def, assuming well formed MSSA and updated DT.
----------------
(Can this be const, or are we updating it somewhere here?)

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:627
+  // Get recursive last Def, assuming well formed MSSA and updated DT.
+  std::function<MemoryAccess *(BasicBlock *)> GetLastDef =
+      [&](BasicBlock *BB) -> MemoryAccess * {
----------------
nit: it looks like all the recursion here is tail recursion. Is it possible/easy to rewrite this as a loop with 100% fewer `std::function::operator()` invocations? :)

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:635
+    // If BB has multiple predecessors, get last definition from IDom.
+    unsigned count = 0;
+    BasicBlock *Pred = nullptr;
----------------
nit: s/count/Count/

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:662
+  // Get nearest IDom given a set of blocks.
+  auto FindNearestCommonDominator =
+      [&](SmallPtrSetImpl<BasicBlock *> &BBSet) -> BasicBlock * {
----------------
kuhar wrote:
> I don't think this will be a bottleneck, but this can be optimized by starting the search at the node with the lowest level (highest in the tree).
(if there's no reason to add it now, please add a quick TODO with kuhar's comment in case it becomes more desirable in the future)

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:663
+  auto FindNearestCommonDominator =
+      [&](SmallPtrSetImpl<BasicBlock *> &BBSet) -> BasicBlock * {
+    BasicBlock *PrevIDom = *BBSet.begin();
----------------
nit: `const SmallPtrSetImpl<...> &`?

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:665
+    BasicBlock *PrevIDom = *BBSet.begin();
+    for (auto BB : BBSet)
+      PrevIDom = DT.findNearestCommonDominator(PrevIDom, BB);
----------------
nit: `auto *` (here and elsewhere in the patch. In general, I think the accepted style for `for` loops is to prefer `const auto &` if it's a not-pointer, and `const auto *` if it's a pointer. Dropping `const` (and `&` in the first case) as necessary)

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:689
+  // Map a BB with a set of predecessors it is being added.
+  SmallDenseMap<BasicBlock *, SmallPtrSet<BasicBlock *, 2>> AddedPredMap;
+  for (auto &Edge : Updates) {
----------------
It looks like we expect PrevPredMap and AddedPredMap to have the same set of keys, and they're sorta related. Would it make sense to have something like

```
struct PredInfo {
  SmallPtrSet<...> Added;
  SmallPtrSet<...> Prev;
};
SmallDenseMap<BasicBlock *, PredInfo> PredMap;
```

instead, so the relationship is a bit more clear?

(Or maybe one of these will need to be a SmallSetVector or similar, given the ordering concerns below?)

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:703
+    auto &PrevBlockSet = PrevPredMap[BB];
+    for (auto &Pair : children<GraphDiffInvBBPair>({GD, BB})) {
+      BasicBlock *Pi = Pair.second;
----------------
To be sure I'm understanding (I'm not overly familiar with the DT APIs), this is saying "for all predecessors of BB, but also including additions in GD, { /* loop body */ }", yeah?

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:710
+
+    if (!PrevBlockSet.size()) {
+      assert(pred_size(BB) == AddedBlockSet.size() && "Duplicate edges added.");
----------------
nit: `if (PrevBlockSet.empty())`

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:733
+    if (!MSSA->getMemoryAccess(BB))
+      MSSA->createMemoryPhi(BB);
+  }
----------------
FWIW, it looks like this'll make MSSA subtly (and in a way that doesn't matter until we print it) depend on set ordering, which may be bad for writing tests and such.

Since we can iterate over elems in PrevPredMap in any order, we'll be creating these Phis in any order. Since we assign Phis IDs in the order in which they're created ...

In this case, looks like we can just iterate directly over `Updates` and the problem goes away (assuming `Updates` is populated in a deterministic order)?

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:739
+    auto *BB = Pair.first;
+    auto &PrevBlockSet = Pair.second;
+    auto &AddedBlockSet = AddedPredMap[BB];
----------------
nit: if you don't plan on mutating the &, make it a `const &`, please (and below)

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:741
+    auto &AddedBlockSet = AddedPredMap[BB];
+    assert(PrevBlockSet.size() > 0 &&
+           "At least one previous predecessor must exist.");
----------------
nit: !empty()

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:755
+    if (NewPhi->getNumOperands()) {
+      for (auto LastDefPredPair : LastDefAddedPred)
+        for (int I = 0, E = EdgeCountMap[{LastDefPredPair.first, BB}]; I < E;
----------------
Same (potential) nondeterministic ordering concerns. 

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:760
+
+      // FIXME: May still need to check defs that no longer dominate their
+      // uses. In particular defs optimized through NewPhi?
----------------
Consider:

```
   entry
     |
     a
    / \
   |   c
    \ /
     d

a:
  ; 1 = MemoryDef(liveOnEntry)
c:
  ; 2 = MemoryDef(1)
d:
  ; 3 = MemoryPhi({c, 2}, {a, 1})
  ; 4 = MemoryDef(3); optimized to 1
```

And we add an edge from `entry` -> `d`.  I don't claim to know when/if this can reasonably happen, but ...

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:782
+      // will be replaced with DefP1.
+      NewPhi->addIncoming(DefP1, P1);
+      removeMemoryAccess(NewPhi);
----------------
Would it be more direct to just RAUW this Phi with DefP1, then remove it?

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:789
+    // other predecessors.
+    for (auto *Pred : AddedBlockSet)
+      for (int I = 0, E = EdgeCountMap[{Pred, BB}]; I < E; ++I)
----------------
Same ordering concerns. Maybe this time we could just iterate over BB's preds and fold these two loops into one? As for additions, we may need to have AddedPredMap preserve the order we observe from Updates?

(That would also let us get rid of EdgeCountMap, it seems. Assuming we don't need it when we refactor the loop above)

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:807
+           "Block with last definition must dominate idom");
+    GetNoLongerDomBlocks(PrevIDom, NewIDom, BlocksWithDefsToReplace);
+
----------------
Is it possible for BlocksWithDefsToReplace to get duplicates? If so, might it be a good idea to replace that with a SmallPtrSet?

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:816
+  ForwardIDFCalculator IDFs(DT, GD);
+  SmallPtrSet<BasicBlock *, 16> DefiningBlocks(BlocksToProcess.begin(),
+                                               BlocksToProcess.end());
----------------
(I'm assuming that there's some sort of guarantee that IDF calculations only pass these BBs into the DT. If it tries to access preds/succs of a BB, we might get badness, since the DT here may be pretending that we have CFG edges that we don't? Or did I miss us temporarily re-adding edges that were deleted between BBs?)

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:832
+  // longer dominate, replace those with the closest dominating def.
+  for (auto *BlockWithDefsToReplace : BlocksWithDefsToReplace) {
+    if (auto DefsList = MSSA->getWritableBlockDefs(BlockWithDefsToReplace)) {
----------------
Are we sure that we only need to consider BlocksWithDefsToReplace here?

I could be totally misunderstanding, but imagine the following scenario (I'm not even going to try to invent an excuse for when this could happen. :) Just wanted to know if it was considered):

```
   entry
   /  |
  a   |
 / \  |
 b c  f
 \ /  |
  d   |
   \ /
    e

f:
  ; 1 = MemoryDef(liveOnEntry)
e:
  ; 2 = MemoryPhi({d, liveOnEntry}, {f, 1})
```

Now imagine that we add an edge `f` -> `c`. AFAICT, that would cause us to add a Phi to `c` (way above) and a Phi to `d` (idf loop), but I'm failing to see where we'd update the Phi in `e` to point to the new Phi in `d`. (Since the only no-longer-dominating BB is `a`)

Similarly, if `d` had a MemoryDef in it before we added this edge, would its DefiningAccess get pointed appropriately at this new Phi in `d`?

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:835
+      for (auto &DefToReplaceUses : *DefsList) {
+        BasicBlock *DominatedBlock;
+        BasicBlock *DominatingBlock = DefToReplaceUses.getBlock();
----------------
nit: can we scope this more tightly please? e.g. `BasicBlock *DominatedBlock = UsrPhi->...` and similar in the else

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:839
+                            E = DefToReplaceUses.use_end();
+        for (; UI != E;) {
+          Use &U = *UI;
----------------
nit: Can we just do a range-for loop? Looks like we're only calling `set` on the use, which shouldn't cause any invalidation badness.

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:842
+          ++UI;
+          MemoryAccess *Usr = dyn_cast<MemoryAccess>(U.getUser());
+          if (!Usr)
----------------
s/dyn_cast/cast/.

If a non-MemoryAccess is using a MemoryAccess, we have problems. :)

================
Comment at: lib/Analysis/MemorySSAUpdater.cpp:852
+            if (!DT.dominates(DominatingBlock, DominatedBlock)) {
+              for (auto *BB :
+                   llvm::concat<BasicBlock *>(BlocksToProcess, IDFBlocks))
----------------
The `O(Blocks.size() * AverageBlockMemoryAccessesCount * AverageMemoryAccessUserCount * (Blocks.size() + IDF.size()))` (assuming `dominates` is constant time) behavior here is somewhat concerning to me, though idk if that's an unfounded concern.

If the user is a Def, it's probably best to just deoptimize it; we'll just start from the Def's defining access in the walker anyway. (...Or, well, we would if we noticed that the optimization is now invalid. For reasons unclear to me looking into history + comments, `MemoryDef::isOptimized` checks the Def's *defining* access, rather than its optimized access. Which is almost definitely incorrect, at least with the aliasing bits we now cache. Working on that now.)

If it's not the Def's optimized access that we're dealing with, then the only new Access that can work is `MSSA->getMemoryAccess(U->getBlock());` (since the only way that can happen is if that Def is the first access in a BB, and we've just inserted that block into said BB).

If it's a Use, we'll try to reoptimize starting from its DefiningAccess, which will be whatever this Use points to. It's unclear what the right trade-off is there, since the more aggressive we are here, the less walking we *may* have to do in the future. Depending on the amount of luck involved in this loop's iteration order (e.g. if we remove the `break;`, is it possible for `U` to point to something different when the loop terminates? Is that value going to be better, worse, or are we uncertain? The feeling I get is "yes; uncertain," but I haven't fully reasoned through it,) it may be best to just say "point it at the phi of the Use's BB. If that doesn't exist, walk IDoms until we find the last Def."

(Edit: ...Did I never make MemoryDefs stop using WeakVHes to track what they're optimized to? OK. I'll work on that in parallel, too, since I imagine that's pretty necessary for this to work.)

Repository:
  rL LLVM

https://reviews.llvm.org/D45299