[PATCH] D73763: [DSE] Lift post-dominance restriction.

Wed Apr 1 19:04:52 PDT 2020

george.burgess.iv added a comment.

> I think there are 2 cases to distinguish:
> 
> 1. For accesses to non-alloca objects, this matches what the intrinsic achieves I think. We can only eliminate stores to objects that are visible after the function returns, if they are overwritten along all paths to the exit. So if we have determined a set of blocks that overwrite A (and there are no reads in between), we could check if all paths to the exit from A must go through one of the overwriting blocks. I think that matches your suggestion.

Sounds like what I was thinking, yeah.

> For objects that are not visible to the caller (e.g. alloca), the intrinsic achieves something slightly different: Given A and B as above, we can eliminate A, if there are no reads of A between A and any of the exits from A. We can stop walking any paths that contain overwrites of A.

I'm confused about why `alloca`s need special treatment. This new set that we're building is essentially a fast way to answer the question "are there potentially unknowable reads after this `Def`/`Phi`?" If we know memory (e.g., from an `alloca`) is unreadable outside of the function, I'd hope we can ignore the set entirely, like this patch is doing now? Baked into my suggestion was the assumption that we were already:

> check[ing] all potentially aliasing access along all paths to the exit

...since I think the transitive `User` walking we're doing inside of `getDomMemoryDef` should accomplish that?

In any case, if you're confident that this works for #1, I'm content to move discussion about "why don't things work as they are now," to a patch that doesn't block this one.

> The approach I tried to follow to bring up MemoySSA-backed DSE is to add support for additional cases in smallish steps to have a baseline (and compile-time is controlled by cut-offs) and subsequently improve the parts that case issues in practice. For example, lift the post-dominance restriction with potentially excessive walking, followed by patches to implement the special handling for the non-alloca case as suggested and future performance improvements (e.g. there is plenty of potential for caching, like in D75025 <https://reviews.llvm.org/D75025>)? That would also allow to quantify the impact of improvements. What do you think?

I like incremental approaches when they're possible, so I'm happy with this. :)

================
Comment at: llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp:1682
       for (Use &U : Acc->uses())
         WorkList.insert(cast<MemoryAccess>(U.getUser()));
     };
----------------
fhahn wrote:
> george.burgess.iv wrote:
> > tangential to this patch: should we be ignoring `U`s which are `MemoryDefs` whose only Use of `Acc` is as their `getOptimized()` operand? Without that, it's not immediately obvious to me how we aren't overly conservative in situations like:
> > 
> > ```
> > {
> >   // 1 = MemoryDef(LOE) -- this is DomAccess
> >   *a = 1;
> >   // 2 = MemoryDef(1)
> >   *b = 2;
> >   // 3 = MemoryDef(2) -- this is the obvious kill
> >   *a = 3;
> >   // 4 = MemoryDef(3) (optimized to 2)
> >   *c = 4;
> >   // MemoryUse(4) -- AA says MayAlias for `(d, a)`, so `isReadClobber()` should be `true` if we visit this?
> >   if (*d) {
> >     // ...
> >   }
> > }
> > ```
> Oh, the optimized access is also added to the uses of `2 = MemoryDef` here? Yes that would be overly conservative. Is there a way to force MemorySSA to be optimized before we run DSE, for testing? I'll put up a separate fix.
> Is there a way to force MemorySSA to be optimized before we run DSE, for testing?

I'm unsure offhand. @asbirlea would likely know if there is.

If there's not and you'd find it useful, a debugging-only pass that boils down to `for (MemoryAccess *MA : Fn) { MSSA->getWalker()->getClobberingMemoryAccess(MA, {}); }` should force that. We preoptimize `Use`s, but not `Def`s.

================
Comment at: llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp:1492
+  // Set of the last MemoryDefs or MemoryPhis before a function exit, that is
+  // defs or phis without any users that are defs or phis.
+  SmallPtrSet<MemoryAccess *, 8> LastDefOrPhi;
----------------
> that is defs or phis without any users that are defs or phis.

This isn't quite complete; consider:

```
void bar(); // inaccessiblememonly
void foo(int *a) {
  // 1 = MemoryDef(LOE)
  *a = 1;
  if (bar())
    return;
  // 2 = MemoryDef(1)
  *a = 2;
}
```

Despite having `2` as a `User`, we'd still want `1` to be in this set, since depending on the way the `if` goes, it's the last Def/Phi before the function ends

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73763/new/

https://reviews.llvm.org/D73763