[PATCH] D78932: [DSE,MSSA] Relax post-dom restriction for objs visible after return.

Thu May 7 11:57:08 PDT 2020

fhahn marked an inline comment as done.
fhahn added a comment.

In D78932#2024077 <https://reviews.llvm.org/D78932#2024077>, @george.burgess.iv wrote:

> Thanks for the patch :)

Thank you very much for taking the time to take a look :)

> I haven't reasoned through this fully, but the overall idea of "mark all killing blocks, then walk from the exits and see that all of them hit any of the killing blocks (or never pass through the block with the store)" seems functionally correct to me. I have a first round of comments/questions; happy to dig deeper on whatever direction we decide to head in.
> 
>> do you think this patch is along the lines you suggested
> 
> What I envisioned was more or less keeping a `SmallPtrSet<MemoryAccess *, N>` populated with potentially-terminating Defs/Phis (e.g., for each Def/Phi in there, there exists a path in the function such that that's the last Def/Phi before the function exits). When walking for memory which is readable after the function exits, if we would call `PushMemUses` on something in this set, we give up. This can set can be built once lazily, and then queried for the rest of the life of `run()`. We could build it by {B,D}FS'ing starting at all exit blocks for our current `Function`, and terminating the search each time we discover any MemoryDefs/MemoryPhis in a block (we have special use-lists for these, so this is a constant-time query per block).

Oh that's convenient, I was not aware of that. That should indeed make it quite straigth-forward to build this set.

> If there aren't holes in it, I think that approach would be generally faster/simpler, but it also introduces the problem of cache invalidation. Cache invalidation makes many things harder, including writing tests + making test-cases. :)
> 
> Assuming the development practice we're trying to follow is still "build what's needed, get feedback, iterate," that you mentioned on an earlier review, I'm happy to chalk that up as a future optimization if you'd prefer this approach.

IIUC, this approach is mostly the same as in the original patch, just with a different way of detecting 'last' MemoryDefs, right? I think there potentially are cases that would be handled more efficiently with the current approach (e.g. if there are a large number of memoryDefs/uses to traverse). If my understanding is correct, the alternative approach should be relatively straight forward to implement. I would be happy to iterate on that once we have a correct version in tree (that makes balancing the patches and benchmarking a bit easier I think), but I could also try to do it up-front, if that's preferred.

>> if the post order number of a block is greater than the one
>>  of DomAccess, the block cannot be in in a path starting at DomAccess
> 
> I'm not certain I'm interpreting "a path starting at," correctly, but the way I'm reading it, I don't agree:

Sorry, the statement is a bit incomplete I think. It meant to refer to 'all paths to any exit block, starting at DomAccess'. In case you described, there technically is a path through DomAccess, but that should be fine, as we already check all paths from DomAccess to any exit.

> 
> 
>     A
>     |
>     B<-----
>    / \    |
>   C   D   |
>       |   |
>       E --|
> 
> 
> (where all edges are downward except E -> B)
> 
> C = 1, E = 2, D = 3, B = 4, and A = 5, yet there's a path from E -> B, no? Admittedly, I'm unsure at this point if this is pedantry on my part, or if there are correctness implications on the algorithm as presented.

================
Comment at: llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp:1659

-      // Check if we can skip DomDef for DSE. For accesses to objects that are
-      // accessible after the function returns, KillingDef must execute whenever
-      // DomDef executes and use post-dominance to ensure that.
+      // Check if we can skip DomDef for DSE. Accesses to objects that are
       MemoryDef *DomDef = dyn_cast<MemoryDef>(DomAccess);
----------------
george.burgess.iv wrote:
> > Accesses to objects that are
> 
> was this accidentally left in?
Yes, removed!

================
Comment at: llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp:1748
+          if (UseAccess != DomAccess && CC &&
+              isOverwrite(DefLoc, *CC, DL, TLI, DepWriteOffset, InstWriteOffset,
+                          UseInst, IOL, AA, &F) == OW_Complete) {
----------------
george.burgess.iv wrote:
> it's a bit odd to read 
> 
> ```
> if (isCompleteOverwrite(...)) {
>   if (isOverwrite(...) == OW_Complete) {
>     // ...
>   }
> }
> ```
> 
> Is there a subtle difference between the two that I'm missing?
Ah right, I've refactored isCompleteOverwrite to just use isOVerwrite().

================
Comment at: llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp:1802
+        else
+          for (BasicBlock *R : PDT.getRoots()) {
+            if (!DT.isReachableFromEntry(R))
----------------
george.burgess.iv wrote:
> i'm unsure how we reach this case. AFAICT, `PDT.dominates(nullptr, AnythingElse) == false`, since the PDT should fail to lookup the node for `nullptr`, no?
We can hit this scenario when there's no single exit block that post-dominates all killing blocks, like in the example below. Instead the special 'nullptr'
block is used, but for starting the traversal we need to start with all exit blocks I think.

```
define void @test12(i32* %P) {
  store i32 0, i32* %P
  br i1 true, label %bb1, label %bb2

bb1:                                              ; preds = %0
  store i32 1, i32* %P
  br label %bb3

bb2:                                              ; preds = %0
  store i32 1, i32* %P
  ret void

bb3:                                              ; preds = %bb1
  ret void
}
```

================
Comment at: llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp:1825
+
+          if (WorkList.size() >= 50)
+            return None;
----------------
george.burgess.iv wrote:
> nit: 50 is a bit of a magical number to give up at here. is it worth making a `-mllvm` option (or perhaps hoisting to a constant with a bit of prose about how it was chosen)?
I've made it into an option, thanks!

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78932/new/

https://reviews.llvm.org/D78932