[PATCH] D73763: [DSE] Lift post-dominance restriction.

Tue Mar 31 15:27:33 PDT 2020

fhahn added a comment.

In D73763#1951432 <https://reviews.llvm.org/D73763#1951432>, @george.burgess.iv wrote:

> Thanks for working on this!
>
> +1 to the general idea of "adding a read-everything intrinsic doesn't smell great," and a further +1 for containing it to only this pass if we do end up needing it.

Regarding the intrinsic, I am not sure if it is really needed, I guess it would be enough to check if we reached the end of our walk for each path. I'll try that tomorrow.

> This is the first time I'm glancing at this pass, so I'm going to err a bit on the side of overcommunication in hopes of making misunderstandings on my side more obvious. :)

Thank you very much for taking a look!

> I gather that `getDomMemoryDef`, given two defs A (`Current`) and B (`KillingDef`):
> 
> - Determines some Def/Phi that dominates A, C.
> - Walks all of the `MemoryAccess`es that're transitively users of C, stopping at full overwrites of C + B.
> - If any of those `MemoryAccess`es is a potential read of `A`, gives up. Otherwise, returns `A`
> 
>   If we remove the postdom restriction, I'm uncertain about how this algo will scale. It used to be that all paths necessarily terminated at `B`, but now it looks like we'll potentially walk a ton more? e.g.,
> 
>   ``` bool branch(); // inaccessiblememonly
> 
>   { // 1 = MemoryDef(LOE) -- DomAccess *a = 1; if (branch()) { // 2 = MemoryDef(1) -- The thing we want to replace it with *a = 2; } // 3 = MemoryPhi(2, 1) // Every Phi|Def below here is now walked (?) } ```
> 
>   To be clear, I'm not saying this walking isn't necessary; just that it now happens AFAICT.

Yes the summary above is correct. Removing the post-dominance restriction requires us to explore potentially much further unfortunately.

The main property we need to check is the following: given MemoryDefs A and B, where A dominates B and B completely overwrites A. We can eliminate A, if there are no access that may read A before B overwrites it. Only considering defs where B post-dominates A restricts the paths we have to check to the paths 'between' A and B. Without requiring post-dominance we have to check the property along all paths from A to the exit. In particular, the new case this allows to eliminate is when A is not read on any paths that do not go through B.

> In any case, the thing I see this new intrinsic standing in for is an oracle that responds to "here's the set of blocks we've observed complete overwrites in. Is there any way for us to exit this function without going through one of those blocks, starting from `A->getBlock()`?" This is an expensive problem, but I think we can simplify it substantially, *if* my comment about the extra walking above is correct.

I think there are 2 cases to distinguish:

1. For accesses to non-alloca objects, this matches what the intrinsic achieves I think. We can only eliminate stores to objects that are visible after the function returns, if they are overwritten along all paths to the exit. So if we have determined a set of blocks that overwrite A (and there are no reads in between), we could check if all paths to the exit from A must go through one of the overwriting blocks. I think that matches your suggestion.
2. For objects that are not visible to the caller (e.g. alloca), the intrinsic achieves something slightly different: Given A and B as above, we can eliminate A, if there are no reads of A between A and any of the exits from A. We can stop walking any paths that contain overwrites of A.

I think 1. could be improved as you suggested. If we want to cover 2. I think we have to check all potentially aliasing access along all paths to the exit.

The approach I tried to follow to bring up MemoySSA-backed DSE is to add support for additional cases in smallish steps to have a baseline (and compile-time is controlled by cut-offs) and subsequently improve the parts that case issues in practice. For example, lift the post-dominance restriction with potentially excessive walking, followed by patches to implement the special handling for the non-alloca case as suggested and future performance improvements (e.g. there is plenty of potential for caching, like in D75025 <https://reviews.llvm.org/D75025>)? That would also allow to quantify the impact of improvements. What do you think?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73763/new/

https://reviews.llvm.org/D73763