[PATCH] D40480: MemorySSA backed Dead Store Elimination.

Thu Nov 30 12:12:01 PST 2017

dmgreen added a comment.

Hello all. Thanks for taking a look. Much appreciated. The main thing this does over the old version is work across basic block, which is why we here are interested. In something like this:

  for (int i = ..)
      X[i] = 0;
      for (int j = ..)
          X[i] += Y[j];

The inner X[i] will currently be pulled out of the loop by LICM, but the original X[i] = 0 will remain as a dead store.

================
Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:1324
+      // Any MemoryUse's that may alias is a break. Otherwise ignore
+      if (AA.getModRefInfo(MA->getMemoryInst(), SILoc) & MRI_Ref)
+        return {NextResult::Bad, nullptr};
----------------
dberlin wrote:
> Thinking about this not too hard, my initial thought is that this should be unnecessary, IMHO. I'm curious what's happening if it's not.
> 
> Assuming we did not hit the optimization limits during memory ssa use building, any use points to a store that may-or -must aliases it.
> (if we hit the opt limits, you probably want to respect them anyway :P).
> 
> So this check should just be flat out unnecessary when current ==  original access.  If that has a memory use it's not dead on the path to the memoryuse (ignoring PDSE , that means it's not dead overall)
> 
> When walking from def to def, (IE when current != originalaccess),  that leaves two cases:
> The store it's a use for may-aliases your original access's siloc - your store is not dead
> the store it's a use for must-aliases your original access's siloc - your stores are the same (and so maybe one is dead)
> Note that in either case, what the use is for or aliases is irrelevant.  The only thing relevant to next access is the store aliasing.  The uses force your store alive or not based on what store they are linked to and it's equivalence to your original one.
> 
> Put another way: I can't see a reason to check  check explicitly whether the use aliases you.  You only have to check whether use->getDefiningAccess->memloc mayalias siloc.
> MemorySSA has already checked whether use mayalias def for you.
> 
> You can cache that, and now your walk is much faster because you only have to know things about the defs and not the uses.
>  
> Again, i could be completely wrong, but i'd like to understand why :)
> 
> It's true that aliasing is not transitive in theory, but in llvm, the vast majority of cases where it isn't have metadata attached and you could only check here in those cases if you are worried about catching every case.
> 
This is one of those things where I remember changing the "if" here to an assert, seeing it assert a lot and figuring it was then needed. A lot of those cases will be benign though. I took another look, and after wading through a few csmith cases where extra stores are removed with this check, something like the following is where this comes up:
```
define void @test(i32* %P, i32* noalias %Q, i32* %R) {
; 1 = MemoryDef(liveOnEntry)
  store i32 1, i32* %Q
; 2 = MemoryDef(1)
  store i32 2, i32* %P
; 3 = MemoryDef(2)
  store i32 3, i32* %Q
; MemoryUse(2)
  %1 = load i32, i32* %R
  ret void
}
```
store 3 to Q can be removed, but as MemAccess "2" has 2 operands, the second Use would cause us to fail as we walk from 2 to 3.

I'm not sure if this is worth the cost though. I don't have a great grasp of what will end up being expensive. I will try some benchmarks and try to get some compile time numbers to see, and if there are not any useful cases where this comes up, I'll remove it. My first go at getting compile time numbers was too noisy to be useful.

One thing I never looked into very much in creating this was looking into the internals of MemorySSA. MemSSA seemed to just work well enough to not need it. The only thing I remember coming up was cases like this:
```
define void @test16(i32* noalias %P) {
  %P2 = bitcast i32* %P to i8*
  store i32 1, i32* %P
  br i1 true, label %bb1, label %bb3
bb1:
  store i32 1, i32* %P
  br label %bb3
bb3:
  call void @free(i8* %P2)
  ret void
}
```
We remove the store 1 in the middle, leaving this:
```
define void @test16(i32* noalias %P) {
  %P2 = bitcast i32* %P to i8*
; 1 = MemoryDef(liveOnEntry)
  store i32 1, i32* %P
  br i1 true, label %bb1, label %bb3
bb1:                                              ; preds = %0
  br label %bb3
bb3:                                              ; preds = %bb1, %0
; 4 = MemoryPhi({%0,1},{bb1,1})
; 3 = MemoryDef(4)
  call void @free(i8* %P2)
  ret void
}
```
The memory phi remains with two operands both coming from "1". I presume this is OK, and it's easy enough to work around. It's different to a newly constructed memssa graph though.

================
Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:1469
+
+bool isThrowInRange(Value *SI, Value *NI, const Value *SILocUnd,
+                    DenseMap<Value *, unsigned> &InstructionPostOrders,
----------------
efriedma wrote:
> This is a confusing name, given what the function checks.
This is very sparse on comments, isn't it. My bad. I'll try and add some more in here. The important part of this function is to check that there are no throwing instructions between the SI and NI, except where SI is an Alloca/AllocLikeFn that do not escape.

================
Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:1651
+        if (MayThrow)
+          MayThrowsPostOrders.push_back(IPO);
+        if (MD && hasMemoryWrite(&I, TLI) && isRemovable(&I))
----------------
efriedma wrote:
> Is it possible for any DSE-related transform to invalidate this?
> 
> I'm not sure I really understand how you're using MayThrowsPostOrders here... more comments would be helpful.
More comments, got it, will do. This is one of the good ideas that came from D29624 (any bugs are still on me, of course :)

Nothing should invalidate the post orders. It's the relative order that we care about and dse only removes instructions, never re-orders them. So if there is a throw between two PO numbers the throw will remain in it's order when we remove one of the stores.

I was just thinking more about this and it may end up giving odd results at times. This linearising of the PO numbers is dependant on the order the blocks are visited. The store 1 here is removed:
```
define void @test(i32* noalias %P) {
    br i1 true, label %bb1, label %bb2
bb1:
    store i32 1, i32* %P
    br label %bb3
bb2:
    call void @unknown_func()
    br label %bb3
bb3:
    store i32 0, i32* %P
    ret void
}
```
But changing it to "br i1 true, label %bb2, label %bb1", the store wont be removed. That doesn't sound like something we want. Maybe this should be keeping the maythrown instructions around and checking they are really in the way. I'll look into this, but may need to read a book.

================
Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:1689
+    // Limit on the number of memory instruction we can look at
+    int InstructionLimit = MemorySSAMemoryAccessScanLimit;
+    // Used by PartialEarlierWithFullLater to ensure that we were free of
----------------
dberlin wrote:
> FWIW: You can do better than this, but it's complicated.
> 
> We could build a form with factored stores and multiple phis/sigmas if it is really worth it.
> 
> You also are going to end up rediscovering the same data here again and again (IE for each store, because you are working backwards and  they are all linked, you will eventually hit the same use chain you just saw.  for a given memloc or anything that is mustalias of that memloc, the answers must be the same) . There are various hashing /pair/etc schemes you can use to cache answers.  Not sure it is worth it.
> 
> Especially vs building a real form for stores.
OK. As I said above I never looked deeply into the internals of MemSSA. Doing the optimisation of stores in the MemorySSA graph is what gcc does, right? So it only needs to look at a stores immediate uses to find dead stores. And look through PHIs.

My understanding is that it would only be DSE that needs this at current. For all the other users of MemSSA (gvn, licm, etc) only need optimised stores. So if we did it that way it should be an optional thing that DSE can then use. Do you think that way would be better? We will still have to walk through PHI's, but that's fairly trivial.

I will try and get some compile time results together and see how things look.

================
Comment at: test/Transforms/DeadStoreElimination/simple.ll:555
+; I think this case is currently handled incorrectly by memdeps dse
+; throwing should leave store i32 1, not remove from the free.
+declare void @free(i8* nocapture)
----------------
efriedma wrote:
> Yes, you're right; I guess I missed that case when I fixed the other noalias issues in DSE.
Thanks for checking. That pesky handleEnd function I feel was doing more than it should have.

https://reviews.llvm.org/D40480