<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Nov 30, 2017 at 12:12 PM, Dave Green via Phabricator <span dir="ltr"><<a href="mailto:reviews@reviews.llvm.org" target="_blank">reviews@reviews.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">dmgreen added a comment.<br>

<br>

Hello all. Thanks for taking a look. Much appreciated. The main thing this does over the old version is work across basic block, which is why we here are interested. In something like this:<br>

<br>

  for (int i = ..)<br>

      X[i] = 0;<br>

      for (int j = ..)<br>

          X[i] += Y[j];<br>

<br>

The inner X[i] will currently be pulled out of the loop by LICM, but the original X[i] = 0 will remain as a dead store.<br>

<span class=""><br>

<br>

<br>

================<br>

Comment at: lib/Transforms/Scalar/<wbr>DeadStoreElimination.cpp:1324<br>

+      // Any MemoryUse's that may alias is a break. Otherwise ignore<br>

+      if (AA.getModRefInfo(MA-><wbr>getMemoryInst(), SILoc) & MRI_Ref)<br>

+        return {NextResult::Bad, nullptr};<br>

----------------<br>

</span><span class="">dberlin wrote:<br>

> Thinking about this not too hard, my initial thought is that this should be unnecessary, IMHO. I'm curious what's happening if it's not.<br>

><br>

> Assuming we did not hit the optimization limits during memory ssa use building, any use points to a store that may-or -must aliases it.<br>

> (if we hit the opt limits, you probably want to respect them anyway :P).<br>

><br>

> So this check should just be flat out unnecessary when current ==  original access.  If that has a memory use it's not dead on the path to the memoryuse (ignoring PDSE , that means it's not dead overall)<br>

><br>

> When walking from def to def, (IE when current != originalaccess),  that leaves two cases:<br>

> The store it's a use for may-aliases your original access's siloc - your store is not dead<br>

> the store it's a use for must-aliases your original access's siloc - your stores are the same (and so maybe one is dead)<br>

> Note that in either case, what the use is for or aliases is irrelevant.  The only thing relevant to next access is the store aliasing.  The uses force your store alive or not based on what store they are linked to and it's equivalence to your original one.<br>

><br>

> Put another way: I can't see a reason to check  check explicitly whether the use aliases you.  You only have to check whether use->getDefiningAccess->memloc mayalias siloc.<br>

> MemorySSA has already checked whether use mayalias def for you.<br>

><br>

> You can cache that, and now your walk is much faster because you only have to know things about the defs and not the uses.<br>

><br>

> Again, i could be completely wrong, but i'd like to understand why :)<br>

><br>

> It's true that aliasing is not transitive in theory, but in llvm, the vast majority of cases where it isn't have metadata attached and you could only check here in those cases if you are worried about catching every case.<br>

><br>

</span>This is one of those things where I remember changing the "if" here to an assert, seeing it assert a lot and figuring it was then needed. A lot of those cases will be benign though. I took another look, and after wading through a few csmith cases where extra stores are removed with this check, something like the following is where this comes up:<br>

```<br>

define void @test(i32* %P, i32* noalias %Q, i32* %R) {<br>

; 1 = MemoryDef(liveOnEntry)<br>

  store i32 1, i32* %Q<br>

; 2 = MemoryDef(1)<br>

  store i32 2, i32* %P<br>

; 3 = MemoryDef(2)<br>

  store i32 3, i32* %Q<br>

; MemoryUse(2)<br>

  %1 = load i32, i32* %R<br>

  ret void<br>

}<br>

```<br>

store 3 to Q can be removed, but as MemAccess "2" has 2 operands, the second Use would cause us to fail as we walk from 2 to 3.<br></blockquote><div><br></div><div>Why?</div><div>I'm not suggesting stopping when you see a use.</div><div>i'm suggesting treating a use's aliasing the same as the def it points to.</div><div><br></div><div>Here, you would walk from 1 to 2, say "1 noalias 2" keep going. You would ignore the use at this point. because this use already has been taken care of because it is may-alias of 2 and 1 noalias 2.</div><div>(You can only do this legally if use->isOptimized)</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

I'm not sure if this is worth the cost though. I don't have a great grasp of what will end up being expensive. I will try some benchmarks and try to get some compile time numbers to see, and if there are not any useful cases where this comes up, I'll remove it.</blockquote><div><br></div><div>I'm more concerned with people thinking this is necessary to get aliasing info, because it shouldn't be at all :)</div><div>The whole point of having uses point at the stores that clobber them is to avoid checking like the above.</div><div>So if we can't, i'd like to understand why.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> My first go at getting compile time numbers was too noisy to be useful.<br>

<br>

One thing I never looked into very much in creating this was looking into the internals of MemorySSA. MemSSA seemed to just work well enough to not need it. The only thing I remember coming up was cases like this:<br>

```<br>

define void @test16(i32* noalias %P) {<br>

  %P2 = bitcast i32* %P to i8*<br>

  store i32 1, i32* %P<br>

  br i1 true, label %bb1, label %bb3<br>

bb1:<br>

  store i32 1, i32* %P<br>

  br label %bb3<br>

bb3:<br>

  call void @free(i8* %P2)<br>

  ret void<br>

}<br>

```<br>

We remove the store 1 in the middle, leaving this:<br>

```<br>

define void @test16(i32* noalias %P) {<br>

  %P2 = bitcast i32* %P to i8*<br>

; 1 = MemoryDef(liveOnEntry)<br>

  store i32 1, i32* %P<br>

  br i1 true, label %bb1, label %bb3<br>

bb1:                                              ; preds = %0<br>

  br label %bb3<br>

bb3:                                              ; preds = %bb1, %0<br>

; 4 = MemoryPhi({%0,1},{bb1,1})<br>

; 3 = MemoryDef(4)<br>

  call void @free(i8* %P2)<br>

  ret void<br>

}<br>

```<br>

The memory phi remains with two operands both coming from "1". I presume this is OK, and it's easy enough to work around. It's different to a newly constructed memssa graph though.<br>

<span class=""><br>

<br></span></blockquote><div><br></div><div>Yes, we don't try to eliminate all possibly dead existing phis, only eliminate unneeded new phis when we place them.</div><div><br></div><div>It's generally expect it's not worth it (because it can be recursive).  Most things will see the forms as equivalent ;)</div><div><br></div><div>(Because any memoryuse after 4 will use 1 and not 4, only a memorydef after 4 will use 4, and asking for getClobberingAccess on it will still return 1 and not 4 in that case)</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">

================<br>

Comment at: lib/Transforms/Scalar/<wbr>DeadStoreElimination.cpp:1469<br>

+<br>

+bool isThrowInRange(Value *SI, Value *NI, const Value *SILocUnd,<br>

+                    DenseMap<Value *, unsigned> &InstructionPostOrders,<br>

----------------<br>

</span><span class="">efriedma wrote:<br>

> This is a confusing name, given what the function checks.<br>

</span>This is very sparse on comments, isn't it. My bad. I'll try and add some more in here. The important part of this function is to check that there are no throwing instructions between the SI and NI, except where SI is an Alloca/AllocLikeFn that do not escape.<br>

<span class=""><br>

<br>

================<br>

Comment at: lib/Transforms/Scalar/<wbr>DeadStoreElimination.cpp:1651<br>

+        if (MayThrow)<br>

+          MayThrowsPostOrders.push_back(<wbr>IPO);<br>

+        if (MD && hasMemoryWrite(&I, TLI) && isRemovable(&I))<br>

----------------<br>

</span><span class="">efriedma wrote:<br>

> Is it possible for any DSE-related transform to invalidate this?<br>

><br>

> I'm not sure I really understand how you're using MayThrowsPostOrders here... more comments would be helpful.<br>

</span>More comments, got it, will do. This is one of the good ideas that came from D29624 (any bugs are still on me, of course :)<br>

<br>

Nothing should invalidate the post orders. It's the relative order that we care about and dse only removes instructions, never re-orders them. So if there is a throw between two PO numbers the throw will remain in it's order when we remove one of the stores.<br>

<br>

I was just thinking more about this and it may end up giving odd results at times. This linearising of the PO numbers is dependant on the order the blocks are visited. The store 1 here is removed:<br>

```<br>

define void @test(i32* noalias %P) {<br>

    br i1 true, label %bb1, label %bb2<br>

bb1:<br>

    store i32 1, i32* %P<br>

    br label %bb3<br>

bb2:<br>

    call void @unknown_func()<br>

    br label %bb3<br>

bb3:<br>

    store i32 0, i32* %P<br>

    ret void<br>

}<br>

```<br>

But changing it to "br i1 true, label %bb2, label %bb1", the store wont be removed. That doesn't sound like something we want. Maybe this should be keeping the maythrown instructions around and checking they are really in the way. I'll look into this, but may need to read a book.<br>

<span class=""><br>

<br>

================<br>

Comment at: lib/Transforms/Scalar/<wbr>DeadStoreElimination.cpp:1689<br>

+    // Limit on the number of memory instruction we can look at<br>

+    int InstructionLimit = MemorySSAMemoryAccessScanLimit<wbr>;<br>

+    // Used by PartialEarlierWithFullLater to ensure that we were free of<br>

----------------<br>

</span><span class="">dberlin wrote:<br>

> FWIW: You can do better than this, but it's complicated.<br>

><br>

> We could build a form with factored stores and multiple phis/sigmas if it is really worth it.<br>

><br>

> You also are going to end up rediscovering the same data here again and again (IE for each store, because you are working backwards and  they are all linked, you will eventually hit the same use chain you just saw.  for a given memloc or anything that is mustalias of that memloc, the answers must be the same) . There are various hashing /pair/etc schemes you can use to cache answers.  Not sure it is worth it.<br>

><br>

> Especially vs building a real form for stores.<br>

</span>OK. As I said above I never looked deeply into the internals of MemSSA. Doing the optimisation of stores in the MemorySSA graph is what gcc does, right?</blockquote><div><br></div><div>Yes</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> So it only needs to look at a stores immediate uses to find dead stores. And look through PHIs.</blockquote><div><br></div><div> GCC should do it identically here, the forms are now the same (IE they use single phi forms last i looked).</div><div><br></div><div>I'm not suggesting you go build a form for optimizing stores.</div><div>But it's worth understanding the behavior here before we limit it.</div><div><br></div></div></div></div>