[PATCH] D40480: MemorySSA backed Dead Store Elimination.

Daniel Berlin via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 30 12:39:00 PST 2017


On Thu, Nov 30, 2017 at 12:12 PM, Dave Green via Phabricator <
reviews at reviews.llvm.org> wrote:

> dmgreen added a comment.
>
> Hello all. Thanks for taking a look. Much appreciated. The main thing this
> does over the old version is work across basic blocks, which is why we are
> interested here. In something like this:
>
>   for (int i = ..) {
>       X[i] = 0;
>       for (int j = ..)
>           X[i] += Y[j];
>   }
>
> The inner X[i] will currently be pulled out of the inner loop by LICM, but
> the original X[i] = 0 will remain as a dead store.
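>
> For reference, the shape after LICM's scalar promotion (and after the
> promoted load of X[i] gets folded to the stored 0) is roughly the following
> hand-written sketch; it is not actual pass output, and the function name and
> bounds are just for illustration:
>
> ```
> void zeroThenAccumulate(int *X, const int *Y, int N, int M) {
>   for (int i = 0; i < N; ++i) {
>     X[i] = 0;          // dead store; its killer below ends up in a
>                        // different basic block once LICM has run
>     int Tmp = 0;       // promoted accumulator (the reload of X[i] folds to 0)
>     for (int j = 0; j < M; ++j)
>       Tmp += Y[j];
>     X[i] = Tmp;        // overwrites the X[i] = 0 on every path
>   }
> }
> ```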
>
>
>
> ================
> Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:1324
> +      // Any MemoryUse's that may alias is a break. Otherwise ignore
> +      if (AA.getModRefInfo(MA->getMemoryInst(), SILoc) & MRI_Ref)
> +        return {NextResult::Bad, nullptr};
> ----------------
> dberlin wrote:
> > Without thinking about this too hard, my initial thought is that this check
> should be unnecessary, IMHO. I'm curious what's happening if it's not.
> >
> > Assuming we did not hit the optimization limits during MemorySSA use
> building, any use points to a store that may- or must-aliases it.
> > (If we hit the opt limits, you probably want to respect them anyway :P.)
> >
> > So this check should just be flat-out unnecessary when current ==
> original access. If that has a MemoryUse, it's not dead on the path to the
> MemoryUse (and ignoring PDSE, that means it's not dead overall).
> >
> > When walking from def to def (i.e. when current != original access), that
> leaves two cases:
> > The store it's a use for may-aliases your original access's SILoc - your
> store is not dead.
> > The store it's a use for must-aliases your original access's SILoc -
> your stores are the same (and so maybe one is dead).
> > Note that in either case, what the use is for, or what it aliases, is
> irrelevant. The only thing relevant to the next access is the store aliasing.
> The uses force your store alive or not based on what store they are linked to
> and its equivalence to your original one.
> >
> > Put another way: I can't see a reason to check explicitly whether
> the use aliases you. You only have to check whether
> use->getDefiningAccess->memloc mayalias SILoc.
> > MemorySSA has already checked whether the use may-aliases the def for you.
> >
> > You can cache that, and now your walk is much faster because you only
> have to know things about the defs and not the uses.
> >
> > Again, I could be completely wrong, but I'd like to understand why :)
> >
> > It's true that aliasing is not transitive in theory, but in LLVM the
> vast majority of cases where it isn't have metadata attached, and you could
> restrict the check here to those cases if you are worried about catching
> every case.
> >
> This is one of those things where I remember changing the "if" here to an
> assert, seeing it assert a lot and figuring it was then needed. A lot of
> those cases will be benign though. I took another look, and after wading
> through a few csmith cases where extra stores are removed with this check,
> something like the following is where this comes up:
> ```
> define void @test(i32* %P, i32* noalias %Q, i32* %R) {
> ; 1 = MemoryDef(liveOnEntry)
>   store i32 1, i32* %Q
> ; 2 = MemoryDef(1)
>   store i32 2, i32* %P
> ; 3 = MemoryDef(2)
>   store i32 3, i32* %Q
> ; MemoryUse(2)
>   %1 = load i32, i32* %R
>   ret void
> }
> ```
> The first store to %Q ("1 = MemoryDef") is made dead by the later store of 3
> to %Q and can be removed, but as MemoryAccess "2" has two users, the second
> of them (the MemoryUse) causes us to fail as we walk from 2 to 3.
>

Why?
I'm not suggesting stopping when you see a use.
I'm suggesting treating a use's aliasing the same as the def it points to.

Here, you would walk from 1 to 2, say "1 noalias 2", and keep going. You would
ignore the use at this point, because it has already been taken care of: it is
may-alias of 2, and 1 noalias 2.
(You can only do this legally if use->isOptimized.)
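
To illustrate the suggestion (a rough sketch only, not the patch's code:
`NoAliasDefs` is an assumed cache of defs the walk has already shown to be
NoAlias with SILoc, and the fallback is the explicit query the patch does
today):

```
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/MemorySSA.h"

using namespace llvm;

// Decide whether a MemoryUse seen during the walk can keep the candidate
// store alive, reusing the answer already computed for its defining def.
static bool useMayReadSILoc(MemoryUse *U, const MemoryLocation &SILoc,
                            AliasAnalysis &AA,
                            const SmallPtrSetImpl<const MemoryAccess *> &NoAliasDefs) {
  // If the use is optimized, it already points at its real clobber; if that
  // clobber was judged NoAlias with SILoc on the def walk, the use does not
  // need an AA query of its own.
  if (U->isOptimized() && NoAliasDefs.count(U->getDefiningAccess()))
    return false;
  // Otherwise fall back to the explicit check the patch does today.
  return (AA.getModRefInfo(U->getMemoryInst(), SILoc) & MRI_Ref) != 0;
}
```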



>
> I'm not sure if this is worth the cost though. I don't have a great grasp
> of what will end up being expensive. I will run some benchmarks and try to
> get some compile-time numbers, and if there are no useful cases where this
> comes up, I'll remove it.


I'm more concerned with people thinking this is necessary to get aliasing
info, because it shouldn't be at all :)
The whole point of having uses point at the stores that clobber them is to
avoid checks like the one above.
So if we can't avoid them, I'd like to understand why.


> My first go at getting compile-time numbers was too noisy to be useful.
>
> One thing I never looked at very closely while creating this was the
> internals of MemorySSA; it seemed to just work well enough not to need it.
> The only thing I remember coming up was cases like this:
> ```
> define void @test16(i32* noalias %P) {
>   %P2 = bitcast i32* %P to i8*
>   store i32 1, i32* %P
>   br i1 true, label %bb1, label %bb3
> bb1:
>   store i32 1, i32* %P
>   br label %bb3
> bb3:
>   call void @free(i8* %P2)
>   ret void
> }
> ```
> We remove the store in the middle (the one in %bb1), leaving this:
> ```
> define void @test16(i32* noalias %P) {
>   %P2 = bitcast i32* %P to i8*
> ; 1 = MemoryDef(liveOnEntry)
>   store i32 1, i32* %P
>   br i1 true, label %bb1, label %bb3
> bb1:                                              ; preds = %0
>   br label %bb3
> bb3:                                              ; preds = %bb1, %0
> ; 4 = MemoryPhi({%0,1},{bb1,1})
> ; 3 = MemoryDef(4)
>   call void @free(i8* %P2)
>   ret void
> }
> ```
> The MemoryPhi remains, with both operands coming from "1". I presume
> this is OK, and it's easy enough to work around. It's different to a newly
> constructed MemorySSA graph though.
>
>
>
Yes, we don't try to eliminate all possibly dead existing phis, only
eliminate unneeded new phis when we place them.

We generally expect it's not worth it (because it can be recursive).
Most things will see the forms as equivalent ;)

(Because any MemoryUse after 4 will use 1 and not 4, only a MemoryDef after
4 will use 4, and asking for getClobberingMemoryAccess on it will still return
1 and not 4 in that case.)
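
As a small illustration of that last point (a hypothetical helper using the
numbering from the example above; `getWalker()`/`getClobberingMemoryAccess()`
are the real MemorySSA entry points, the helper itself is made up):

```
#include "llvm/Analysis/MemorySSA.h"

using namespace llvm;

// FreeDef is "3 = MemoryDef(4)" for the free() call; its defining access is
// the degenerate "4 = MemoryPhi({%0,1},{bb1,1})".
static MemoryAccess *clobberBelowDegeneratePhi(MemorySSA &MSSA,
                                               MemoryDef *FreeDef) {
  // The walker looks through the phi because both incoming values are the
  // same, so this returns access 1 (the remaining store), not the phi.
  return MSSA.getWalker()->getClobberingMemoryAccess(FreeDef);
}
```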


> ================
> Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:1469
> +
> +bool isThrowInRange(Value *SI, Value *NI, const Value *SILocUnd,
> +                    DenseMap<Value *, unsigned> &InstructionPostOrders,
> ----------------
> efriedma wrote:
> > This is a confusing name, given what the function checks.
> This is very sparse on comments, isn't it. My bad. I'll try and add some
> more in here. The important part of this function is to check that there
> are no throwing instructions between SI and NI, except where SI stores to an
> alloca or alloc-like allocation that does not escape.
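>
> A hedged sketch of that intent (not the patch's actual code; the "is there a
> throw between SI and NI" answer is passed in as an assumed flag computed
> elsewhere):
>
> ```
> #include "llvm/Analysis/CaptureTracking.h"
> #include "llvm/Analysis/MemoryBuiltins.h"
> #include "llvm/Analysis/TargetLibraryInfo.h"
> #include "llvm/IR/Instructions.h"
>
> using namespace llvm;
>
> // SILocUnd is the underlying object of the location SI stores to.
> static bool throwBlocksElimination(const Value *SILocUnd,
>                                    const TargetLibraryInfo *TLI,
>                                    bool ThrowBetweenSIAndNI) {
>   // Stores to non-escaping local allocations are invisible to any unwinder,
>   // so an intervening throw does not keep SI alive in that case.
>   if (isa<AllocaInst>(SILocUnd) ||
>       (isAllocLikeFn(SILocUnd, TLI) &&
>        !PointerMayBeCaptured(SILocUnd, /*ReturnCaptures=*/false,
>                              /*StoreCaptures=*/true)))
>     return false;
>   return ThrowBetweenSIAndNI;
> }
> ```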
>
>
> ================
> Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:1651
> +        if (MayThrow)
> +          MayThrowsPostOrders.push_back(IPO);
> +        if (MD && hasMemoryWrite(&I, TLI) && isRemovable(&I))
> ----------------
> efriedma wrote:
> > Is it possible for any DSE-related transform to invalidate this?
> >
> > I'm not sure I really understand how you're using MayThrowsPostOrders
> here... more comments would be helpful.
> More comments, got it, will do. This is one of the good ideas that came
> from D29624 (any bugs are still on me, of course :)
>
> Nothing should invalidate the post orders. It's the relative order that we
> care about, and DSE only removes instructions, never re-orders them. So if
> there is a throw between two PO numbers, the throw will remain in its place
> when we remove one of the stores.
>
> I was just thinking more about this and it may end up giving odd results
> at times. This linearising of the PO numbers is dependent on the order the
> blocks are visited. The store of 1 here is removed:
> ```
> define void @test(i32* noalias %P) {
>     br i1 true, label %bb1, label %bb2
> bb1:
>     store i32 1, i32* %P
>     br label %bb3
> bb2:
>     call void @unknown_func()
>     br label %bb3
> bb3:
>     store i32 0, i32* %P
>     ret void
> }
> ```
> But after changing it to "br i1 true, label %bb2, label %bb1", the store
> won't be removed. That doesn't sound like something we want. Maybe this
> should be keeping the may-throw instructions around and checking whether they
> are really in the way. I'll look into this, but may need to read a book.
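>
> For reference, the relative-order check being discussed boils down to
> something like this (a hypothetical shape, not the patch's exact code;
> MayThrowPOs is the sorted list of post-order numbers recorded for may-throw
> instructions):
>
> ```
> #include <algorithm>
> #include <vector>
>
> // True if some may-throw instruction was numbered between the two stores.
> // Which endpoint gets the smaller number depends on the traversal order,
> // so normalise the interval first.
> static bool anyThrowBetween(const std::vector<unsigned> &MayThrowPOs,
>                             unsigned SIPO, unsigned NIPO) {
>   unsigned Lo = std::min(SIPO, NIPO), Hi = std::max(SIPO, NIPO);
>   auto It = std::lower_bound(MayThrowPOs.begin(), MayThrowPOs.end(), Lo);
>   return It != MayThrowPOs.end() && *It < Hi;
> }
> ```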
>
>
> ================
> Comment at: lib/Transforms/Scalar/DeadStoreElimination.cpp:1689
> +    // Limit on the number of memory instruction we can look at
> +    int InstructionLimit = MemorySSAMemoryAccessScanLimit;
> +    // Used by PartialEarlierWithFullLater to ensure that we were free of
> ----------------
> dberlin wrote:
> > FWIW: You can do better than this, but it's complicated.
> >
> > We could build a form with factored stores and multiple phis/sigmas if
> it is really worth it.
> >
> > You also are going to end up rediscovering the same data here again and
> again (i.e. for each store, because you are working backwards and they are
> all linked, you will eventually hit the same use chain you just saw; for a
> given memloc, or anything that is must-alias of that memloc, the answers must
> be the same). There are various hashing/pair/etc. schemes you can use to
> cache answers. Not sure it is worth it.
> >
> > Especially vs building a real form for stores.
> OK. As I said above, I never looked deeply into the internals of MemorySSA.
> Doing the optimisation of stores in the MemorySSA graph is what GCC does,
> right?


Yes


> So it only needs to look at a store's immediate uses to find dead stores,
> and look through PHIs.


GCC should do it identically here; the forms are now the same (i.e. they use
single-phi forms, last I looked).

I'm not suggesting you go build a form for optimizing stores.
But it's worth understanding the behavior here before we limit it.