[llvm-dev] [MemorySSA] inserting or removing memory instructions

Piotr Padlewski via llvm-dev llvm-dev at lists.llvm.org
Sun Feb 19 06:05:22 PST 2017


2017-02-19 2:07 GMT+01:00 Daniel Berlin <dberlin at dberlin.org>:

>
>
> On Sat, Feb 18, 2017 at 4:05 PM, Piotr Padlewski <
> piotr.padlewski at gmail.com> wrote:
>
>> Thanks for the reply. I still don't understand a couple of things.
>> Let's take a look at the MemorySSATest.MoveAStore test.
>>
>> This test doesn't use the updater?
>
> This is the destructive test of trying to update it without the updater.
>
>
>> The dump of MSSA after each operation, with my comments, looks like below.
>> I also added a question after each transformation:
>>
>> ; Just after constructing
>> define void @F(i8*) {
>> ; 1 = MemoryDef(liveOnEntry)
>>   store i8 16, i8* %0
>>   br i1 true, label %2, label %3
>>
>> ; <label>:2:                                      ; preds = %1
>> ; 2 = MemoryDef(1)
>>   store i8 16, i8* %0
>>   br label %4
>>
>> ; <label>:3:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:4:                                      ; preds = %3, %2
>> ; 3 = MemoryPhi({%2,2},{%3,1})
>> ; MemoryUse(3)
>>   %5 = load i8, i8* %0
>> }
>> Everything is OK.
>>
>> ; Hoisting mem def 2.
>> define void @F(i8*) {
>> ; 1 = MemoryDef(liveOnEntry)
>>   store i8 16, i8* %0
>> ; 2 = MemoryDef(1)
>>   store i8 16, i8* %0
>>   br i1 true, label %2, label %3
>>
>> ; <label>:2:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:3:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:4:                                      ; preds = %3, %2
>> ; 3 = MemoryPhi({%2,2},{%3,1})
>> ; MemoryUse(3)
>>   %5 = load i8, i8* %0
>> }
>> Why is the phi still using 1 = MemoryDef?
>>
>
> This is not an updater test.
> This literally just moved the instruction using the normal IR mechanisms.
> As you can see, it leaves MemorySSA in a broken intermediate state.
>
>
>
>> I would expect the phi to be transformed to
>> 3 = MemoryPhi({%2,2},{%3,2})
>> or even removed.
>>
>
>
>
>
>>
>> ; After calling MSSA.createMemoryAccessAfter(SideStore,
>> EntryStoreAccess, EntryStoreAccess);
>> define void @F(i8*) {
>> ; 1 = MemoryDef(liveOnEntry)
>>   store i8 16, i8* %0
>> ; 4 = MemoryDef(1)
>>   store i8 16, i8* %0
>>   br i1 true, label %2, label %3
>>
>> ; <label>:2:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:3:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:4:                                      ; preds = %3, %2
>> ; 3 = MemoryPhi({%2,2},{%3,1})
>> ; MemoryUse(3)
>>   %5 = load i8, i8* %0
>> }
>> If createMemoryAccessAfter didn't remove 2 = MemoryDef, then where is it?
>>
> This is why it says:
> "  /// Note: If a MemoryAccess already exists for I, this function will make it
>   /// inaccessible and it *must* have removeMemoryAccess called on it. "
>
>> Is it still somewhere in MSSA, and the dump just doesn't print it?
>>
>> ; After calling  EntryStoreAccess->replaceAllUsesWith(NewStoreAccess);
>> define void @F(i8*) {
>> ; 1 = MemoryDef(liveOnEntry)
>>   store i8 16, i8* %0
>> ; 4 = MemoryDef(4)
>>   store i8 16, i8* %0
>>   br i1 true, label %2, label %3
>>
>> ; <label>:2:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:3:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:4:                                      ; preds = %3, %2
>> ; 3 = MemoryPhi({%2,2},{%3,4})
>> ; MemoryUse(3)
>>   %5 = load i8, i8* %0
>> }
>> 4 = MemoryDef(4) looks very suspicious...
>>
>
> And today we discover updating is hard without the updater ;)
>
Yeah, that's what I thought.


> Yes, it needs to reset the defining access afterwards, but the destructive
> test is not doing that.
>
>
>>
>> ; After calling MSSA.removeMemoryAccess(SideStoreAccess);
>> define void @F(i8*) {
>> ; 1 = MemoryDef(liveOnEntry)
>>   store i8 16, i8* %0
>> ; 4 = MemoryDef(4)
>>   store i8 16, i8* %0
>>   br i1 true, label %2, label %3
>>
>> ; <label>:2:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:3:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:4:                                      ; preds = %3, %2
>> ; 3 = MemoryPhi({%2,4},{%3,4})
>> ; MemoryUse(3)
>>   %5 = load i8, i8* %0
>> }
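
To keep the whole sequence in one place, here is my rough reconstruction of
the raw-API calls behind the dumps above, as a C++ sketch. The variable names,
the moveBefore() call, the wrapper function, and the header path are my
assumptions; only the three MSSA calls are the ones discussed above.

// Rough reconstruction of the destructive (no-updater) sequence discussed
// above. Assumes the same setup as the unit test: MSSA built for @F,
// EntryStore/SideStore are the two stores, Entry is the entry block.
#include "llvm/Analysis/MemorySSA.h" // header path may differ in older trees
#include "llvm/IR/Instructions.h"
using namespace llvm;

void destructiveMove(MemorySSA &MSSA, StoreInst *EntryStore,
                     StoreInst *SideStore, BasicBlock *Entry) {
  // Move the IR instruction itself; MemorySSA is stale from here on.
  SideStore->moveBefore(Entry->getTerminator());

  auto *EntryStoreAccess = MSSA.getMemoryAccess(EntryStore);
  auto *SideStoreAccess = MSSA.getMemoryAccess(SideStore);

  // Create a fresh access for the moved store. SideStore's old access is now
  // inaccessible and *must* still be removed explicitly.
  auto *NewStoreAccess = MSSA.createMemoryAccessAfter(
      SideStore, EntryStoreAccess, EntryStoreAccess);

  // Repoint everything that used the entry store's access at the new one.
  // Done this way, the new access ends up as its own defining access
  // ("4 = MemoryDef(4)" above); the defining access would have to be reset
  // afterwards, which this destructive sequence never does.
  EntryStoreAccess->replaceAllUsesWith(NewStoreAccess);

  // Remove the dangling access of the moved store.
  MSSA.removeMemoryAccess(SideStoreAccess);
}
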
>>
>>
>> There are also very similar tests, MoveAStoreUpdater and
>> MoveAStoreUpdaterMove, that use the updater and instead
>> produce exactly what I would expect at the end:
>>
>> define void @F(i8*) {
>> ; 1 = MemoryDef(liveOnEntry)
>>   store i8 16, i8* %0
>> ; 4 = MemoryDef(4)
>>   store i8 16, i8* %0
>>   br i1 true, label %2, label %3
>>
>> ; <label>:2:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:3:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:4:                                      ; preds = %3, %2
>> ; 3 = MemoryPhi({%2,4},{%3,4})
>> ; MemoryUse(3)
>>   %5 = load i8, i8* %0
>> }
>>
>> It seems a little counter-intuitive that there exists another class that
>> does updates better than MemorySSA, and that the updating functions from
>> MemorySSA are public (which resulted in my confusion about what
>> the methods are doing).
>>
>
> First, I was asked to split the updater into its own class/file, so I did
> :)
> We could add getUpdater to MemorySSA, and move all the functions there.
> We probably should.
>
> ATM we have to have createMemory* available because the updater works on
> memory accesses, not instructions.
> We could make it work on instructions, but it's a bit trickier.
> The main reason I did not start that way is that most things know
> where they are putting the new memory accesses, but we do not.
> So, we'd have to locate the nearest memory access to the new instruction
> in the block (i.e., a worst-case O(n) walk in either direction, doing
> MemorySSA lookups) just to know where it's been placed.
>
> The real intent is that people create the  memory access and then just
> hand it to the updater.
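
So the intended pattern is roughly this, if I understand it right. A sketch on
my side: the wrapper function and parameter names are mine, and the new store
is assumed to be freshly inserted IR with no pre-existing access; the
createMemoryAccessAfter entry point is the one discussed above.

#include "llvm/Analysis/MemorySSA.h"        // header paths may differ per tree
#include "llvm/Analysis/MemorySSAUpdater.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// The caller already knows where the new store landed, so it creates the
// MemoryAccess at that position itself and hands it to the updater, which
// then repairs defining accesses and MemoryPhis downstream.
void insertStoreAndUpdate(MemorySSA &MSSA, StoreInst *NewStore,
                          MemoryAccess *Definition, MemoryAccess *InsertPt) {
  MemorySSAUpdater Updater(&MSSA);
  auto *NewAccess =
      MSSA.createMemoryAccessAfter(NewStore, Definition, InsertPt);
  Updater.insertDef(cast<MemoryDef>(NewAccess));
}
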
>
> We could speed up the walk by turning what are currently the
> createMemoryAccess* functions into hints to the updater about where to start
> the walk / where to go.
>
>
>> Is it because, for some cases, the updating API that MemorySSA provides is
>> sufficient (and faster)?
>>
>
> The removal case is the only sane case anymore, IMHO.
>
> I think we should move the functions to the updater to start, and then
> slowly deprecate them.
>
>> Do you have some plans or ideas for how we could make it easier to use?
>>
>
> It's just that the old way is too hard for people to use, and the new way is
> not fully adopted yet.
>
>
>>
>> The other topic is that I am trying to come up with updating for
>> instructions using !invariant.group, and I have no idea how much the
>> updating in MSSA should do and how much MemorySSAUpdater should fix.
>>
>
> I'm not sure why invariant.group requires anything from MemorySSA
> updating; maybe you can explain?
> Neither the updater nor removal promises that uses will be optimized.
> In fact, it resets the optimized state on uses.
>
OK, that seems to be easier to do in the walker than in the updater.


> The walker will properly fix up and reoptimize uses where they need it.
>
>
>> There is also another problem that I am not sure if I should fix, because
>> I don't see an O(1) solution for it.
>>
>> For code:
>>
>> store 42, %p, !invariant.group !0
>> store 42, %p, !invariant.group !0
>> store 42, %p, !invariant.group !0
>> store 42, %p, !invariant.group !0
>> store 42, %p, !invariant.group !0
>>
>> load %p, !invariant.group !0
>> ; The same, maybe in different blocks
>> load %p, !invariant.group !0
>> load %p, !invariant.group !0
>>
>> I would expect all the loads to use the first store.
>>
> They will, assuming the optimizer did not hit any limits.
>
In the algorithm that I posted at https://reviews.llvm.org/D29064,
optimizing every store/load with invariant.group is O(1), so there
shouldn't be problems with limits.

>
>
>> Then if I start removing stores from the top, I have to
>> update all the uses all
>> over again,
>>
>
> This is why you would not remove them top down. Stores are removed in
> post-dom order, or you are insane :)
>
> Yes, if you do this top-down, instead of bottom up, it's O(Instructions *
> Uses).
>
> That's actually true of the opposite order for non-memory as well:
> %1 = add 1, 1
> %2 = add 1, 1
> %3 = add  1, 1
> %4 = add  1, 1
> %5 = add  1, 1
>
> use (%1)
> use (%1)
> use (%1)
> use (%1)
> use (%1)
> use (%1)
> use (%1)
>
> If you are eliminating top-down, using later accesses to say that earlier
> ones are dead (as you are suggesting):
>
> You would do
> %1->replaceAllUsesWith(%2)
> %1->eraseFromParent()
> %2->replaceAllUsesWith(%3)
> ...
>
> It has the same time bounds.
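
Right, so for the add example the cheap order is the other way around: keep
the earliest copy and fold the later ones into it, so each replaceAllUsesWith
only touches that later instruction's (typically empty) use list. A sketch of
what I mean, my own illustration rather than code from anywhere:

#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Dups holds the identical instructions in program order (%1 first, %5 last).
// Keeping the first and erasing the rest rewrites each later instruction's
// use list exactly once, instead of repeatedly rewriting %1's big use list.
void foldIntoFirst(SmallVectorImpl<Instruction *> &Dups) {
  if (Dups.size() < 2)
    return;
  Instruction *First = Dups.front();
  for (size_t I = 1, E = Dups.size(); I != E; ++I) {
    Dups[I]->replaceAllUsesWith(First);
    Dups[I]->eraseFromParent();
  }
}
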
>
>
>
>> which is O(#uses). It is probably something that could be solved with a
>> disjoint-set data structure, but it gets more complicated if I can't
>> replace all uses with another def
>> (because they diverge), so I would have to find the first MemoryDef that
>> dominates all the invariant.group uses on a given path, or use the closest
>> non-invariant.group def if it doesn't exist, like here:
>>
>
>
>
>
>>
>> define void @hard(i8* %p) {
>> ; 1 = MemoryDef(liveOnEntry)
>>   store i8 42, i8* %p, !invariant.group !0
>> ; 2 = MemoryDef(1)
>>   call void @clobber8(i8* %p)
>>   br i1 undef, label %b1, label %b2
>>
>> b1:                                               ; preds = %0
>> ; 3 = MemoryDef(2)
>>   store i8 42, i8* %p, !invariant.group !0
>> ; MemoryUse(1)
>>   %1 = load i8, i8* %p, !invariant.group !0
>>   br label %b3
>>
>> b2:                                               ; preds = %0
>> ; MemoryUse(1)
>>   %2 = load i8, i8* %p, !invariant.group !0
>>   br label %b3
>>
>> b3:                                               ; preds = %b2, %b1
>> ; 5 = MemoryPhi({b1,3},{b2,2})
>> ; MemoryUse(5)
>>   %3 = load i8, i8* %p, !invariant.group !0
>> ; 4 = MemoryDef(5)
>>   store i8 42, i8* %p, !invariant.group !0
>> ; MemoryUse(4)
>>   %4 = load i8, i8* %p, !invariant.group !0
>>   ret void
>> }
>>
>> After removing the first instruction I would expect:
>>
>
> Nope.
> As mentioned, neither removal nor the updater guarantees optimized uses.
> The walker will reoptimize them and reset the defining access as necessary.
>
> Instead, all that will happen in your example, if you remove store 2, is
> that we will produce a conservatively correct answer:
> We will repoint all uses that pointed at 2 at 2's defining access.
>
> This is guaranteed to be a thing that dominates 2 (by the SSA property).
> It's also guaranteed not to be wrong.  Whatever uses reached 2 could not
> have aliased with anything in the way.
> So the next thing it could possibly be aliased to is 2's defining access.
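
So conceptually the removal does something like this. A sketch of how I
picture it, not the real implementation:

#include "llvm/Analysis/MemorySSA.h"
using namespace llvm;

// Removing a MemoryDef conservatively: every access that used it is
// repointed at its defining access, which dominates it by the SSA property
// and so is never a wrong answer (only possibly a pessimistic one).
void removeDefConceptually(MemoryDef *MD) {
  MD->replaceAllUsesWith(MD->getDefiningAccess());
  // ...after which MemorySSA erases the access from its lists.
}
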
>
I thought about it a little bit more, and it is exactly as you said. I was
worried about how MemorySSA would look after the removal of 1 = MemoryDef in
this example:

define void @hard(i8* %p) {
; 2 = MemoryDef(liveOnEntry)
  call void @clobber8(i8* %p)
  br i1 undef, label %b1, label %b2

b1:                                               ; preds = %0
; 3 = MemoryDef(2)
  store i8 42, i8* %p, !invariant.group !0
; MemoryUse(liveOnEntry)
  %1 = load i8, i8* %p, !invariant.group !0
  br label %b3

b2:                                               ; preds = %0
; MemoryUse(liveOnEntry)
  %2 = load i8, i8* %p, !invariant.group !0
  br label %b3

b3:                                               ; preds = %b2, %b1
; 5 = MemoryPhi({b1,3},{b2,2})
; MemoryUse(5)
  %3 = load i8, i8* %p, !invariant.group !0
; 4 = MemoryDef(5)
  store i8 42, i8* %p, !invariant.group !0
; MemoryUse(4)
  %4 = load i8, i8* %p, !invariant.group !0
  ret void
}

It looked wrong to me that the loads that were defined by the removed store
are now liveOnEntry, but one could remove the store only if one could prove
that it is redundant (or not used), meaning that the loads really are
defined by liveOnEntry (because every caller would have to set the value of
%p to 42).
Brilliant!



> Thus, we will produce
>
>>
>> define void @hard(i8* %p) {
>> ; 1 = MemoryDef(liveOnEntry)
>>   call void @clobber8(i8* %p)
>>   br i1 undef, label %b1, label %b2
>>
>> b1:                                               ; preds = %0
>> ; 3 = MemoryDef(1)
>>
>
> Note that you had reset the MemoryDef ID of this to 2, which will never
> happen :)
>
>
>>   store i8 42, i8* %p, !invariant.group !0
>> ; MemoryUse(1)
>>   %1 = load i8, i8* %p, !invariant.group !0
>>   br label %b3
>>
>> b2:                                               ; preds = %0
>> ; MemoryUse(1)
>>   %2 = load i8, i8* %p, !invariant.group !0
>>   br label %b3
>>
>> b3:                                               ; preds = %b2, %b1
>> ; 4 = MemoryPhi({b1,3},{b2,1})
>> ; MemoryUse(5)
>>   %3 = load i8, i8* %p, !invariant.group !0
>> ; 3 = MemoryDef(4)
>>   store i8 42, i8* %p, !invariant.group !0
>> ; MemoryUse(4)
>>   %4 = load i8, i8* %p, !invariant.group !0
>>   ret void
>> }
>>
>> Which requires %1, %2, %3, and %4 to be updated with different defs.
>>
>
>
> Again, we do not guarantee reoptimized uses at each intermediate point.
> Doing so would be N^3 in the worst case.
> Instead, we guarantee that when you call getClobberingMemoryAccess on a
> def or use, it gives you the optimal answer.
> As a side effect, it resets the defining access of the use (mainly so it
> doesn't waste time on the same links next time).
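
In other words, the consumer-side pattern is just this, if I read it right.
A sketch; the wrapper and the SomeLoad name are placeholders of mine:

#include "llvm/Analysis/MemorySSA.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Asking the walker yields the precise clobber for this load on demand; as a
// side effect the use's defining access is reset to that answer, so the same
// walk is not repeated next time.
MemoryAccess *clobberFor(MemorySSA &MSSA, LoadInst *SomeLoad) {
  MemoryAccess *MU = MSSA.getMemoryAccess(SomeLoad);
  return MSSA.getWalker()->getClobberingMemoryAccess(MU);
}
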
>
> We used to guarantee the optimized property all the time in the underlying
> IR itself, but as you've discovered, it is a hard invariant to maintain.
>
> This does mean that *uses* of a store may include false aliases that need
> to be disambiguated if you haven't called getMemoryAccess.
> But that's the price of admission ATM.
> We'd need MemorySSI to solve that well.
>
What is MemorySSI?

> Updating it fast otherwise basically requires some interesting linking.
> I.e., what you really want is, for each memory location or even each store,
> for the stores to be linked in a way that looks like a dominator tree.
>
> http://link.springer.com/chapter/10.1007/978-3-319-10936-7_13
>
> has a neat way of handling the storage and lookup using quadtrees.
>
> But store optimizations are much less common than hoisting ones in LLVM.
> It's easy enough to teach those things to remove in the right order.
>
>
Oh right, so I guess the updater doesn't require any special handling for
invariant groups, but I need to update the walker.
Thanks for the long reply, it is really helpful.

Piotr