[llvm-dev] [MemorySSA] inserting or removing memory instructions

Daniel Berlin via llvm-dev llvm-dev at lists.llvm.org
Sat Feb 18 22:19:47 PST 2017


I believe https://reviews.llvm.org/D30154 does what you want: it adds the
ability to rename uses due to aliasing stores.
The createsloadandstoreupdater test now produces:
define void @F(i8*) {
; 1 = MemoryDef(liveOnEntry)
  store i8 16, i8* %0
; 4 = MemoryDef(1)
  store i8 16, i8* %0
  br i1 true, label %2, label %3

; <label>:2:                                      ; preds = %1
; 2 = MemoryDef(4)
  store i8 16, i8* %0
  br label %4

; <label>:3:                                      ; preds = %1
  br label %4

; <label>:4:                                      ; preds = %3, %2
; 3 = MemoryPhi({%3,4},{%2,2})
; MemoryUse(3)
  %5 = load i8, i8* %0
; MemoryUse(3)
  %6 = load i8, i8* %0
}
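
For reference, here is a minimal sketch of the flow that change enables. The
names (NewStore, Definition, InsertPt) are placeholders, and the header paths
are where MemorySSA and the updater currently live in the tree; treat this as
a sketch rather than the exact test code:

#include "llvm/Transforms/Utils/MemorySSA.h"
#include "llvm/Transforms/Utils/MemorySSAUpdater.h"
using namespace llvm;

// After placing a new store `NewStore` in the IR, create its access by hand
// and let the updater rename downstream uses/phis that it now clobbers.
static void insertStoreAndRename(MemorySSA &MSSA, MemorySSAUpdater &Updater,
                                 Instruction *NewStore,
                                 MemoryAccess *Definition,
                                 MemoryAccess *InsertPt) {
  MemoryAccess *NewAccess =
      MSSA.createMemoryAccessAfter(NewStore, Definition, InsertPt);
  // RenameUses = true is what D30154 adds: it repoints uses (and phi
  // operands) that should now be defined by the new store.
  Updater.insertDef(cast<MemoryDef>(NewAccess), /*RenameUses=*/true);
}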

On Sat, Feb 18, 2017 at 5:07 PM, Daniel Berlin <dberlin at dberlin.org> wrote:

>
>
> On Sat, Feb 18, 2017 at 4:05 PM, Piotr Padlewski <
> piotr.padlewski at gmail.com> wrote:
>
>> Thanks for the reply. I still don't understand a couple of things.
>> Let's take a look at the MemorySSATest.MoveAStore test.
>>
>> This test doesn't use the updater?
>
> This is the destructive test of trying to update it without the updater.
>
>
>> The MSSA dump after each operation, with my comments, looks like the one
>> below. I also added a question after each transformation:
>>
>> ; Just after constructing
>> define void @F(i8*) {
>> ; 1 = MemoryDef(liveOnEntry)
>>   store i8 16, i8* %0
>>   br i1 true, label %2, label %3
>>
>> ; <label>:2:                                      ; preds = %1
>> ; 2 = MemoryDef(1)
>>   store i8 16, i8* %0
>>   br label %4
>>
>> ; <label>:3:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:4:                                      ; preds = %3, %2
>> ; 3 = MemoryPhi({%2,2},{%3,1})
>> ; MemoryUse(3)
>>   %5 = load i8, i8* %0
>> }
>> Everything is OK.
>>
>> ; Hoisting mem def 2.
>> define void @F(i8*) {
>> ; 1 = MemoryDef(liveOnEntry)
>>   store i8 16, i8* %0
>> ; 2 = MemoryDef(1)
>>   store i8 16, i8* %0
>>   br i1 true, label %2, label %3
>>
>> ; <label>:2:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:3:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:4:                                      ; preds = %3, %2
>> ; 3 = MemoryPhi({%2,2},{%3,1})
>> ; MemoryUse(3)
>>   %5 = load i8, i8* %0
>> }
>> Why is the phi still using 1 = MemoryDef?
>>
>
> This is not an updater test.
> It literally just moved the instruction using the normal IR moving.
> As you can see, it generates a broken intermediate state.
>
>
>
>> I would expect the phi to be transformed to
>> 3 = MemoryPhi({%2,2},{%3,2})
>> or even removed.
>>
>
>
>
>
>>
>> ; After calling MSSA.createMemoryAccessAfter(SideStore,
>> EntryStoreAccess, EntryStoreAccess);
>> define void @F(i8*) {
>> ; 1 = MemoryDef(liveOnEntry)
>>   store i8 16, i8* %0
>> ; 4 = MemoryDef(1)
>>   store i8 16, i8* %0
>>   br i1 true, label %2, label %3
>>
>> ; <label>:2:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:3:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:4:                                      ; preds = %3, %2
>> ; 3 = MemoryPhi({%2,2},{%3,1})
>> ; MemoryUse(3)
>>   %5 = load i8, i8* %0
>> }
>> If createMemoryAccessAfter didn't remove 2 = MemoryDef, then where is it?
>>
> This is why it says:
> "/// Note: If a MemoryAccess already exists for I, this function will make
> it inaccessible and it *must* have removeMemoryAccess called on it."
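>
> In other words, grab the old access *before* you create the new one, and
> remove it yourself. A minimal sketch, using the names from the test:
>
> MemoryAccess *OldAccess = MSSA.getMemoryAccess(SideStore);
> MemoryAccess *NewAccess = MSSA.createMemoryAccessAfter(
>     SideStore, EntryStoreAccess, EntryStoreAccess);
> // The old access still exists, it is just no longer reachable through
> // lookups; you must remove it explicitly, as the test does further down.
> MSSA.removeMemoryAccess(OldAccess);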
>
>> Is it still somewhere in MSSA, and the dump just doesn't print it?
>>
>> ; After calling EntryStoreAccess->replaceAllUsesWith(NewStoreAccess);
>> define void @F(i8*) {
>> ; 1 = MemoryDef(liveOnEntry)
>>   store i8 16, i8* %0
>> ; 4 = MemoryDef(4)
>>   store i8 16, i8* %0
>>   br i1 true, label %2, label %3
>>
>> ; <label>:2:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:3:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:4:                                      ; preds = %3, %2
>> ; 3 = MemoryPhi({%2,2},{%3,4})
>> ; MemoryUse(3)
>>   %5 = load i8, i8* %0
>> }
>> 4 = MemoryDef(4) looks very suspicious...
>>
>
> And today we discover updating is hard without the updater ;)
>
> Yes, it needs to reset the defining access afterwards, but the destructive
> test is not doing that.
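>
> To spell out where the 4 = MemoryDef(4) comes from: the new access itself
> uses EntryStoreAccess as its defining access, so a blanket RAUW rewrites
> that operand too. Annotated sketch of the destructive sequence:
>
> MemoryAccess *NewStoreAccess = MSSA.createMemoryAccessAfter(
>     SideStore, EntryStoreAccess, EntryStoreAccess);
> // NewStoreAccess's defining access is EntryStoreAccess at this point.
> EntryStoreAccess->replaceAllUsesWith(NewStoreAccess);
> // RAUW just rewrote *every* use of EntryStoreAccess, including the
> // defining-access operand of NewStoreAccess itself - hence 4 = MemoryDef(4).
> // A correct update resets that defining access afterwards, which is what
> // the updater takes care of.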
>
>
>>
>> ; After calling MSSA.removeMemoryAccess(SideStoreAccess);
>> define void @F(i8*) {
>> ; 1 = MemoryDef(liveOnEntry)
>>   store i8 16, i8* %0
>> ; 4 = MemoryDef(4)
>>   store i8 16, i8* %0
>>   br i1 true, label %2, label %3
>>
>> ; <label>:2:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:3:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:4:                                      ; preds = %3, %2
>> ; 3 = MemoryPhi({%2,4},{%3,4})
>> ; MemoryUse(3)
>>   %5 = load i8, i8* %0
>> }
>>
>>
>> There are also very similar tests - MoveAStoreUpdater and
>> MoveAStoreUpdaterMove - that use the updater and instead
>> produce exactly what I would expect at the end:
>>
>> define void @F(i8*) {
>> ; 1 = MemoryDef(liveOnEntry)
>>   store i8 16, i8* %0
>> ; 4 = MemoryDef(4)
>>   store i8 16, i8* %0
>>   br i1 true, label %2, label %3
>>
>> ; <label>:2:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:3:                                      ; preds = %1
>>   br label %4
>>
>> ; <label>:4:                                      ; preds = %3, %2
>> ; 3 = MemoryPhi({%2,4},{%3,4})
>> ; MemoryUse(3)
>>   %5 = load i8, i8* %0
>> }
>>
>> It seems a little counter-intuitive that there exists another class that
>> does updates better than MemorySSA, and that the functions from
>> MemorySSA are public (which resulted in my confusion about what
>> the methods are doing).
>>
>
> First, I was asked to split the updater into its own class/file, so I did
> :)
> We could add getUpdater to MemorySSA and move all the functions there.
> We probably should.
>
> ATM we have to have createMemory* available because the updater works on
> memory accesses, not instructions.
> We could make it work on instructions, but it's a bit trickier.
> The main reason I did not start that way is that most things know
> where they are putting the new memory accesses, but we do not.
> So we'd have to locate the nearest memory access to the new instruction
> in the block (i.e., a worst-case O(n) walk in either direction, doing MSSA
> lookups) just to know where it's been placed.
>
> The real intent is that people create the  memory access and then just
> hand it to the updater.
>
> We could speed up the walk by turning what is currently the
> createMemoryAccess* into hints to the updater to know where to start the
> walk/where to go.
>
>
>> Is it because, for some cases, the updating API that MemorySSA provides is
>> sufficient (and faster)?
>>
>
> The removal case is the only sane case anymore, IMHO.
>
> I think we should move the functions to the updater to start, and then
> slowly deprecate them.
>
>> Do you have some plans or ideas for how we could make it easier to use?
>>
>
> It's just that the old way is too hard for people to use, and the new way
> is not fully adopted yet.
>
>
>>
>> The other topic is that I am trying to come up with updating for
>> instructions using !invariant.group, and I have no idea how much the
>> updating in MSSA should do and how much MemorySSAUpdater should fix.
>>
>
> I'm not sure why invariant.group requires anything from MemorySSA
> updating; maybe you can explain?
> Neither the updater nor removal promises that uses will be optimized.
> In fact, they reset the optimized state on uses.
>
> The walker will properly fix up and reoptimize uses where they need it.
>
>
>> There is also another problem that I am not sure whether I should fix,
>> because I don't see an O(1) solution for it.
>>
>> For code:
>>
>> store 42, %p, !invariant.group !0
>> store 42, %p, !invariant.group !0
>> store 42, %p, !invariant.group !0
>> store 42, %p, !invariant.group !0
>> store 42, %p, !invariant.group !0
>>
>> load %p, !invariant.group !0
>> ; The same, maybe in different blocks
>> load %p, !invariant.group !0
>> load %p, !invariant.group !0
>>
>> I would expect all the loads to use the first store.
>>
> They will, assuming the optimizer did not hit any limits.
>
>
>> Then if I were to start removing stores from the top, I would have to
>> update all the uses all
>> over again,
>>
>
> This is why you would not remove them top down. Stores are removed in
> post-dom order, or you are insane :)
>
> Yes, if you do this top-down, instead of bottom up, it's O(Instructions *
> Uses).
>
> That's actually true of the opposite order for non-memory as well:
> %1 = add 1, 1
> %2 = add 1, 1
> %3 = add  1, 1
> %4 = add  1, 1
> %5 = add  1, 1
>
> use (%1)
> use (%1)
> use (%1)
> use (%1)
> use (%1)
> use (%1)
> use (%1)
>
> If you are eliminating top-down, using later accesses to say that earlier
> ones are dead (as you are suggesting):
>
> You would do
> %1->replaceAllUsesWith(%2)
> %1->eraseFromParent()
> %2->replaceAllUsesWith(%3)
> ...
>
> It has the same time bounds.
>
>
>
>> which is O(#uses). It is probably something that could be solved with a
>> disjoint-set data structure, but it gets more complicated if I can't
>> replace all uses with another def
>> (because they diverge), so I would have to find the first MemoryDef that
>> dominates all the invariant.group uses on a given path, or use the closest
>> non-invariant.group def if it doesn't exist, like here:
>>
>
>
>
>
>>
>> define void @hard(i8* %p) {
>> ; 1 = MemoryDef(liveOnEntry)
>>   store i8 42, i8* %p, !invariant.group !0
>> ; 2 = MemoryDef(1)
>>   call void @clobber8(i8* %p)
>>   br i1 undef, label %b1, label %b2
>>
>> b1:                                               ; preds = %0
>> ; 3 = MemoryDef(2)
>>   store i8 42, i8* %p, !invariant.group !0
>> ; MemoryUse(1)
>>   %1 = load i8, i8* %p, !invariant.group !0
>>   br label %b3
>>
>> b2:                                               ; preds = %0
>> ; MemoryUse(1)
>>   %2 = load i8, i8* %p, !invariant.group !0
>>   br label %b3
>>
>> b3:                                               ; preds = %b2, %b1
>> ; 5 = MemoryPhi({b1,3},{b2,2})
>> ; MemoryUse(5)
>>   %3 = load i8, i8* %p, !invariant.group !0
>> ; 4 = MemoryDef(5)
>>   store i8 42, i8* %p, !invariant.group !0
>> ; MemoryUse(4)
>>   %4 = load i8, i8* %p, !invariant.group !0
>>   ret void
>> }
>>
>> After removing the first instruction I would expect:
>>
>
> Nope.
> As mentioned, neither removal nor the updater guarantees optimized uses.
> The walker will reoptimize them and reset the defining access as necessary.
>
> Instead, all that will happen in your example, if you remove store 2, is
> that we will produce a conservatively correct answer:
> We will repoint all uses that pointed at 2 at 2's defining access.
>
> This is guaranteed to be a thing that dominates 2 (by the SSA property).
> It's also guaranteed not to be wrong. Whatever uses reached 2 could not
> have aliased with anything in the way.
> So the next thing it could possibly be aliased to is 2's defining access.
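>
> In API terms, that is all removal does. A sketch (StoreToRemove is a
> placeholder for whatever store you are deleting):
>
> MemoryAccess *MA = MSSA.getMemoryAccess(StoreToRemove);
> // Every use of MA gets repointed at MA's defining access: conservatively
> // correct, but not re-optimized.
> MSSA.removeMemoryAccess(MA);
> StoreToRemove->eraseFromParent();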
>
> Thus, we will produce
>
>>
>> define void @hard(i8* %p) {
>> ; 1 = MemoryDef(liveOnEntry)
>>   call void @clobber8(i8* %p)
>>   br i1 undef, label %b1, label %b2
>>
>> b1:                                               ; preds = %0
>> ; 3 = MemoryDef(1)
>>
>
> Note you had reset the MemoryDef ID of this to 2, which will never happen
> :)
>
>
>>   store i8 42, i8* %p, !invariant.group !0
>> ; MemoryUse(1)
>>   %1 = load i8, i8* %p, !invariant.group !0
>>   br label %b3
>>
>> b2:                                               ; preds = %0
>> ; MemoryUse(1)
>>   %2 = load i8, i8* %p, !invariant.group !0
>>   br label %b3
>>
>> b3:                                               ; preds = %b2, %b1
>> ; 4 = MemoryPhi({b1,3},{b2,1})
>> ; MemoryUse(5)
>>   %3 = load i8, i8* %p, !invariant.group !0
>> ; 3 = MemoryDef(4)
>>   store i8 42, i8* %p, !invariant.group !0
>> ; MemoryUse(4)
>>   %4 = load i8, i8* %p, !invariant.group !0
>>   ret void
>> }
>>
>> Which requires %1, %2, %3, and %4 to be updated with different defs.
>>
>
>
> Again, we do not guarantee reoptimized uses at each intermediate point.
> Doing so would be O(N^3) in the worst case.
> Instead, we guarantee that when you call getClobberingMemoryAccess on a
> def or use, it gives you the optimal answer.
> As a side effect, it resets the defining access of the use (mainly so it
> doesn't waste time on the same links next time).
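>
> Concretely, the lazy re-optimization looks like this (LoadI is a
> placeholder for the load you care about):
>
> MemorySSAWalker *Walker = MSSA.getWalker();
> MemoryAccess *UseAccess = MSSA.getMemoryAccess(LoadI);
> // Answers the query and, as a side effect, caches/resets the use's
> // defining access so later queries don't redo the walk.
> MemoryAccess *Clobber = Walker->getClobberingMemoryAccess(UseAccess);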
>
> We used to guarantee the optimized property all the time in the underlying
> IR itself, but as you've discovered, it is a hard invariant to maintain.
>
> This does mean that *uses* of a store may include false aliases that need
> to be disambiguated if you haven't called getClobberingMemoryAccess.
> But that's the price of admission ATM.
> We'd need MemorySSI to solve that well.
>
> Updating it fast otherwise basically requires some interesting linking.
> I.e., what you really want is, for each memory location or even each store,
> the stores to be linked in a way that looks like a dominator tree.
>
> http://link.springer.com/chapter/10.1007/978-3-319-10936-7_13
>
> has a neat way to handle the storage and lookup using quadtrees.
>
> But store optimizations are much less common than hoisting ones in LLVM.
> It's easy enough to teach those passes to remove in the right order.
>
>
>
>