[PATCH] D26127: [MemorySSA] Repair AccessList invariants after insertion of new MemoryUseOrDef.

Wed Nov 2 10:16:13 PDT 2016

On Wed, Nov 2, 2016 at 9:51 AM, Bryant Wong <
3.14472+reviews.llvm.org at gmail.com> wrote:

> >> Here's an example where upwards motion is necessary for the
> transformation
> >> to
> >> occur:
> >>
> >>   %S = type { i8*, i8, i32 }
> >>
> >>   define void @noaliasaddrproducer(%S* %src, %S* noalias %dst,
> >>                                     %S* noalias %dstidptr) {
> >>   ; MemoryUse(liveOnEntry)
> >>     %1 = load %S, %S* %src              ; combine this into a memcpy...
> >>   ; 1 = MemoryDef(liveOnEntry)
> >>     store %S zeroinitializer, %S* %src
> >>     %y = bitcast %S* %dstidptr to i8*
> >>   ; 2 = MemoryDef(1)
> >>     call void @llvm.memset.p0i8.i64(i8* %y, i8 0, i64 54, i32 0, i1
> false)
> >>   ; 3 = MemoryDef(2)
> >>     store %S %1, %S* %dstidptr          ; ...with this.
> >>     ret void
> >>   }
> >>
> >> The store to %src prevents %1 from being moved down to its corresponding
> >> store,
> >
> > so instead, the corresponding store is moved up to %1.
> >
> >
> > I agree this is how it is currently done.
> > I disagree this is the only way to achieve this, and that trying to do it
> > this way necessarily makes sense with MemorySSA.
> >
> > The final result is:
> > define void @noaliasaddrproducer(%S* %src, %S* noalias %dst, %S* noalias
> > %dstidptr) {
> >   %1 = bitcast %S* %src to i8*
> >   %y = bitcast %S* %dstidptr to i8*
> >   %2 = getelementptr i8, i8* %y, i64 16
> >   call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 38, i32 8, i1 false)
> >   call void @llvm.memcpy.p0i8.p0i8.i64(i8* %y, i8* %1, i64 16, i32 8, i1
> > false)
> >   call void @llvm.memset.p0i8.i64(i8* %1, i8 0, i64 16, i32 8, i1 false)
> >   ret void
> > }
> >
> > In order (and this is not even the only way to do this)
> >
> > The store of the zero initializer is processed, you walk the use def
> chain,
> > discover it is all zeroes. It is replaced with a memset. Since you are
> > replacing a single store with a single memset, it works fine.
> >
> > You go to the next store.
> > You walk the use use-def chain from the memorydef at 3, stopping at each
> > piece, and seeing what info it provides, by examining the instruction.
> For
> > each definingAccess, you can examine the uses to see if they provide
> loads
> > that similarly fill in part of the range.
> > You discover that 0, 16 has value %1 coming from a load, and 16, 54 is
> > coming from zero.
> >
> > You create the memcpy and memset to make this happen, and replace the
> load
> > and store.
> >
> > None of this *requires* inserting and then later worrying about things.
> >
> > What am i missing?
>
> The point of that example was that AccessList splicing is needed.

There are certainly times AccessList splicing would be needed to perform an
operation, i'd agree with this :)

> Nothing to do
> with insertions. Sorry for the confusion, and it was a bad example anyway
> because the intermediate memset above the final store is must-alias with
> the
> final store's pointer. Can we try again?
>
>     %S = type { i8*, i8, i32 }
>     declare %S* @compute(%S*)
>
>     define void @f(%S* noalias %a, %S* %b) {
>     ; MemoryUse(liveOnEntry)
>       %1 = load %S, %S* %a
>     ; 1 = MemoryDef(liveOnEntry)
>       store %S zeroinitializer, %S* %a  ; prevents %1 from being moved down
>     ; 2 = MemoryDef(1)
>       %ptr = call %S* @compute(%S* %b)  ; must be moved up with 3
>     ; 3 = MemoryDef(2)
>       store %S %1, %S* %ptr
>       ret void
>     }
>
> (For brevity, I'm going to refer to each store instruction by their
> MemoryDef
> ID.)
>
> Similar to my previous crappy example, 1 blocks the downward motion of
> %1 = load, so 3 has to move up above 1 before %1 + 3 can be combined into a
> memcpy. Also, 2 must move upwards with 3 because 3 depends on it.
>

See, this is again the part i don't get.
You are doing all this to remove these instructions. Why do you need to do
it, update memory ssa, do some more work, and then remove the instructions.
Why can't you simply analyze, determine what you need to do, and produce
the end result all at once?
This requires infinitely less updating work.
Let's assume you don't want to go that route (i will eventually go that
route, FWIW, and replace most of this logic :P)

>
> So 3 must be spliced above 1, and splicing consists of: RAU 3 W 2; RAU 1 W
> 3;
> setDefiningAccess of 3 to 1's defining access. The last part isn't possible
> because sDA isn't public for non-MemoryUses.

Yes.

> Also, if for some reason we needed
> to splice a MemoryDef ahead of its defining MemoryDef, RAUW wouldn't work.
>

Also correct, but that would generate invalid code, of course, as defs
would not dominate uses :)

The next question i have is: "Do you really believe you can perform all
these updates yourself, correctly, no matter what the CFG looks like, etc".
My guess, honestly, is "no"

I can trivially change the CFG above such that you would have to use
IDFCalculator to place memoryphi nodes yourself.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20161102/df740b3d/attachment.html>