[PATCH] D26127: [MemorySSA] Repair AccessList invariants after insertion of new MemoryUseOrDef.

Daniel Berlin via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 1 11:40:00 PDT 2016


On Tue, Nov 1, 2016 at 9:41 AM, Bryant Wong
<3.14472+reviews.llvm.org at gmail.com> wrote:

> For some reason, your most recent replies failed to arrive in my inbox, and
> none of them are visible on Phab. The below was what I could dig up from the
> llvm-commits website. Please let me know if anything's missing.
>
> >> To give this a bit of context, this patch stems from issues that I've
> >> encountered while porting MemCpyOpt to MSSA.
> >>
> >
> > Okay. I'm not sure i would try to port instead of just rewrite. The whole
> > goal of MemorySSA is to enable us to write memory optimizations in
> > non-N^2 ways.
>
> By "rewrite," are you suggesting that I abandon incrementality altogether
> and
> create a new but analogous pass wholesale?


I'm actually pretty positive you can reuse most of it, but reorder how it
works quite a bit.
I have the beginnings of one :)


> This would surely be liberating, but
> it seems that incremental patches are easier to review and likelier to be
> accepted.
>

By who? :)
The goal of MemorySSA is, in fact, to liberate us to rewrite some of the N^2
passes into O(N) passes.
Some of that may be a fairly simple port; some of it may require seriously
rethinking the passes.
If we can do it incrementally, great. If we can't, we shouldn't avoid it.


>
> The reason for asking is that there are areas of MCO that rely on MemDep in
> ways that aren't easily translated to MSSA.



> For instance, processStore requires moving possibly-memory-related
> instructions upwards before transforming a load-store pair into a memcpy.
> This is problematic because MSSA currently doesn't support staying
> synchronized with instruction motion (as far as I've read).

It does, for *moving* instructions and *replacing* them, but not for random
insertion.
What you have just stated is perfectly supported.

You also have to differentiate between loads and stores.

Load movement, anywhere, and load insertion are 100% trivial.
It's *store* insertion, and store movement, that are harder.
Store movement and replacement are doable (GVNHoist already does it).
But as i said, random store insertion is expensive, and in this case, not
actually necessary.
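
Concretely, the supported *replacement* case comes down to a few calls. This
is a minimal sketch against the 2016-era API discussed in this thread
(createMemoryAccessBefore, RAUW, removeMemoryAccess); the helper name is made
up and the exact signatures may differ:

  // Replace one store-like instruction with another at the same position and
  // swap the old MemoryDef for a new one.  (NewStore is assumed to already
  // have been placed in the IR where OldStore was.)
  #include "llvm/Transforms/Utils/MemorySSA.h"  // 2016-era header location
  using namespace llvm;

  static void replaceStoreKeepingMSSA(MemorySSA &MSSA, Instruction *OldStore,
                                      Instruction *NewStore) {
    auto *OldAccess = cast<MemoryUseOrDef>(MSSA.getMemoryAccess(OldStore));

    // The new store clobbers what the old one did, so it can take over the
    // old access's defining access and position in the AccessList.
    MemoryUseOrDef *NewAccess = MSSA.createMemoryAccessBefore(
        NewStore, OldAccess->getDefiningAccess(), OldAccess);

    // Anything defined by the old MemoryDef is now defined by the new one.
    OldAccess->replaceAllUsesWith(NewAccess);
    MSSA.removeMemoryAccess(OldAccess);
  }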



>
>
> Here's an example where upwards motion is necessary for the transformation
> to occur:
>
>   %S = type { i8*, i8, i32 }
>
>   define void @noaliasaddrproducer(%S* %src, %S* noalias %dst,
>                                     %S* noalias %dstidptr) {
>   ; MemoryUse(liveOnEntry)
>     %1 = load %S, %S* %src              ; combine this into a memcpy...
>   ; 1 = MemoryDef(liveOnEntry)
>     store %S zeroinitializer, %S* %src
>     %y = bitcast %S* %dstidptr to i8*
>   ; 2 = MemoryDef(1)
>     call void @llvm.memset.p0i8.i64(i8* %y, i8 0, i64 54, i32 0, i1 false)
>   ; 3 = MemoryDef(2)
>     store %S %1, %S* %dstidptr          ; ...with this.
>     ret void
>   }
>
> The store to %src prevents %1 from being moved down to its corresponding
> store, so instead the bitcast and, in particular, the memset have to be
> moved upwards.




>
> Currently, I handle this with a `spliceMemoryAccess` entry point that
> splices the AccessList around; RAUWs 2 with 1; sets 2's defining access to
> liveOnEntry; and calls resetOptimized on whichever MemoryUses depended on
> 2. Would such an API call be acceptable to add to MSSA?
>
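
For reference, here is roughly what that entry point would do for the example
above, written as if a matching declaration were added to the MemorySSA class
(so the protected setters are reachable). The name and signature come from the
proposal above; none of this is an existing API, and the details are a sketch
only:

  // Hoist the memset (2 = MemoryDef(1)) above the store to %src
  // (1 = MemoryDef(liveOnEntry)) and repair MemorySSA as described above.
  void MemorySSA::spliceMemoryAccess(MemoryDef *Two /* hoisted memset */,
                                     MemoryDef *One /* store to %src */) {
    // 1. Splice Two's node in the AccessList to just before One (elided).

    // 2. Remember which MemoryUses were optimized to Two; their cached
    //    clobber is about to go stale.
    SmallVector<MemoryUse *, 8> StaleUses;
    for (User *U : Two->users())
      if (auto *MU = dyn_cast<MemoryUse>(U))
        StaleUses.push_back(MU);

    // 3. RAUW of 2 with 1: everything that was defined by 2 is now defined
    //    by 1.
    Two->replaceAllUsesWith(One);

    // 4. After the hoist, the memset no longer sees the store to %src.
    Two->setDefiningAccess(getLiveOnEntryDef());

    // 5. Force the stale uses to be re-optimized on the next query.
    for (MemoryUse *MU : StaleUses)
      MU->resetOptimized();
  }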




>
> >
> > If you just replace one form of querying with the other, this is not
> > likely to give you this result.
> > It may be faster (or slower).
>
> My hunch is that even slightly slower, unoptimized usage of MSSA would still
> be faster than the current situation with MemDep. And a good deal more
> versatile as well. For instance, nonlocal patterns that aren't touched now
> would be trivial to match with MSSA.
>
>

> >
> > MemCpyOpt currently iterates in no particular order.
> > It should be possible to do it all in a single iteration, either bottom
> up
> > or top down.
>
> I don't have an example right this moment, but don't InstCombine,
> DAGCombine, and a bunch of other tree-reduction-esque passes that operate
> on post-isel, pre-alloc MachineInstrs iterate until no changes are made?
>

Yes, but i don't think anyone would claim this is great.


>
> >
> >
> >> Many of the transforms in this pass create entirely new memsets, memcpys,
> >> and memmoves (among others). After each new insertion, I expect to also
> >> have to update MSSA since 1) MemorySSAWrapperPass is declared preserved
> >> to the pass manager, and 2) subsequent MCO iterations which depend on
> >> MSSA need to be able to see, analyze, and possibly further transform
> >> these newly added meminsts.
> >>
> >
> > Can you give an example of a case where, processing in top-down order,
> > going from defs to uses, it's not possible to do this in a single
> > iteration?
> >
> > I understand why, given the random ordering it uses now, and no
> > use-def/def-use chains, it cannot do so.
> > I cannot, off the top of my head, think of a case it would not be
> > possible to do in a single iteration.
> >
> >
> >> Here's an example: MemCpyOptPass::tryMergingIntoMemset will convert the
> >> sequence of stores to %P into a memset:
> >>
> >>   define void @foo(i64* nocapture %P, i64* %Q) {
> >>   entry:
> >>     %0 = bitcast i64* %P to i16*
> >>     %arrayidx = getelementptr inbounds i16, i16* %0, i64 1
> >>     %1 = bitcast i16* %arrayidx to i32*
> >>     %arrayidx1 = getelementptr inbounds i16, i16* %0, i64 3
> >>   ; 1 = MemoryDef(liveOnEntry)
> >>     store i16 0, i16* %0, align 2
> >>   ; 2 = MemoryDef(1)
> >>     store i32 0, i32* %1, align 4
> >>   ; 3 = MemoryDef(2)
> >>     store i16 0, i16* %arrayidx1, align 2
> >>   ; 4 = MemoryDef(3)
> >>     store i64 0, i64* %Q
> >>     ret void
> >>   }
> >>
> >> Specifically, a memset is first inserted right before the store to %Q:
> >>
> >>   define void @foo(i64* nocapture %P, i64* %Q) {
> >>   entry:
> >>     %0 = bitcast i64* %P to i16*
> >>     %arrayidx = getelementptr inbounds i16, i16* %0, i64 1
> >>     %1 = bitcast i16* %arrayidx to i32*
> >>     %arrayidx1 = getelementptr inbounds i16, i16* %0, i64 3
> >>   ; 1 = MemoryDef(liveOnEntry)
> >>     store i16 0, i16* %0, align 2
> >>   ; 2 = MemoryDef(1)
> >>     store i32 0, i32* %1, align 4
> >>   ; 3 = MemoryDef(2)
> >>     store i16 0, i16* %arrayidx1, align 2
> >>
> >>     call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 8, i32 2, i1 false) ; new
> >>
> >>   ; 4 = MemoryDef(3)
> >>     store i64 0, i64* %Q
> >>     ret void
> >>   }
> >>
> >> Next, a new MemoryAccess is created by passing to createMemoryAccessBefore
> >> the memset instruction, `3 = MemoryDef(2)` (the new access's defining
> >> access), and `4 = MemoryDef(3)` (the new access's insertion point).
> >>
> >
> > This won't work, of course.
> >
> > However, this memset is being created to eliminate stores, no?
> > What stores are you eliminating?
> >
> > If it's the stores at 1, 2, and 3, then for sure, the correct defining
> > access is *not* "3 = MemoryDef".
> >
> > It's going to be "liveOnEntry" after you kill the stores.
> >
> > Then, RAUW on each of the MemoryDefs works fine, and you removeAccess
> > them.
> >
> >
> >
> >
> >>   define void @foo(i64* nocapture %P, i64* %Q) {
> >>   entry:
> >>     %0 = bitcast i64* %P to i16*
> >>     %arrayidx = getelementptr inbounds i16, i16* %0, i64 1
> >>     %1 = bitcast i16* %arrayidx to i32*
> >>     %arrayidx1 = getelementptr inbounds i16, i16* %0, i64 3
> >>   ; 1 = MemoryDef(liveOnEntry)
> >>     store i16 0, i16* %0, align 2
> >>   ; 2 = MemoryDef(1)
> >>     store i32 0, i32* %1, align 4
> >>   ; 3 = MemoryDef(2)
> >>     store i16 0, i16* %arrayidx1, align 2
> >>
> >>   ; 5 = MemoryDef(3)
> >>     call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 8, i32 2, i1 false) ; new
> >>
> >>   ; 4 = MemoryDef(3)
> >>     store i64 0, i64* %Q
> >>     ret void
> >>   }
> >>
> >> This is already problematic because there's no way to selectively update
> >> the post-5 MemoryDefs to point to 5 instead of 3. Calling RAUW --
> >> specifically, replacing all uses of 3 with 5 -- would result in this:
> >>
> >
> > I'm aware the existing APIs will not do what you want, because, as i
> > said, they are specifically designed, and documented, to only be meant
> > for replacement.
> >
> >
> > When i originally had APIs to allow for more direct manipulation, folks
> > instead said "please build utilities to make common transforms work".
> >
> > Thus, the existing APIs make *replacement* work.
>
> Why can't there be support for both replacement and insertion?


Because random insertion can cause completely far-flung effects that need
updating arbitrarily far downstream.
But most legal movement and replacement cannot.

> How does supporting replacement preclude supporting insertion? Why can't
> there be entry points for both, with the ones for the latter labeled
> "Danger: Really Expensive. Will muck up the optimized MemoryUses."?
>

I am, of course, willing to do that, i'm just trying to understand *is it
actually necessary*.
So far, even for memcpyopt, i haven't found it to be necessary :)
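
For the tryMergingIntoMemset example quoted above, the replacement-only update
sketched earlier in the thread comes out to roughly the following. The helper
name is made up and the exact MemorySSA signatures may differ; treat it as a
sketch, not the pass's actual code:

  // The memset covers stores 1-3, so its MemoryDef is created with the
  // defining access the program will have once those stores are gone
  // (liveOnEntry in @foo), and the dead defs are RAUW'd and removed
  // bottom-up.
  static void mergeStoresIntoMemsetMSSA(MemorySSA &MSSA, Instruction *Memset,
                                        ArrayRef<Instruction *> DeadStores,
                                        Instruction *StoreToQ /* 4 above */) {
    auto *InsertPt = cast<MemoryUseOrDef>(MSSA.getMemoryAccess(StoreToQ));

    // Whatever the first dead store was clobbering is what the memset will
    // clobber once they are gone: liveOnEntry in the example.
    MemoryAccess *DefiningAccess =
        cast<MemoryUseOrDef>(MSSA.getMemoryAccess(DeadStores.front()))
            ->getDefiningAccess();

    MemoryUseOrDef *NewDef =
        MSSA.createMemoryAccessBefore(Memset, DefiningAccess, InsertPt);

    // Kill the covered stores bottom-up; anything that was defined by one of
    // them is now defined by the memset.
    for (Instruction *SI : llvm::reverse(DeadStores)) {
      MemoryAccess *MA = MSSA.getMemoryAccess(SI);
      MA->replaceAllUsesWith(NewDef);
      MSSA.removeMemoryAccess(MA);
      SI->eraseFromParent();
    }
  }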



>
> >
> > No amount of futzing is going to make them work well for insertion.
> >
> > If you want on-the-fly updating for stores, where you aren't doing
> > replacement, that's quite expensive.
>
> What impact would insertions have? For a MemoryUse: Insert and optimize
> by searching upwards.


Random MemoryUse insertion is always easy, of course, because MemoryUses do
not create defs.
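
For example, hooking up a MemoryUse for a brand-new load placed just before an
existing MemoryDef is a couple of calls; a sketch against the same API as
above (exact signatures may differ):

  // The use creates no def, so nothing downstream needs fixing; the walker
  // can still optimize the use lazily on a later query.
  static void addMemoryUseForNewLoad(MemorySSA &MSSA, LoadInst *NewLoad,
                                     MemoryDef *InsertPt) {
    // The nearest def above the new load is whatever the def we are
    // inserting in front of was itself defined by.
    MemoryAccess *DefiningAccess = InsertPt->getDefiningAccess();
    MSSA.createMemoryAccessBefore(NewLoad, DefiningAccess, InsertPt);
  }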


> For a MemoryDef: Insert, RAUW, and possibly regenerate the
> loads that pointed to the MemoryDefs that sit above us.


This will not work:

  store bar
  if (foo) {
  } else {
  }
  use bar
If you insert a new store into either arm of this branch, you need to create
a MemoryPhi node, merge it in, etc.

This is a simple example; it gets arbitrarily hard.
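
In MemorySSA terms, inserting that store into one arm of the diamond means a
MemoryPhi has to be created and wired up at the merge point; the annotations
below are illustrative:

  ; 1 = MemoryDef(liveOnEntry)
    store i8 0, i8* %bar
    br i1 %foo, label %then, label %else

  then:
  ; 2 = MemoryDef(1)                     ; the newly inserted store
    store i8 1, i8* %bar
    br label %merge

  else:
    br label %merge

  merge:
  ; 3 = MemoryPhi({then,2},{else,1})     ; must be created and merged in
  ; MemoryUse(3)
    %v = load i8, i8* %bar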

Now, if you *replace* the store to bar with, say, a larger store to bar,
that's easy.

But if you insert a new store, and then 50 years later, replace something
else, that's going to be expensive to keep up to date.