[PATCH] D26127: [MemorySSA] Repair AccessList invariants after insertion of new MemoryUseOrDef.

Bryant Wong via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 1 09:41:43 PDT 2016


For some reason, your most recent replies failed to arrive in my inbox, and none
of them are visible on Phab. The below is what I could dig up from the
llvm-commits website. Please let me know if anything's missing.

>> To give this a bit of context, this patch stems from issues that I've
>> encountered while porting MemCpyOpt to MSSA.
>>
>
> Okay. I'm not sure i would try to port instead of just rewrite.  The whole
> goal of MemorySSA is to enable us to write memory optimizations in non-N^2
> ways.

By "rewrite," are you suggesting that I abandon incrementality altogether and
create a new but analogous pass wholesale? This would surely be liberating, but
it seems that incremental patches are easier to review and likelier to be
accepted.

The reason for asking is that there are areas of MCO that rely on MemDep in ways
that aren't easily translated to MSSA. For instance, processStore requires
moving possibly memory-related instructions upwards before transforming a
load-store pair into a memcpy. This is problematic because MSSA currently
doesn't support staying synchronized with instruction motion (as far as I've
read).

Here's an example where upwards motion is necessary for the transformation to
occur:

  %S = type { i8*, i8, i32 }

  define void @noaliasaddrproducer(%S* %src, %S* noalias %dst,
                                    %S* noalias %dstidptr) {
  ; MemoryUse(liveOnEntry)
    %1 = load %S, %S* %src              ; combine this into a memcpy...
  ; 1 = MemoryDef(liveOnEntry)
    store %S zeroinitializer, %S* %src
    %y = bitcast %S* %dstidptr to i8*
  ; 2 = MemoryDef(1)
    call void @llvm.memset.p0i8.i64(i8* %y, i8 0, i64 54, i32 0, i1 false)
  ; 3 = MemoryDef(2)
    store %S %1, %S* %dstidptr          ; ...with this.
    ret void
  }

The store to %src prevents %1 from being moved down to its corresponding store,
so instead, the corresponding store is moved up to %1. But that would require
moving the bitcast and, in particular, the memset.

Currently, I handle this with a `spliceMemoryAccess` entry point that splices
the AccessList around; RAUWs the uses of 2 with 1; sets 2's defining access to
liveOnEntry; and calls resetOptimized on whichever MemoryUses depended on 2.
Would such an API call be acceptable to add to MSSA?

>
> If you just replace one querying with the other querying, this is not
> likely to give you this result.
> It may be faster (or slower).

My hunch is that even slightly slower, unoptimized usage of MSSA would still be
faster than the current situation with MemDep. And a good deal more versatile as
well. For instance, nonlocal patterns that aren't touched now would be trivial
to match with MSSA.

>
> MemCpyOpt currently iterates in no particular order.
> It should be possible to do it all in a single iteration, either bottom up
> or top down.

I don't have an example right this moment, but don't InstCombine, DAGCombine,
and a bunch of other tree-reduction-esque passes (including some that operate on
post-isel, pre-alloc MachineInstrs) iterate until no changes are made?

>
>
>> Many of the transforms in this pass
>> create entirely new memsets, memcpys, and memmoves (among others). After
>> each
>> new insertion, I expect to also have to update MSSA since 1)
>> MemorySSAWrapperPass is declared preserved to the pass manager, and 2)
>> subsequent MCO iterations which depend on MSSA need to be able to see,
>> analyze,
>> and possibly further transform these newly added meminsts.
>>
>
> Can you give an example of a case where, processing in top-down order,
> going from defs to uses, it's not possible to do this in a single iteration?
>
> I understand why, giving a random ordering it does it in now, and no
> use-def/def-use chains, it cannot do so.
> I cannot, off the top of my head, think of a case it would not be possible
> to do in a single iteration.
>
>
>> Here's an example: MemCpyOptPass::tryMergingIntoMemset will convert the
>> sequences of stores to %P into a memset:
>>
>>   define void @foo(i64* nocapture %P, i64* %Q) {
>>   entry:
>>     %0 = bitcast i64* %P to i16*
>>     %arrayidx = getelementptr inbounds i16, i16* %0, i64 1
>>     %1 = bitcast i16* %arrayidx to i32*
>>     %arrayidx1 = getelementptr inbounds i16, i16* %0, i64 3
>>   ; 1 = MemoryDef(liveOnEntry)
>>     store i16 0, i16* %0, align 2
>>   ; 2 = MemoryDef(1)
>>     store i32 0, i32* %1, align 4
>>   ; 3 = MemoryDef(2)
>>     store i16 0, i16* %arrayidx1, align 2
>>   ; 4 = MemoryDef(3)
>>     store i64 0, i64* %Q
>>     ret void
>>   }
>>
>> Specifically, a memset is first inserted right before the store to %Q:
>>
>>   define void @foo(i64* nocapture %P, i64* %Q) {
>>   entry:
>>     %0 = bitcast i64* %P to i16*
>>     %arrayidx = getelementptr inbounds i16, i16* %0, i64 1
>>     %1 = bitcast i16* %arrayidx to i32*
>>     %arrayidx1 = getelementptr inbounds i16, i16* %0, i64 3
>>   ; 1 = MemoryDef(liveOnEntry)
>>     store i16 0, i16* %0, align 2
>>   ; 2 = MemoryDef(1)
>>     store i32 0, i32* %1, align 4
>>   ; 3 = MemoryDef(2)
>>     store i16 0, i16* %arrayidx1, align 2
>>
>>     call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 8, i32 2, i1 false)
>> ; new
>>
>>   ; 4 = MemoryDef(3)
>>     store i64 0, i64* %Q
>>     ret void
>>   }
>>
>> Next, a new MemoryAccess is created by passing to createMemoryAccessBefore
>> the
>> memset instruction, `3 = MemoryDef(2)` (the new access's defining access),
>>
> and
>> `4 = MemoryDef(3)` (the new access's insertion point).
>>
>
> This won't work, of course.
>
> However, this memset is being created to eliminate stores, no?
> What stores are you eliminating?
>
> If it it's the stores at 1, 2, and 3, then for sure, the correct defining
> access is *not* "3 = MemoryDef".
>
> It's going to be "liveOnEntry" after you kill the stores.
>
> Then, RAUW on each of the MemoryDef's works fine, and you removeAccess them.
>
>
>
>
>>   define void @foo(i64* nocapture %P, i64* %Q) {
>>   entry:
>>     %0 = bitcast i64* %P to i16*
>>     %arrayidx = getelementptr inbounds i16, i16* %0, i64 1
>>     %1 = bitcast i16* %arrayidx to i32*
>>     %arrayidx1 = getelementptr inbounds i16, i16* %0, i64 3
>>   ; 1 = MemoryDef(liveOnEntry)
>>     store i16 0, i16* %0, align 2
>>   ; 2 = MemoryDef(1)
>>     store i32 0, i32* %1, align 4
>>   ; 3 = MemoryDef(2)
>>     store i16 0, i16* %arrayidx1, align 2
>>
>>   ; 5 = MemoryDef(3)
>>     call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 8, i32 2, i1 false)
>> ; new
>>
>>   ; 4 = MemoryDef(3)
>>     store i64 0, i64* %Q
>>     ret void
>>   }
>>
>> This is already problematic because there's no way to selectively update
>> the
>> post-5 MemoryDefs that point to 5 instead of 3. Calling RAUW --
>> specifically,
>> replacing all uses of 3 with 5 -- would result in this:
>>
>
> I'm aware the existing APIs will not do what you want, because, as i said,
> they are specifically documented to be meant only for replacement.
>
>
> When i originally had API's to allow for more direct manipulation, folks
> instead said "please build utilities to make common transforms work".
>
> Thus, the existing API's make *replacement* work.

Why can't there be support for both replacement and insertion? How does
supporting replacement preclude supporting insertion? Why can't there be entry
points for both, with the ones for the latter labeled "Danger: Really Expensive.
Will muck up the optimized MemoryUses."?

>
> No amount of futzing is going to make them work well for insertion.
>
> If you want on the fly updating for stores, where you aren't doing
> replacement, that's quite expensive.

What impact would insertions have? For a MemoryUse: insert, then optimize it by
searching upwards. For a MemoryDef: insert, RAUW, and possibly re-optimize the
MemoryUses that pointed at the MemoryDefs above the insertion point. Or leave
the last part out, since they're re-optimized when getClobberingMemoryAccess is
called on them. Am I missing something crucial here?

On Sun, Oct 30, 2016 at 8:03 PM, Bryant Wong
<3.14472+reviews.llvm.org at gmail.com> wrote:
> To give this a bit of context, this patch stems from issues that I've
> encountered while porting MemCpyOpt to MSSA. Many of the transforms in this
> pass
> create entirely new memsets, memcpys, and memmoves (among others). After
> each
> new insertion, I expect to also have to update MSSA since 1)
> MemorySSAWrapperPass is declared preserved to the pass manager, and 2)
> subsequent MCO iterations which depend on MSSA need to be able to see,
> analyze,
> and possibly further transform these newly added meminsts.
>
> Here's an example: MemCpyOptPass::tryMergingIntoMemset will convert the
> sequences of stores to %P into a memset:
>
>   define void @foo(i64* nocapture %P, i64* %Q) {
>   entry:
>     %0 = bitcast i64* %P to i16*
>     %arrayidx = getelementptr inbounds i16, i16* %0, i64 1
>     %1 = bitcast i16* %arrayidx to i32*
>     %arrayidx1 = getelementptr inbounds i16, i16* %0, i64 3
>   ; 1 = MemoryDef(liveOnEntry)
>     store i16 0, i16* %0, align 2
>   ; 2 = MemoryDef(1)
>     store i32 0, i32* %1, align 4
>   ; 3 = MemoryDef(2)
>     store i16 0, i16* %arrayidx1, align 2
>   ; 4 = MemoryDef(3)
>     store i64 0, i64* %Q
>     ret void
>   }
>
> Specifically, a memset is first inserted right before the store to %Q:
>
>   define void @foo(i64* nocapture %P, i64* %Q) {
>   entry:
>     %0 = bitcast i64* %P to i16*
>     %arrayidx = getelementptr inbounds i16, i16* %0, i64 1
>     %1 = bitcast i16* %arrayidx to i32*
>     %arrayidx1 = getelementptr inbounds i16, i16* %0, i64 3
>   ; 1 = MemoryDef(liveOnEntry)
>     store i16 0, i16* %0, align 2
>   ; 2 = MemoryDef(1)
>     store i32 0, i32* %1, align 4
>   ; 3 = MemoryDef(2)
>     store i16 0, i16* %arrayidx1, align 2
>
>     call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 8, i32 2, i1 false)  ;
> new
>
>   ; 4 = MemoryDef(3)
>     store i64 0, i64* %Q
>     ret void
>   }
>
> Next, a new MemoryAccess is created by passing to createMemoryAccessBefore
> the
> memset instruction, `3 = MemoryDef(2)` (the new access's defining access),
> and
> `4 = MemoryDef(3)` (the new access's insertion point).
>
>   define void @foo(i64* nocapture %P, i64* %Q) {
>   entry:
>     %0 = bitcast i64* %P to i16*
>     %arrayidx = getelementptr inbounds i16, i16* %0, i64 1
>     %1 = bitcast i16* %arrayidx to i32*
>     %arrayidx1 = getelementptr inbounds i16, i16* %0, i64 3
>   ; 1 = MemoryDef(liveOnEntry)
>     store i16 0, i16* %0, align 2
>   ; 2 = MemoryDef(1)
>     store i32 0, i32* %1, align 4
>   ; 3 = MemoryDef(2)
>     store i16 0, i16* %arrayidx1, align 2
>
>   ; 5 = MemoryDef(3)
>     call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 8, i32 2, i1 false)  ;
> new
>
>   ; 4 = MemoryDef(3)
>     store i64 0, i64* %Q
>     ret void
>   }
>
> This is already problematic because there's no way to selectively update the
> post-5 MemoryDefs that point to 5 instead of 3. Calling RAUW --
> specifically,
> replacing all uses of 3 with 5 -- would result in this:
>
>   define void @foo(i64* nocapture %P, i64* %Q) {
>   entry:
>     %0 = bitcast i64* %P to i16*
>     %arrayidx = getelementptr inbounds i16, i16* %0, i64 1
>     %1 = bitcast i16* %arrayidx to i32*
>     %arrayidx1 = getelementptr inbounds i16, i16* %0, i64 3
>   ; 1 = MemoryDef(liveOnEntry)
>     store i16 0, i16* %0, align 2
>   ; 2 = MemoryDef(1)
>     store i32 0, i32* %1, align 4
>   ; 3 = MemoryDef(2)
>     store i16 0, i16* %arrayidx1, align 2
>
>   ; 5 = MemoryDef(5)        ; <====== bad
>     call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 8, i32 2, i1 false)  ;
> new
>
>   ; 4 = MemoryDef(5)
>     store i64 0, i64* %Q
>     ret void
>   }
>
> and setDefiningAccess is only public for MemoryUses.
>
> The problem is that RAU of 3 W 5 needs to happen before 5 is assigned a
> defining
> access to 3. But createMemoryAccessBefore/After, the only entry points for
> position-specific creation of MemoryAccesses, both call createDefinedAccess
> which
> creates the new access and assigns its definition without ever giving a
> chance
> for RAUW to happen in between.
>
> So that's the motivating basis for this patch.
>
>
>
> On Sun, Oct 30, 2016 at 7:08 PM, Bryant Wong
> <3.14472+reviews.llvm.org at gmail.com> wrote:
>>
>>
>> On Sun, Oct 30, 2016 at 6:48 PM, Daniel Berlin <dberlin at dberlin.org>
>> wrote:
>>>
>>> In particular:
>>> "
>>> I'm not so sure that it's sufficient. Suppose, for instance, that I
>>> wanted to insert a MemoryDef between 1 and 2 in the below AccessList:
>>>
>>>
>>>   0 = MemoryDef(liveOnEntry)
>>>   1 = MemoryDef(0)
>>>   2 = MemoryDef(1)
>>>   3 = MemoryUse(1)
>>>
>>> Invoking createMemoryAccess followed by RAUW of 1's uses with 4 would
>>> result in:
>>> ....
>>> "
>>>
>>> Which is why this is not the way one does this, anywhere.
>>> If you wish to *replace* an existing access, the normal way is:
>>>
>>
>> As my original post indicates, I wish to insert an access, not replace
>> anything. I've inserted a new memory instruction (memmove, memcpy, or
>> memset), into the IR and wish to keep the MSSA tree synchronized.
>>
>>>
>>>            MemoryAccess *Def = OldMemAcc->getDefiningAccess();
>>>            NewMemAcc =
>>>                MSSA->createMemoryAccessAfter(Repl, Def,
>>> OldMemAccess->getMemoryInst());
>>>            OldMemAcc->replaceAllUsesWith(NewMemAcc);
>>>
>>> It doesn't make any sense to create a replacing access with the thing you
>>> are going to replace as the definition.
>>>
>> Really, I don't want to replace anything. I'm interested in insertions. We
>> have removals and replacements. So might insertion also be a valid concept
>> with MSSA?
>>
>>>
>>> The same thing, btw, would happen in normal LLVM SSA IR if you used the
>>> thing you are replacing in the definition of the replacement.
>>>
>>> %c = add %a, %b
>>> %e = add %c, %d
>>>
>>> If you wish to replace the first, you can't do this:
>>>
>>> %c = add %a, %b
>>> %tmp = add %a, %c
>>> %e = add %c, %d
>>>
>>> because calling %c->replaceAllUsesWith %tmp
>>> will result in:
>>>
>>> %tmp = add %a, %tmp
>>> %e = add %c, %d
>>>
>>>
>>> IE the behavior you are citing is completely consistent with existing
>>> LLVM IR. If you define a replacement partially in terms of the thing it is
>>> replacing, it breaks.
>>
>>
>> Not replacing, please believe me.
>>
>>>
>>>
>>>
>>>
>>>
>>> On Sun, Oct 30, 2016 at 1:56 PM, Daniel Berlin <dberlin at dberlin.org>
>>> wrote:
>>>>
>>>> For starters, createDefinedAccess is private, so nobody can use it.
>>>> Second, the API calls that use it explicitly take a defining access to
>>>> avoid the problem you cite.
>>>>
>>>> Can you produce a case where normal usage of the update API actually
>>>> does the wrong thing?
>>>>
>>>> I'm entirely unsure what actual usage problem you are trying to solve,
>>>> so I can't help explain how to do it.
>>>>
>>>>
>>>> On Sun, Oct 30, 2016, 1:17 PM Bryant Wong <314472 at gmail.com> wrote:
>>>>>
>>>>> Thanks for clearing up my misunderstanding regarding MemoryUses.
>>>>>
>>>>> On Sun, Oct 30, 2016 at 2:04 PM, Daniel Berlin <dberlin at dberlin.org>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Oct 30, 2016 at 3:03 AM, bryant
>>>>>> <3.14472+reviews.llvm.org at gmail.com> wrote:
>>>>>>>
>>>>>>> bryant created this revision.
>>>>>>> bryant added reviewers: dberlin, george.burgess.iv.
>>>>>>> bryant added a subscriber: llvm-commits.
>>>>>>> bryant set the repository for this revision to rL LLVM.
>>>>>>>
>>>>>>> An invariant of AccessLists is that the defining access of a Use or
>>>>>>> Def is the nearest preceding Def node.
>>>>>>
>>>>>> False
>>>>>> This is correct for def's but not for uses.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> For instance, within a BasicBlock:
>>>>>>>
>>>>>>>   0 = MemoryDef(liveOnEntry)
>>>>>>>   1 = MemoryDef(0)
>>>>>>>   2 = MemoryUse(n)
>>>>>>>   3 = MemoryDef(m)
>>>>>>>
>>>>>>> 1 is the nearest parent Def of 2 and 3, and m and n must be 1.
>>>>>>
>>>>>>
>>>>>> Given the above, this is incorrect.
>>>>>> n may be 0
>>>>>>
>>>>>> IE the following is perfectly legal
>>>>>>   0 = MemoryDef(liveOnEntry)
>>>>>>   1 = MemoryDef(0)
>>>>>>   2 = MemoryUse(0)
>>>>>>   3 = MemoryDef(1)
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> This patch simplifies the interfaces of
>>>>>>> createMemoryAccessBefore/After, and augments them to maintain this
>>>>>>> invariant.
>>>>>>>
>>>>>>> Additionally, when inserting a new Def after an existing Def, there
>>>>>>> is currently no (clean) way to update the users of the old Def to use the
>>>>>>> new Def.
>>>>>>
>>>>>>
>>>>>> RAUW works exactly to do this case.
>>>>>>
>>>>>
>>>>>
>>>>> I'm not so sure that it's sufficient. Suppose, for instance, that I
>>>>> wanted to insert a MemoryDef between 1 and 2 in the below AccessList:
>>>>>
>>>>>
>>>>>   0 = MemoryDef(liveOnEntry)
>>>>>   1 = MemoryDef(0)
>>>>>   2 = MemoryDef(1)
>>>>>   3 = MemoryUse(1)
>>>>>
>>>>> Invoking createMemoryAccess followed by RAUW of 1's uses with 4 would
>>>>> result in:
>>>>>
>>>>>
>>>>> So RAUW needs to happen before setting 4's defining access to 1, but
>>>>> this isn't possible with the current createDefiningAccess. What other
>>>>> options are there? RAUW with a bogus Def, create, then re-RAUW to the newly
>>>>> created Def? I feel like I'm missing something obvious, so clarification
>>>>> would be much appreciated.
>>>>>
>>>>>>>
>>>>>>> So createDefiningAccess now permits the option of updating the users.
>>>>>>
>>>>>>
>>>>>> Please, no.
>>>>>>
>>>
>>
>

