[PATCH] D26127: [MemorySSA] Repair AccessList invariants after insertion of new MemoryUseOrDef.

Wed Nov 2 09:51:39 PDT 2016

>> Here's an example where upwards motion is necessary for the transformation
>> to occur:
>>
>>   %S = type { i8*, i8, i32 }
>>
>>   define void @noaliasaddrproducer(%S* %src, %S* noalias %dst,
>>                                     %S* noalias %dstidptr) {
>>   ; MemoryUse(liveOnEntry)
>>     %1 = load %S, %S* %src              ; combine this into a memcpy...
>>   ; 1 = MemoryDef(liveOnEntry)
>>     store %S zeroinitializer, %S* %src
>>     %y = bitcast %S* %dstidptr to i8*
>>   ; 2 = MemoryDef(1)
>>     call void @llvm.memset.p0i8.i64(i8* %y, i8 0, i64 54, i32 0, i1 false)
>>   ; 3 = MemoryDef(2)
>>     store %S %1, %S* %dstidptr          ; ...with this.
>>     ret void
>>   }
>>
>> The store to %src prevents %1 from being moved down to its corresponding
>> store, so instead, the corresponding store is moved up to %1.
>
>
> I agree this is how it is currently done.
> I disagree this is the only way to achieve this, and that trying to do it
> this way necessarily makes sense with MemorySSA.
>
> The final result is:
> define void @noaliasaddrproducer(%S* %src, %S* noalias %dst,
>                                  %S* noalias %dstidptr) {
>   %1 = bitcast %S* %src to i8*
>   %y = bitcast %S* %dstidptr to i8*
>   %2 = getelementptr i8, i8* %y, i64 16
>   call void @llvm.memset.p0i8.i64(i8* %2, i8 0, i64 38, i32 8, i1 false)
>   call void @llvm.memcpy.p0i8.p0i8.i64(i8* %y, i8* %1, i64 16, i32 8,
>                                        i1 false)
>   call void @llvm.memset.p0i8.i64(i8* %1, i8 0, i64 16, i32 8, i1 false)
>   ret void
> }
>
> In order (and this is not even the only way to do this):
>
> The store of the zero initializer is processed: you walk the use-def chain
> and discover it is all zeroes. It is replaced with a memset. Since you are
> replacing a single store with a single memset, it works fine.
>
> You go to the next store.
> You walk the use-def chain from the MemoryDef at 3, stopping at each
> piece, and seeing what info it provides, by examining the instruction. For
> each definingAccess, you can examine the uses to see if they provide loads
> that similarly fill in part of the range.
> You discover that bytes [0, 16) have the value %1, coming from a load, and
> bytes [16, 54) are coming from zero.
>
> You create the memcpy and memset to make this happen, and replace the load
> and store.
>
> None of this *requires* inserting and then later worrying about things.
>
> What am I missing?

The point of that example was that AccessList splicing is needed; it has
nothing to do with insertions. Sorry for the confusion. It was a bad example
anyway, because the intermediate memset above the final store must-aliases the
final store's pointer. Can we try again?

    %S = type { i8*, i8, i32 }
    declare %S* @compute(%S*)

    define void @f(%S* noalias %a, %S* %b) {
    ; MemoryUse(liveOnEntry)
      %1 = load %S, %S* %a
    ; 1 = MemoryDef(liveOnEntry)
      store %S zeroinitializer, %S* %a  ; prevents %1 from being moved down
    ; 2 = MemoryDef(1)
      %ptr = call %S* @compute(%S* %b)  ; must be moved up with 3
    ; 3 = MemoryDef(2)
      store %S %1, %S* %ptr
      ret void
    }

(For brevity, I'm going to refer to each memory-writing instruction by its
MemoryDef ID.)

As in my previous crappy example, 1 blocks the downward motion of %1 = load,
so 3 has to move up above 1 before %1 and 3 can be combined into a memcpy.
Also, 2 must move upwards with 3 because 3 depends on it: 3 stores through
%ptr, which 2 produces.
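
The "2 must move with 3" argument can be sketched as a toy model. This is
Python rather than LLVM code, and everything in it is hypothetical: the names
`store1` and `store3` stand for MemoryDefs 1 and 3, and the helper `hoist_set`
is made up for illustration. It only follows SSA operand dependences; it does
not check whether the motion is legal with respect to memory.

```python
# Toy instruction list for @f: (name, operands), in program order.
# "store1" and "store3" are hypothetical labels for MemoryDefs 1 and 3.
instrs = [
    ("%1", ["%a"]),               # %1 = load %S, %S* %a
    ("store1", ["%a"]),           # store zeroinitializer -> %a   (MemoryDef 1)
    ("%ptr", ["%b"]),             # %ptr = call @compute(%b)      (MemoryDef 2)
    ("store3", ["%1", "%ptr"]),   # store %1 -> %ptr              (MemoryDef 3)
]


def hoist_set(instrs, moved, barrier):
    """Return, in program order, what must move above `barrier` with `moved`."""
    names = [name for name, _ in instrs]
    operands = dict(instrs)
    lo, hi = names.index(barrier), names.index(moved)
    # Only instructions strictly between the barrier and the moved
    # instruction can be "left behind" by the motion.
    between = set(names[lo + 1:hi])

    needed, worklist = [], [moved]
    while worklist:
        current = worklist.pop()
        for op in operands[current]:
            if op in between and op not in needed:
                needed.append(op)
                worklist.append(op)  # follow dependences transitively
    return sorted(needed + [moved], key=names.index)


print(hoist_set(instrs, "store3", "store1"))
```

The call reports that `%ptr` (MemoryDef 2) has to be hoisted together with
`store3`, while `%1` does not, since it already sits above the barrier.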

So 3 must be spliced above 1, and splicing consists of: RAUW 3 with 2 (replace
all uses of 3 with 2); RAUW 1 with 3; and setDefiningAccess of 3 to 1's
defining access. The last part isn't possible because setDefiningAccess isn't
public for non-MemoryUses. Also, if for some reason we needed to splice a
MemoryDef ahead of its defining MemoryDef, RAUW wouldn't work.
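
To make the bookkeeping concrete, here is a minimal sketch of splicing on a
toy def chain. It is Python, not the MemorySSA API: the `Def` class and the
`replace_all_uses_with`, `move_before`, and `chain` helpers are all
hypothetical, MemoryUses and MemoryPhis are left out, and the step order is
one way of expressing a splice in this toy rather than a statement of what
MemorySSA does or should do.

```python
class Def:
    """Toy stand-in for a MemoryDef: an ID plus the access that defines it."""

    def __init__(self, name, defining):
        self.name = name
        self.defining = defining  # None models liveOnEntry


def replace_all_uses_with(accesses, old, new):
    """Point every access defined by `old` at `new` instead (toy RAUW)."""
    for access in accesses:
        if access.defining is old:
            access.defining = new


def move_before(accesses, moved, target):
    """Splice `moved` immediately ahead of `target` in a linear def chain."""
    # 1. Unlink: users of `moved` fall back to what `moved` was defined by.
    replace_all_uses_with(accesses, moved, moved.defining)
    # 2. `moved` takes over the target's old defining access.
    moved.defining = target.defining
    # 3. The target is now defined by `moved`.
    target.defining = moved
    # Keep the list (the toy "AccessList") in program order.
    accesses.remove(moved)
    accesses.insert(accesses.index(target), moved)


def chain(accesses):
    """Return (access, defining access) pairs in list order."""
    return [(a.name, a.defining.name if a.defining else "liveOnEntry")
            for a in accesses]


# liveOnEntry <- 1 <- 2 <- 3, as in @f above.
d1 = Def("1", None)
d2 = Def("2", d1)
d3 = Def("3", d2)
block = [d1, d2, d3]

move_before(block, d2, d1)  # 2 has to go up first, since 3 depends on it
move_before(block, d3, d1)  # then 3 is spliced above 1
print(chain(block))
```

Steps 2 and 3 assign `defining` directly. That direct assignment is the part
the toy gets for free and the part that, per the above, is not publicly
available for MemoryDefs.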