[PATCH] D104464: [LoopIdiom] Transform memmove-like loop into memmove (PR46179)

Wed Jun 23 07:59:38 PDT 2021

yurai007 added inline comments.

================
Comment at: llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp:1287
+    // Ensure that LoadBasePtr is after StoreBasePtr or before StoreBasePtr for
+    // negative stride.
+    int64_t LoadOff = 0, StoreOff = 0;
----------------
yurai007 wrote:
> yurai007 wrote:
> > efriedma wrote:
> > > Finally, someone trying to add this transform figured out this check... I think this is the fourth patch trying to add memmove to loopidiom.
> > > 
> > > Do you need to compare the offset to the size of the load/store operations?  I think you might get some funny behavior if the operations overlap.
> > > Finally, someone trying to add this transform figured out this check... I think this is the fourth patch trying to add memmove to loopidiom.
> > Well, hopefully there won't be fifth one :)
> > 
> > > Do you need to compare the offset to the size of the load/store operations? I think you might get some funny behavior if the operations overlap.
> > Yes, it would be nice to prevent overlapping. I'm going to add protection against it and UT reproducer.
> Now load and store can't overlap each other and their sizes must be equal. However I'm not sure how reproducer should look like. Currently isLegalStore reject everything which has abs(Stride) != StoreSize so my understanding is that only one load and one store in loop are allowed. I ended up with following reproducer:
> 
> 
> ```
> define void @do_not_form_memmove_for_overlapped_access(i32* %s, i64 %size) {
> entry:
>   %end.idx = add i64 %size, -1
>   %end.ptr = getelementptr inbounds i32, i32* %s, i64 %end.idx
>   br label %while.body
> 
> while.body:
>   %phi.ptr = phi i32* [ %s, %entry ], [ %next.ptr, %while.body ]
>   %next = bitcast i32* %phi.ptr to i16*
>   %src.ptr = getelementptr i16, i16* %next, i64 1
>   %src.ptr2 = bitcast i16* %src.ptr to i32*
>   ; below misaligned load is overlapped with store.
>   %val = load i32, i32* %src.ptr2, align 4
>   %dst.ptr = getelementptr i32, i32* %phi.ptr, i64 0
>   store i32 %val, i32* %dst.ptr, align 4
>   %next.ptr = getelementptr i32, i32* %phi.ptr, i64 1
>   %cmp = icmp eq i32* %next.ptr, %end.ptr
>   br i1 %cmp, label %exit, label %while.body
> 
> exit:
>   ret void
> }
> ```
> 
> The thing is that I'm not sure whether or not above snippet is correct IR since we do misaligned load.
Ok, I checked documentation and mailing list for better understanding. Changing load just to "%val = load i32, i32* %src.ptr2, align 2" should be fine and give legal IR with underaligned access. Added improved overlapping access UT reproducer.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104464/new/

https://reviews.llvm.org/D104464