[PATCH] D104464: [LoopIdiom] Transform memmove-like loop into memmove (PR46179)

Wed Jun 23 06:52:51 PDT 2021

yurai007 added inline comments.

================
Comment at: llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp:1287
+    // Ensure that LoadBasePtr is after StoreBasePtr or before StoreBasePtr for
+    // negative stride.
+    int64_t LoadOff = 0, StoreOff = 0;
----------------
yurai007 wrote:
> efriedma wrote:
> > Finally, someone trying to add this transform figured out this check... I think this is the fourth patch trying to add memmove to loopidiom.
> > 
> > Do you need to compare the offset to the size of the load/store operations?  I think you might get some funny behavior if the operations overlap.
> > Finally, someone trying to add this transform figured out this check... I think this is the fourth patch trying to add memmove to loopidiom.
> Well, hopefully there won't be fifth one :)
> 
> > Do you need to compare the offset to the size of the load/store operations? I think you might get some funny behavior if the operations overlap.
> Yes, it would be nice to prevent overlapping. I'm going to add protection against it and UT reproducer.
Now load and store can't overlap each other and their sizes must be equal. However I'm not sure how reproducer should look like. Currently isLegalStore reject everything which has abs(Stride) != StoreSize so my understanding is that only one load and one store in loop are allowed. I ended up with following reproducer:

```
define void @do_not_form_memmove_for_overlapped_access(i32* %s, i64 %size) {
entry:
  %end.idx = add i64 %size, -1
  %end.ptr = getelementptr inbounds i32, i32* %s, i64 %end.idx
  br label %while.body

while.body:
  %phi.ptr = phi i32* [ %s, %entry ], [ %next.ptr, %while.body ]
  %next = bitcast i32* %phi.ptr to i16*
  %src.ptr = getelementptr i16, i16* %next, i64 1
  %src.ptr2 = bitcast i16* %src.ptr to i32*
  ; below misaligned load is overlapped with store.
  %val = load i32, i32* %src.ptr2, align 4
  %dst.ptr = getelementptr i32, i32* %phi.ptr, i64 0
  store i32 %val, i32* %dst.ptr, align 4
  %next.ptr = getelementptr i32, i32* %phi.ptr, i64 1
  %cmp = icmp eq i32* %next.ptr, %end.ptr
  br i1 %cmp, label %exit, label %while.body

exit:
  ret void
}
```

The thing is that I'm not sure whether or not above snippet is correct IR since we do misaligned load.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104464/new/

https://reviews.llvm.org/D104464