[PATCH] D104464: [LoopIdiom] Transform memmove-like loop into memmove (PR46179)
Dawid Jurczak via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jun 23 07:59:38 PDT 2021
yurai007 added inline comments.
================
Comment at: llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp:1287
+ // Ensure that LoadBasePtr is after StoreBasePtr or before StoreBasePtr for
+ // negative stride.
+ int64_t LoadOff = 0, StoreOff = 0;
----------------
yurai007 wrote:
> yurai007 wrote:
> > efriedma wrote:
> > > Finally, someone trying to add this transform figured out this check... I think this is the fourth patch trying to add memmove to loopidiom.
> > >
> > > Do you need to compare the offset to the size of the load/store operations? I think you might get some funny behavior if the operations overlap.
> > > Finally, someone trying to add this transform figured out this check... I think this is the fourth patch trying to add memmove to loopidiom.
> > Well, hopefully there won't be fifth one :)
> >
> > > Do you need to compare the offset to the size of the load/store operations? I think you might get some funny behavior if the operations overlap.
> > Yes, it would be nice to prevent overlapping. I'm going to add protection against it and UT reproducer.
> Now load and store can't overlap each other and their sizes must be equal. However I'm not sure how reproducer should look like. Currently isLegalStore reject everything which has abs(Stride) != StoreSize so my understanding is that only one load and one store in loop are allowed. I ended up with following reproducer:
>
>
> ```
> define void @do_not_form_memmove_for_overlapped_access(i32* %s, i64 %size) {
> entry:
> %end.idx = add i64 %size, -1
> %end.ptr = getelementptr inbounds i32, i32* %s, i64 %end.idx
> br label %while.body
>
> while.body:
> %phi.ptr = phi i32* [ %s, %entry ], [ %next.ptr, %while.body ]
> %next = bitcast i32* %phi.ptr to i16*
> %src.ptr = getelementptr i16, i16* %next, i64 1
> %src.ptr2 = bitcast i16* %src.ptr to i32*
> ; below misaligned load is overlapped with store.
> %val = load i32, i32* %src.ptr2, align 4
> %dst.ptr = getelementptr i32, i32* %phi.ptr, i64 0
> store i32 %val, i32* %dst.ptr, align 4
> %next.ptr = getelementptr i32, i32* %phi.ptr, i64 1
> %cmp = icmp eq i32* %next.ptr, %end.ptr
> br i1 %cmp, label %exit, label %while.body
>
> exit:
> ret void
> }
> ```
>
> The thing is that I'm not sure whether or not above snippet is correct IR since we do misaligned load.
Ok, I checked documentation and mailing list for better understanding. Changing load just to "%val = load i32, i32* %src.ptr2, align 2" should be fine and give legal IR with underaligned access. Added improved overlapping access UT reproducer.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D104464/new/
https://reviews.llvm.org/D104464
More information about the llvm-commits
mailing list