[llvm-dev] GVN removing loads that are affected by call

Ryan Taylor via llvm-dev llvm-dev at lists.llvm.org
Thu Feb 18 06:14:08 PST 2021


Juneyoung,

  Are you currently working on this? If not, please let me know and I can
try to work something up.

Thanks,

Ryan

On Wed, Feb 10, 2021 at 3:34 AM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr>
wrote:

> Yep, let me work on adding the new attribute.
>
> Juneyoung
>
>
>> On 1/18/21 5:38 PM, Ryan Taylor wrote:
>> > Johannes,
>> >
>> >    Thanks, this makes it a bit more clear, yes. I was a bit confused if
>> it
>> > was copying the pointer or the data being pointed to. It looks like you
>> are
>> > proposing the new attribute to fill this hole that the compiler is not
>> > currently able to tell that this pointer, though "escaped" is not used
>> > again and is therefore functionally the same as not captured?
>>
>> The pointer is not "captured" in the general sense but it is potentially
>> returned. That allows you to keep the "nocapture" property if you follow
>> the return value uses as if they might be the pointer itself.
>>
>>
>> >    Do you have a phab for the recently proposed
>> no-capture-maybe-returned
>> > attribute?
>>
>> No and I don't have code that makes use of it outside the Attributor.
>>
>> @Juneyoung expressed interest in the attribute recently. He, or maybe
>> you, could write the RFC, add the enum attribute pluming, and then uses
>> in transformations. We could use the Attributor to do some preliminary
>> testing but it would be best to play around with the usage of this new
>> idea in other transformations as well.
>>
>> ~ Johannes
>>
>>
>> > Thanks.
>> >
>> >
>> >
>> > On Mon, Jan 18, 2021 at 6:13 PM Johannes Doerfert <
>> > johannesdoerfert at gmail.com> wrote:
>> >
>> >> Your example/trouble starts with an "identified function local" (for
>> >> short, IFL) pointer
>> >> (see llvm::isIdentifiedFunctionLocal), i.a., as it falls  out of
>> >> `malloc`, `alloca`, or
>> >> a `__restrict` qualified argument.
>> >>
>> >> Since no access on a pointer not derived from the IFL one can cause the
>> >> underlying memory to be altered,
>> >> we can ignore other accesses *iff* we can show they don't use a pointer
>> >> that is "based-on" the IFL one.
>> >> The easiest way to prove it is not based on the IFL one is to show the
>> >> IFL pointer does not escape since
>> >> then every based-on use can be identified by def-use chains. A call
>> >> argument that is nocapture guarantees
>> >> no copy of the pointer is made that outlives the call. If you return
>> the
>> >> pointer, a copy is made and you
>> >> broke the contract. We consequently "ignore" accesses through that
>> >> pointer when we argue about the IFL one
>> >> as they cannot alias.
>> >>
>> >> Since we often return a pointer argument, the Attributor uses
>> >> "no-capture-maybe-returned"
>> >> which we recently discussed to propose as a new enum attribute. You
>> >> could then mark the argument as such
>> >> and capture + use traversals could continue with the intrinsic value to
>> >> determine all accesses to the IFL.
>> >>
>> >> I hope this makes it somewhat clearer.
>> >>
>> >> WRT. memcpy, it does not copy the pointers you put it but the memory
>> >> they point to. The address of the memory
>> >> did not escape/was not captured.
>> >>
>> >> ~ Johannes
>> >>
>> >>
>> >> On 1/18/21 4:55 PM, Ryan Taylor wrote:
>> >>> I'm still a bit fuzzy on the definition but that doesn't seem to
>> >> completely
>> >>> agree with pointer capture?
>> >>>
>> >>> " A pointer value *is captured* if the function makes a copy of any
>> part
>> >> of
>> >>> the pointer that outlives the call. "
>> >>>
>> >>> I can replace memcpy in our example and see no issues. memcpy is no
>> >>> capture.
>> >>>
>> >>> #define SIZE 5
>> >>> #include<string>
>> >>> int foo(short a[SIZE], short b[SIZE], short *res) {
>> >>>     short temp[SIZE] = {0, 0, 0, 0, 0};
>> >>>     short *ptr;
>> >>>     ptr = temp;
>> >>>     memcpy(ptr, b, SIZE*2);
>> >>>     short t = temp[0];
>> >>>     for (int i = 1; i < 4; i++) {
>> >>>       if (t > temp[i]) t = temp[i];
>> >>>     }
>> >>>     *res = t;
>> >>>     memcpy(ptr, b, SIZE*2);
>> >>>     t = temp[0];
>> >>>     for (int i = 1; i < 4; i++) {
>> >>>       if (t > temp[i]) t = temp[i];
>> >>>     }
>> >>>     return t;
>> >>> }
>> >>>
>> >>> So you're saying that if the memcpy intrinsic, in this case, returned
>> a
>> >>> pointer (even if not directly used, ie "ptr" in this case) and had
>> >>> NoCapture attribute, the second set of loads would be removed?
>> >>>
>> >>> -Ryan
>> >>>
>> >>> On Mon, Jan 18, 2021 at 4:30 PM Johannes Doerfert <
>> >>> johannesdoerfert at gmail.com> wrote:
>> >>>
>> >>>> A function that returns a pointer argument "captures" it for now.
>> >>>>
>> >>>> The Attributor uses "nocapute_maybe_returned" internally but it is
>> not
>> >>>> an enum attribute (yet).
>> >>>>
>> >>>>
>> >>>> On 1/15/21 7:32 PM, Ryan Taylor wrote:
>> >>>>> So this seems to be an issue with the intrinsic setting NoCapture<0>
>> >> (it
>> >>>>> returns the pointer it takes). We have a similar intrinsic that does
>> >> the
>> >>>>> exact same thing but doesn't return the pointer. It also has
>> >> NoCapture<1>
>> >>>>> and doesn't cause this issue. Also, I managed to substitute memcpy
>> and
>> >>>>> there were no issues, it also has nocapture.
>> >>>>>
>> >>>>> Are there some papers/citations which talk more about Pointer
>> >>>>> Capture/Capture Tracking?
>> >>>>>
>> >>>>> I saw this blog:
>> >>>>>
>> https://jonasdevlieghere.com/escape-analysis-capture-tracking-in-llvm/
>> >>>>> And I read the CaptureTracking.h short description but I'd like a
>> >> clearer
>> >>>>> grasp.
>> >>>>>
>> >>>>> Thanks,
>> >>>>>
>> >>>>> Ryan
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Thu, Jan 14, 2021 at 2:03 PM Ryan Taylor <ryta1203 at gmail.com>
>> >> wrote:
>> >>>>>> Ok, thanks. I'll try to come up with a more generic test case.
>> >>>>>>
>> >>>>>> On Thu, Jan 14, 2021 at 2:00 PM Johannes Doerfert <
>> >>>>>> johannesdoerfert at gmail.com> wrote:
>> >>>>>>
>> >>>>>>> Nothing comes to mind given the information.
>> >>>>>>>
>> >>>>>>> ~ Johannes
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On 1/14/21 12:56 PM, Ryan Taylor wrote:
>> >>>>>>>> argmemonly, nounwind, writeonly
>> >>>>>>>>
>> >>>>>>>> -fno-strict-aliasing did not help.
>> >>>>>>>>
>> >>>>>>>> On Thu, Jan 14, 2021 at 1:29 PM Johannes Doerfert <
>> >>>>>>>> johannesdoerfert at gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>>> Can you share the attributes of the intrinsic declaration,
>> >>>>>>>>> assuming removing `!tbaa` didn't help.
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> On 1/14/21 11:43 AM, Ryan Taylor wrote:
>> >>>>>>>>>> Yes, this is for downstream/out of tree target so I'm not sure
>> how
>> >>>> you
>> >>>>>>>>>> could reproduce it either but I thought the IR might help a
>> bit.
>> >>>>>>>>>>
>> >>>>>>>>>> My guess is it's not a bug in GVN as much as an issue with the
>> >>>>>>> intrinsic
>> >>>>>>>>>> properties, or lack thereof. I put this in the first post but
>> the
>> >>>>>>> Alias
>> >>>>>>>>>> Sets show:
>> >>>>>>>>>>
>> >>>>>>>>>> AliasSet[0x75fbd0, 9] may alias, Mod/Ref   Pointers: (i8* %1,
>> 8),
>> >>>>>>> (i16*
>> >>>>>>>>>> %arrayidx, 2), (i16* %arrayidx1, 2), (i16* %arrayidx5, 2),
>> (i16*
>> >>>> %19,
>> >>>>>>> 2),
>> >>>>>>>>>> (i16* %arrayidx6, 2), (i16* %arrayidx14, 2), (i16*
>> %arrayidx19, 2)
>> >>>>>>>>>>          4 Unknown instructions: void <badref>, <4 x i16>* %6,
>> <4 x
>> >>>> i16>*
>> >>>>>>>>> %22,
>> >>>>>>>>>> void <badref>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Thu, Jan 14, 2021 at 12:37 PM Johannes Doerfert <
>> >>>>>>>>>> johannesdoerfert at gmail.com> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> There is still not enough information here.
>> >>>>>>>>>>>
>> >>>>>>>>>>> My first guess. The `!tbaa` annotation on the `XXX.intrinsic`
>> and
>> >>>> the
>> >>>>>>>>>>> `load`
>> >>>>>>>>>>> basically encode there is no alias. Easy to verify, remove the
>> >> ones
>> >>>>>>> on
>> >>>>>>>>> the
>> >>>>>>>>>>> intrinsic.
>> >>>>>>>>>>>
>> >>>>>>>>>>> ~ Johannes
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> P.S. If this was a bug in GVN, and I assume it is not, a
>> >> reproducer
>> >>>>>>>>>>> would help
>> >>>>>>>>>>>            a lot. So a small IR sample that shows the problem
>> and
>> >>>> which
>> >>>>>>> we
>> >>>>>>>>>>> can run.
>> >>>>>>>>>>>            This is a "redacted?" IR fragment in which I don't
>> know
>> >>>> what
>> >>>>>>>>>>> transformation
>> >>>>>>>>>>>            is problematic. I also can not run it through GVN,
>> >> which
>> >>>>>>> makes it
>> >>>>>>>>>>> impossible
>> >>>>>>>>>>>            to reproduce.
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> On 1/14/21 11:27 AM, Ryan Taylor via llvm-dev wrote:
>> >>>>>>>>>>>> This is right before GVN:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> define i32 @foo(<4 x i16> %p, <4 x i16> %p1, i16* nocapture
>> >> %res)
>> >>>>>>>>>>>> local_unnamed_addr #0 !dbg !6 {
>> >>>>>>>>>>>> entry:
>> >>>>>>>>>>>>         %temp = alloca i64, align 8
>> >>>>>>>>>>>>         %tmpcast = bitcast i64* %temp to [4 x i16]*
>> >>>>>>>>>>>>         %0 = bitcast i64* %temp to i8*, !dbg !8
>> >>>>>>>>>>>>         call void @llvm.lifetime.start.p0i8(i64 8, i8*
>> nonnull
>> >> %0)
>> >>>> #3,
>> >>>>>>>>> !dbg !8
>> >>>>>>>>>>>>         store i64 0, i64* %temp, align 8, !dbg !9
>> >>>>>>>>>>>>         %1 = bitcast i64* %temp to <4 x i16>*, !dbg !10
>> >>>>>>>>>>>>         %2 = call <4 x i16>* @llvm.XXX.intrinsic(<4 x i16>*
>> >> nonnull
>> >>>>>>> %1, <4
>> >>>>>>>>> x
>> >>>>>>>>>>> i16>
>> >>>>>>>>>>>> %p, i32 0), !dbg !11, !tbaa !12
>> >>>>>>>>>>>>         %arrayidx = bitcast i64* %temp to i16*, !dbg !16
>> >>>>>>>>>>>>         %3 = load i16, i16* %arrayidx, align 8, !dbg !16,
>> !tbaa
>> >> !17
>> >>>>>>>>>>>>         br label %for.body, !dbg !19
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> for.body:                                         ; preds =
>> >> %entry
>> >>>>>>>>>>>>         %arrayidx1 = getelementptr inbounds [4 x i16], [4 x
>> i16]*
>> >>>>>>> %tmpcast,
>> >>>>>>>>>>> i32
>> >>>>>>>>>>>> 0, i32 1, !dbg !20
>> >>>>>>>>>>>>         %4 = load i16, i16* %arrayidx1, align 2, !dbg !20,
>> !tbaa
>> >> !17
>> >>>>>>>>>>>>         %cmp3 = icmp sgt i16 %3, %4, !dbg !21
>> >>>>>>>>>>>>         %spec.select = select i1 %cmp3, i16 %4, i16 %3, !dbg
>> !22
>> >>>>>>>>>>>>         %arrayidx1.1 = getelementptr inbounds [4 x i16], [4 x
>> >> i16]*
>> >>>>>>>>> %tmpcast,
>> >>>>>>>>>>> i32
>> >>>>>>>>>>>> 0, i32 2, !dbg !20
>> >>>>>>>>>>>>         %5 = load i16, i16* %arrayidx1.1, align 2, !dbg !20,
>> >> !tbaa
>> >>>> !17
>> >>>>>>>>>>>>         %cmp3.1 = icmp sgt i16 %spec.select, %5, !dbg !21
>> >>>>>>>>>>>>         %spec.select.1 = select i1 %cmp3.1, i16 %5, i16
>> >>>> %spec.select,
>> >>>>>>> !dbg
>> >>>>>>>>> !22
>> >>>>>>>>>>>>         %arrayidx1.2 = getelementptr inbounds [4 x i16], [4 x
>> >> i16]*
>> >>>>>>>>> %tmpcast,
>> >>>>>>>>>>> i32
>> >>>>>>>>>>>> 0, i32 3, !dbg !20
>> >>>>>>>>>>>>         %6 = load i16, i16* %arrayidx1.2, align 2, !dbg !20,
>> >> !tbaa
>> >>>> !17
>> >>>>>>>>>>>>         %cmp3.2 = icmp sgt i16 %spec.select.1, %6, !dbg !21
>> >>>>>>>>>>>>         %spec.select.2 = select i1 %cmp3.2, i16 %6, i16
>> >>>> %spec.select.1,
>> >>>>>>>>> !dbg
>> >>>>>>>>>>> !22
>> >>>>>>>>>>>>         store i16 %spec.select.2, i16* %res, align 2, !dbg
>> !23,
>> >>>> !tbaa
>> >>>>>>> !17
>> >>>>>>>>>>>>         %7 = tail call <4 x i16>* @llvm.XXX.intrinsic(<4 x
>> i16>*
>> >> %2,
>> >>>>>>> <4 x
>> >>>>>>>>> i16>
>> >>>>>>>>>>>> %p1, i32 0), !dbg !24, !tbaa !12
>> >>>>>>>>>>>>         %8 = load i16, i16* %arrayidx, align 8, !dbg !25,
>> !tbaa
>> >> !17
>> >>>>>>>>>>>>         br label %for.body12, !dbg !26
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> for.body12:                                       ; preds =
>> >>>>>>> %for.body
>> >>>>>>>>>>>>         %arrayidx14 = getelementptr inbounds [4 x i16], [4 x
>> >> i16]*
>> >>>>>>>>> %tmpcast,
>> >>>>>>>>>>> i32
>> >>>>>>>>>>>> 0, i32 1, !dbg !27
>> >>>>>>>>>>>>         %9 = load i16, i16* %arrayidx14, align 2, !dbg !27,
>> !tbaa
>> >>>> !17
>> >>>>>>>>>>>>         %cmp16 = icmp sgt i16 %8, %9, !dbg !28
>> >>>>>>>>>>>>         %spec.select39 = select i1 %cmp16, i16 %9, i16 %8,
>> !dbg
>> >> !29
>> >>>>>>>>>>>>         %arrayidx14.1 = getelementptr inbounds [4 x i16], [4
>> x
>> >> i16]*
>> >>>>>>>>> %tmpcast,
>> >>>>>>>>>>>> i32 0, i32 2, !dbg !27
>> >>>>>>>>>>>>         %10 = load i16, i16* %arrayidx14.1, align 2, !dbg
>> !27,
>> >> !tbaa
>> >>>>>>> !17
>> >>>>>>>>>>>>         %cmp16.1 = icmp sgt i16 %spec.select39, %10, !dbg !28
>> >>>>>>>>>>>>         %spec.select39.1 = select i1 %cmp16.1, i16 %10, i16
>> >>>>>>> %spec.select39,
>> >>>>>>>>>>> !dbg
>> >>>>>>>>>>>> !29
>> >>>>>>>>>>>>         %arrayidx14.2 = getelementptr inbounds [4 x i16], [4
>> x
>> >> i16]*
>> >>>>>>>>> %tmpcast,
>> >>>>>>>>>>>> i32 0, i32 3, !dbg !27
>> >>>>>>>>>>>>         %11 = load i16, i16* %arrayidx14.2, align 2, !dbg
>> !27,
>> >> !tbaa
>> >>>>>>> !17
>> >>>>>>>>>>>>         %cmp16.2 = icmp sgt i16 %spec.select39.1, %11, !dbg
>> !28
>> >>>>>>>>>>>>         %spec.select39.2 = select i1 %cmp16.2, i16 %11, i16
>> >>>>>>>>> %spec.select39.1,
>> >>>>>>>>>>>> !dbg !29
>> >>>>>>>>>>>>         %conv24 = sext i16 %spec.select39.2 to i32, !dbg !30
>> >>>>>>>>>>>>         call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull
>> %0)
>> >> #3,
>> >>>>>>> !dbg
>> >>>>>>>>> !31
>> >>>>>>>>>>>>         ret i32 %conv24, !dbg !32
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Thu, Jan 14, 2021 at 11:54 AM Roman Lebedev <
>> >>>>>>> lebedev.ri at gmail.com>
>> >>>>>>>>>>> wrote:
>> >>>>>>>>>>>>> It would be good to have an actual IR reproducer here.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Thu, Jan 14, 2021 at 7:51 PM Ryan Taylor via llvm-dev
>> >>>>>>>>>>>>> <llvm-dev at lists.llvm.org> wrote:
>> >>>>>>>>>>>>>> So given an intrinsic that has a pointer as in/out and
>> >>>>>>> IntrWriteMem
>> >>>>>>>>>>>>> property.
>> >>>>>>>>>>>>>> call intrinsic(address a, ....);
>> >>>>>>>>>>>>>> loop over address a
>> >>>>>>>>>>>>>>         load from address a + offset
>> >>>>>>>>>>>>>> call intrinsic (address a, ...);
>> >>>>>>>>>>>>>> loop over address a
>> >>>>>>>>>>>>>>         load from address a + offset
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> GVN is removing the second loads, despite the second call
>> >>>>>>> overwriting
>> >>>>>>>>>>>>> the memory starting at address a. AA has the intrinsics
>> marked
>> >> as
>> >>>>>>>>>>> unknown
>> >>>>>>>>>>>>> instructions but has all of these as mayAlias in a set. I'm
>> not
>> >>>>>>> seeing
>> >>>>>>>>>>> this
>> >>>>>>>>>>>>> issue with -fno-unroll-loops.
>> >>>>>>>>>>>>>> Thanks.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> _______________________________________________
>> >>>>>>>>>>>>>> LLVM Developers mailing list
>> >>>>>>>>>>>>>> llvm-dev at lists.llvm.org
>> >>>>>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> >>>>>>>>>>>> _______________________________________________
>> >>>>>>>>>>>> LLVM Developers mailing list
>> >>>>>>>>>>>> llvm-dev at lists.llvm.org
>> >>>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210218/92aa5ab3/attachment.html>


More information about the llvm-dev mailing list