[llvm-dev] GVN removing loads that are affected by call
Juneyoung Lee via llvm-dev
llvm-dev at lists.llvm.org
Thu Feb 18 06:18:54 PST 2021
Nope, I was occupied by something else recently.
Sure, feel free to work on it. :)
Juneyoung
On Thu, Feb 18, 2021 at 11:14 PM Ryan Taylor <ryta1203 at gmail.com> wrote:
> Juneyoung,
>
> Are you currently working on this? If not, please let me know and I can
> try to work something up.
>
> Thanks,
>
> Ryan
>
> On Wed, Feb 10, 2021 at 3:34 AM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr>
> wrote:
>
>> Yep, let me work on adding the new attribute.
>>
>> Juneyoung
>>
>>
>>> On 1/18/21 5:38 PM, Ryan Taylor wrote:
>>> > Johannes,
>>> >
>>> > Thanks, this makes it a bit more clear, yes. I was a bit confused
>>> if it
>>> > was copying the pointer or the data being pointed to. It looks like
>>> you are
>>> > proposing the new attribute to fill this hole that the compiler is not
>>> > currently able to tell that this pointer, though "escaped" is not used
>>> > again and is therefore functionally the same as not captured?
>>>
>>> The pointer is not "captured" in the general sense but it is potentially
>>> returned. That allows you to keep the "nocapture" property if you follow
>>> the return value uses as if they might be the pointer itself.
>>>
>>>
>>> > Do you have a phab for the recently proposed
>>> no-capture-maybe-returned
>>> > attribute?
>>>
>>> No and I don't have code that makes use of it outside the Attributor.
>>>
>>> @Juneyoung expressed interest in the attribute recently. He, or maybe
>>> you, could write the RFC, add the enum attribute pluming, and then uses
>>> in transformations. We could use the Attributor to do some preliminary
>>> testing but it would be best to play around with the usage of this new
>>> idea in other transformations as well.
>>>
>>> ~ Johannes
>>>
>>>
>>> > Thanks.
>>> >
>>> >
>>> >
>>> > On Mon, Jan 18, 2021 at 6:13 PM Johannes Doerfert <
>>> > johannesdoerfert at gmail.com> wrote:
>>> >
>>> >> Your example/trouble starts with an "identified function local" (for
>>> >> short, IFL) pointer
>>> >> (see llvm::isIdentifiedFunctionLocal), i.a., as it falls out of
>>> >> `malloc`, `alloca`, or
>>> >> a `__restrict` qualified argument.
>>> >>
>>> >> Since no access on a pointer not derived from the IFL one can cause
>>> the
>>> >> underlying memory to be altered,
>>> >> we can ignore other accesses *iff* we can show they don't use a
>>> pointer
>>> >> that is "based-on" the IFL one.
>>> >> The easiest way to prove it is not based on the IFL one is to show the
>>> >> IFL pointer does not escape since
>>> >> then every based-on use can be identified by def-use chains. A call
>>> >> argument that is nocapture guarantees
>>> >> no copy of the pointer is made that outlives the call. If you return
>>> the
>>> >> pointer, a copy is made and you
>>> >> broke the contract. We consequently "ignore" accesses through that
>>> >> pointer when we argue about the IFL one
>>> >> as they cannot alias.
>>> >>
>>> >> Since we often return a pointer argument, the Attributor uses
>>> >> "no-capture-maybe-returned"
>>> >> which we recently discussed to propose as a new enum attribute. You
>>> >> could then mark the argument as such
>>> >> and capture + use traversals could continue with the intrinsic value
>>> to
>>> >> determine all accesses to the IFL.
>>> >>
>>> >> I hope this makes it somewhat clearer.
>>> >>
>>> >> WRT. memcpy, it does not copy the pointers you put it but the memory
>>> >> they point to. The address of the memory
>>> >> did not escape/was not captured.
>>> >>
>>> >> ~ Johannes
>>> >>
>>> >>
>>> >> On 1/18/21 4:55 PM, Ryan Taylor wrote:
>>> >>> I'm still a bit fuzzy on the definition but that doesn't seem to
>>> >> completely
>>> >>> agree with pointer capture?
>>> >>>
>>> >>> " A pointer value *is captured* if the function makes a copy of any
>>> part
>>> >> of
>>> >>> the pointer that outlives the call. "
>>> >>>
>>> >>> I can replace memcpy in our example and see no issues. memcpy is no
>>> >>> capture.
>>> >>>
>>> >>> #define SIZE 5
>>> >>> #include<string>
>>> >>> int foo(short a[SIZE], short b[SIZE], short *res) {
>>> >>> short temp[SIZE] = {0, 0, 0, 0, 0};
>>> >>> short *ptr;
>>> >>> ptr = temp;
>>> >>> memcpy(ptr, b, SIZE*2);
>>> >>> short t = temp[0];
>>> >>> for (int i = 1; i < 4; i++) {
>>> >>> if (t > temp[i]) t = temp[i];
>>> >>> }
>>> >>> *res = t;
>>> >>> memcpy(ptr, b, SIZE*2);
>>> >>> t = temp[0];
>>> >>> for (int i = 1; i < 4; i++) {
>>> >>> if (t > temp[i]) t = temp[i];
>>> >>> }
>>> >>> return t;
>>> >>> }
>>> >>>
>>> >>> So you're saying that if the memcpy intrinsic, in this case,
>>> returned a
>>> >>> pointer (even if not directly used, ie "ptr" in this case) and had
>>> >>> NoCapture attribute, the second set of loads would be removed?
>>> >>>
>>> >>> -Ryan
>>> >>>
>>> >>> On Mon, Jan 18, 2021 at 4:30 PM Johannes Doerfert <
>>> >>> johannesdoerfert at gmail.com> wrote:
>>> >>>
>>> >>>> A function that returns a pointer argument "captures" it for now.
>>> >>>>
>>> >>>> The Attributor uses "nocapute_maybe_returned" internally but it is
>>> not
>>> >>>> an enum attribute (yet).
>>> >>>>
>>> >>>>
>>> >>>> On 1/15/21 7:32 PM, Ryan Taylor wrote:
>>> >>>>> So this seems to be an issue with the intrinsic setting
>>> NoCapture<0>
>>> >> (it
>>> >>>>> returns the pointer it takes). We have a similar intrinsic that
>>> does
>>> >> the
>>> >>>>> exact same thing but doesn't return the pointer. It also has
>>> >> NoCapture<1>
>>> >>>>> and doesn't cause this issue. Also, I managed to substitute memcpy
>>> and
>>> >>>>> there were no issues, it also has nocapture.
>>> >>>>>
>>> >>>>> Are there some papers/citations which talk more about Pointer
>>> >>>>> Capture/Capture Tracking?
>>> >>>>>
>>> >>>>> I saw this blog:
>>> >>>>>
>>> https://jonasdevlieghere.com/escape-analysis-capture-tracking-in-llvm/
>>> >>>>> And I read the CaptureTracking.h short description but I'd like a
>>> >> clearer
>>> >>>>> grasp.
>>> >>>>>
>>> >>>>> Thanks,
>>> >>>>>
>>> >>>>> Ryan
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> On Thu, Jan 14, 2021 at 2:03 PM Ryan Taylor <ryta1203 at gmail.com>
>>> >> wrote:
>>> >>>>>> Ok, thanks. I'll try to come up with a more generic test case.
>>> >>>>>>
>>> >>>>>> On Thu, Jan 14, 2021 at 2:00 PM Johannes Doerfert <
>>> >>>>>> johannesdoerfert at gmail.com> wrote:
>>> >>>>>>
>>> >>>>>>> Nothing comes to mind given the information.
>>> >>>>>>>
>>> >>>>>>> ~ Johannes
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> On 1/14/21 12:56 PM, Ryan Taylor wrote:
>>> >>>>>>>> argmemonly, nounwind, writeonly
>>> >>>>>>>>
>>> >>>>>>>> -fno-strict-aliasing did not help.
>>> >>>>>>>>
>>> >>>>>>>> On Thu, Jan 14, 2021 at 1:29 PM Johannes Doerfert <
>>> >>>>>>>> johannesdoerfert at gmail.com> wrote:
>>> >>>>>>>>
>>> >>>>>>>>> Can you share the attributes of the intrinsic declaration,
>>> >>>>>>>>> assuming removing `!tbaa` didn't help.
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> On 1/14/21 11:43 AM, Ryan Taylor wrote:
>>> >>>>>>>>>> Yes, this is for downstream/out of tree target so I'm not
>>> sure how
>>> >>>> you
>>> >>>>>>>>>> could reproduce it either but I thought the IR might help a
>>> bit.
>>> >>>>>>>>>>
>>> >>>>>>>>>> My guess is it's not a bug in GVN as much as an issue with the
>>> >>>>>>> intrinsic
>>> >>>>>>>>>> properties, or lack thereof. I put this in the first post but
>>> the
>>> >>>>>>> Alias
>>> >>>>>>>>>> Sets show:
>>> >>>>>>>>>>
>>> >>>>>>>>>> AliasSet[0x75fbd0, 9] may alias, Mod/Ref Pointers: (i8* %1,
>>> 8),
>>> >>>>>>> (i16*
>>> >>>>>>>>>> %arrayidx, 2), (i16* %arrayidx1, 2), (i16* %arrayidx5, 2),
>>> (i16*
>>> >>>> %19,
>>> >>>>>>> 2),
>>> >>>>>>>>>> (i16* %arrayidx6, 2), (i16* %arrayidx14, 2), (i16*
>>> %arrayidx19, 2)
>>> >>>>>>>>>> 4 Unknown instructions: void <badref>, <4 x i16>*
>>> %6, <4 x
>>> >>>> i16>*
>>> >>>>>>>>> %22,
>>> >>>>>>>>>> void <badref>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>> On Thu, Jan 14, 2021 at 12:37 PM Johannes Doerfert <
>>> >>>>>>>>>> johannesdoerfert at gmail.com> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>>> There is still not enough information here.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> My first guess. The `!tbaa` annotation on the
>>> `XXX.intrinsic` and
>>> >>>> the
>>> >>>>>>>>>>> `load`
>>> >>>>>>>>>>> basically encode there is no alias. Easy to verify, remove
>>> the
>>> >> ones
>>> >>>>>>> on
>>> >>>>>>>>> the
>>> >>>>>>>>>>> intrinsic.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> ~ Johannes
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> P.S. If this was a bug in GVN, and I assume it is not, a
>>> >> reproducer
>>> >>>>>>>>>>> would help
>>> >>>>>>>>>>> a lot. So a small IR sample that shows the
>>> problem and
>>> >>>> which
>>> >>>>>>> we
>>> >>>>>>>>>>> can run.
>>> >>>>>>>>>>> This is a "redacted?" IR fragment in which I
>>> don't know
>>> >>>> what
>>> >>>>>>>>>>> transformation
>>> >>>>>>>>>>> is problematic. I also can not run it through GVN,
>>> >> which
>>> >>>>>>> makes it
>>> >>>>>>>>>>> impossible
>>> >>>>>>>>>>> to reproduce.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> On 1/14/21 11:27 AM, Ryan Taylor via llvm-dev wrote:
>>> >>>>>>>>>>>> This is right before GVN:
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> define i32 @foo(<4 x i16> %p, <4 x i16> %p1, i16* nocapture
>>> >> %res)
>>> >>>>>>>>>>>> local_unnamed_addr #0 !dbg !6 {
>>> >>>>>>>>>>>> entry:
>>> >>>>>>>>>>>> %temp = alloca i64, align 8
>>> >>>>>>>>>>>> %tmpcast = bitcast i64* %temp to [4 x i16]*
>>> >>>>>>>>>>>> %0 = bitcast i64* %temp to i8*, !dbg !8
>>> >>>>>>>>>>>> call void @llvm.lifetime.start.p0i8(i64 8, i8*
>>> nonnull
>>> >> %0)
>>> >>>> #3,
>>> >>>>>>>>> !dbg !8
>>> >>>>>>>>>>>> store i64 0, i64* %temp, align 8, !dbg !9
>>> >>>>>>>>>>>> %1 = bitcast i64* %temp to <4 x i16>*, !dbg !10
>>> >>>>>>>>>>>> %2 = call <4 x i16>* @llvm.XXX.intrinsic(<4 x i16>*
>>> >> nonnull
>>> >>>>>>> %1, <4
>>> >>>>>>>>> x
>>> >>>>>>>>>>> i16>
>>> >>>>>>>>>>>> %p, i32 0), !dbg !11, !tbaa !12
>>> >>>>>>>>>>>> %arrayidx = bitcast i64* %temp to i16*, !dbg !16
>>> >>>>>>>>>>>> %3 = load i16, i16* %arrayidx, align 8, !dbg !16,
>>> !tbaa
>>> >> !17
>>> >>>>>>>>>>>> br label %for.body, !dbg !19
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> for.body: ; preds =
>>> >> %entry
>>> >>>>>>>>>>>> %arrayidx1 = getelementptr inbounds [4 x i16], [4 x
>>> i16]*
>>> >>>>>>> %tmpcast,
>>> >>>>>>>>>>> i32
>>> >>>>>>>>>>>> 0, i32 1, !dbg !20
>>> >>>>>>>>>>>> %4 = load i16, i16* %arrayidx1, align 2, !dbg !20,
>>> !tbaa
>>> >> !17
>>> >>>>>>>>>>>> %cmp3 = icmp sgt i16 %3, %4, !dbg !21
>>> >>>>>>>>>>>> %spec.select = select i1 %cmp3, i16 %4, i16 %3,
>>> !dbg !22
>>> >>>>>>>>>>>> %arrayidx1.1 = getelementptr inbounds [4 x i16], [4
>>> x
>>> >> i16]*
>>> >>>>>>>>> %tmpcast,
>>> >>>>>>>>>>> i32
>>> >>>>>>>>>>>> 0, i32 2, !dbg !20
>>> >>>>>>>>>>>> %5 = load i16, i16* %arrayidx1.1, align 2, !dbg !20,
>>> >> !tbaa
>>> >>>> !17
>>> >>>>>>>>>>>> %cmp3.1 = icmp sgt i16 %spec.select, %5, !dbg !21
>>> >>>>>>>>>>>> %spec.select.1 = select i1 %cmp3.1, i16 %5, i16
>>> >>>> %spec.select,
>>> >>>>>>> !dbg
>>> >>>>>>>>> !22
>>> >>>>>>>>>>>> %arrayidx1.2 = getelementptr inbounds [4 x i16], [4
>>> x
>>> >> i16]*
>>> >>>>>>>>> %tmpcast,
>>> >>>>>>>>>>> i32
>>> >>>>>>>>>>>> 0, i32 3, !dbg !20
>>> >>>>>>>>>>>> %6 = load i16, i16* %arrayidx1.2, align 2, !dbg !20,
>>> >> !tbaa
>>> >>>> !17
>>> >>>>>>>>>>>> %cmp3.2 = icmp sgt i16 %spec.select.1, %6, !dbg !21
>>> >>>>>>>>>>>> %spec.select.2 = select i1 %cmp3.2, i16 %6, i16
>>> >>>> %spec.select.1,
>>> >>>>>>>>> !dbg
>>> >>>>>>>>>>> !22
>>> >>>>>>>>>>>> store i16 %spec.select.2, i16* %res, align 2, !dbg
>>> !23,
>>> >>>> !tbaa
>>> >>>>>>> !17
>>> >>>>>>>>>>>> %7 = tail call <4 x i16>* @llvm.XXX.intrinsic(<4 x
>>> i16>*
>>> >> %2,
>>> >>>>>>> <4 x
>>> >>>>>>>>> i16>
>>> >>>>>>>>>>>> %p1, i32 0), !dbg !24, !tbaa !12
>>> >>>>>>>>>>>> %8 = load i16, i16* %arrayidx, align 8, !dbg !25,
>>> !tbaa
>>> >> !17
>>> >>>>>>>>>>>> br label %for.body12, !dbg !26
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> for.body12: ; preds =
>>> >>>>>>> %for.body
>>> >>>>>>>>>>>> %arrayidx14 = getelementptr inbounds [4 x i16], [4 x
>>> >> i16]*
>>> >>>>>>>>> %tmpcast,
>>> >>>>>>>>>>> i32
>>> >>>>>>>>>>>> 0, i32 1, !dbg !27
>>> >>>>>>>>>>>> %9 = load i16, i16* %arrayidx14, align 2, !dbg !27,
>>> !tbaa
>>> >>>> !17
>>> >>>>>>>>>>>> %cmp16 = icmp sgt i16 %8, %9, !dbg !28
>>> >>>>>>>>>>>> %spec.select39 = select i1 %cmp16, i16 %9, i16 %8,
>>> !dbg
>>> >> !29
>>> >>>>>>>>>>>> %arrayidx14.1 = getelementptr inbounds [4 x i16],
>>> [4 x
>>> >> i16]*
>>> >>>>>>>>> %tmpcast,
>>> >>>>>>>>>>>> i32 0, i32 2, !dbg !27
>>> >>>>>>>>>>>> %10 = load i16, i16* %arrayidx14.1, align 2, !dbg
>>> !27,
>>> >> !tbaa
>>> >>>>>>> !17
>>> >>>>>>>>>>>> %cmp16.1 = icmp sgt i16 %spec.select39, %10, !dbg
>>> !28
>>> >>>>>>>>>>>> %spec.select39.1 = select i1 %cmp16.1, i16 %10, i16
>>> >>>>>>> %spec.select39,
>>> >>>>>>>>>>> !dbg
>>> >>>>>>>>>>>> !29
>>> >>>>>>>>>>>> %arrayidx14.2 = getelementptr inbounds [4 x i16],
>>> [4 x
>>> >> i16]*
>>> >>>>>>>>> %tmpcast,
>>> >>>>>>>>>>>> i32 0, i32 3, !dbg !27
>>> >>>>>>>>>>>> %11 = load i16, i16* %arrayidx14.2, align 2, !dbg
>>> !27,
>>> >> !tbaa
>>> >>>>>>> !17
>>> >>>>>>>>>>>> %cmp16.2 = icmp sgt i16 %spec.select39.1, %11, !dbg
>>> !28
>>> >>>>>>>>>>>> %spec.select39.2 = select i1 %cmp16.2, i16 %11, i16
>>> >>>>>>>>> %spec.select39.1,
>>> >>>>>>>>>>>> !dbg !29
>>> >>>>>>>>>>>> %conv24 = sext i16 %spec.select39.2 to i32, !dbg !30
>>> >>>>>>>>>>>> call void @llvm.lifetime.end.p0i8(i64 8, i8*
>>> nonnull %0)
>>> >> #3,
>>> >>>>>>> !dbg
>>> >>>>>>>>> !31
>>> >>>>>>>>>>>> ret i32 %conv24, !dbg !32
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> On Thu, Jan 14, 2021 at 11:54 AM Roman Lebedev <
>>> >>>>>>> lebedev.ri at gmail.com>
>>> >>>>>>>>>>> wrote:
>>> >>>>>>>>>>>>> It would be good to have an actual IR reproducer here.
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> On Thu, Jan 14, 2021 at 7:51 PM Ryan Taylor via llvm-dev
>>> >>>>>>>>>>>>> <llvm-dev at lists.llvm.org> wrote:
>>> >>>>>>>>>>>>>> So given an intrinsic that has a pointer as in/out and
>>> >>>>>>> IntrWriteMem
>>> >>>>>>>>>>>>> property.
>>> >>>>>>>>>>>>>> call intrinsic(address a, ....);
>>> >>>>>>>>>>>>>> loop over address a
>>> >>>>>>>>>>>>>> load from address a + offset
>>> >>>>>>>>>>>>>> call intrinsic (address a, ...);
>>> >>>>>>>>>>>>>> loop over address a
>>> >>>>>>>>>>>>>> load from address a + offset
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> GVN is removing the second loads, despite the second call
>>> >>>>>>> overwriting
>>> >>>>>>>>>>>>> the memory starting at address a. AA has the intrinsics
>>> marked
>>> >> as
>>> >>>>>>>>>>> unknown
>>> >>>>>>>>>>>>> instructions but has all of these as mayAlias in a set.
>>> I'm not
>>> >>>>>>> seeing
>>> >>>>>>>>>>> this
>>> >>>>>>>>>>>>> issue with -fno-unroll-loops.
>>> >>>>>>>>>>>>>> Thanks.
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> _______________________________________________
>>> >>>>>>>>>>>>>> LLVM Developers mailing list
>>> >>>>>>>>>>>>>> llvm-dev at lists.llvm.org
>>> >>>>>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> >>>>>>>>>>>> _______________________________________________
>>> >>>>>>>>>>>> LLVM Developers mailing list
>>> >>>>>>>>>>>> llvm-dev at lists.llvm.org
>>> >>>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
--
Juneyoung Lee
Software Foundation Lab, Seoul National University
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210218/c517a071/attachment-0001.html>
More information about the llvm-dev
mailing list