[llvm-dev] GVN removing loads that are affected by call
Johannes Doerfert via llvm-dev
llvm-dev at lists.llvm.org
Mon Jan 18 15:13:38 PST 2021
Your example/trouble starts with an "identified function local" (for
short, IFL) pointer
(see llvm::isIdentifiedFunctionLocal), i.a., as it fallsĀ out of
`malloc`, `alloca`, or
a `__restrict` qualified argument.
Since no access on a pointer not derived from the IFL one can cause the
underlying memory to be altered,
we can ignore other accesses *iff* we can show they don't use a pointer
that is "based-on" the IFL one.
The easiest way to prove it is not based on the IFL one is to show the
IFL pointer does not escape since
then every based-on use can be identified by def-use chains. A call
argument that is nocapture guarantees
no copy of the pointer is made that outlives the call. If you return the
pointer, a copy is made and you
broke the contract. We consequently "ignore" accesses through that
pointer when we argue about the IFL one
as they cannot alias.
Since we often return a pointer argument, the Attributor uses
"no-capture-maybe-returned"
which we recently discussed to propose as a new enum attribute. You
could then mark the argument as such
and capture + use traversals could continue with the intrinsic value to
determine all accesses to the IFL.
I hope this makes it somewhat clearer.
WRT. memcpy, it does not copy the pointers you put it but the memory
they point to. The address of the memory
did not escape/was not captured.
~ Johannes
On 1/18/21 4:55 PM, Ryan Taylor wrote:
> I'm still a bit fuzzy on the definition but that doesn't seem to completely
> agree with pointer capture?
>
> " A pointer value *is captured* if the function makes a copy of any part of
> the pointer that outlives the call. "
>
> I can replace memcpy in our example and see no issues. memcpy is no
> capture.
>
> #define SIZE 5
> #include<string>
> int foo(short a[SIZE], short b[SIZE], short *res) {
> short temp[SIZE] = {0, 0, 0, 0, 0};
> short *ptr;
> ptr = temp;
> memcpy(ptr, b, SIZE*2);
> short t = temp[0];
> for (int i = 1; i < 4; i++) {
> if (t > temp[i]) t = temp[i];
> }
> *res = t;
> memcpy(ptr, b, SIZE*2);
> t = temp[0];
> for (int i = 1; i < 4; i++) {
> if (t > temp[i]) t = temp[i];
> }
> return t;
> }
>
> So you're saying that if the memcpy intrinsic, in this case, returned a
> pointer (even if not directly used, ie "ptr" in this case) and had
> NoCapture attribute, the second set of loads would be removed?
>
> -Ryan
>
> On Mon, Jan 18, 2021 at 4:30 PM Johannes Doerfert <
> johannesdoerfert at gmail.com> wrote:
>
>> A function that returns a pointer argument "captures" it for now.
>>
>> The Attributor uses "nocapute_maybe_returned" internally but it is not
>> an enum attribute (yet).
>>
>>
>> On 1/15/21 7:32 PM, Ryan Taylor wrote:
>>> So this seems to be an issue with the intrinsic setting NoCapture<0> (it
>>> returns the pointer it takes). We have a similar intrinsic that does the
>>> exact same thing but doesn't return the pointer. It also has NoCapture<1>
>>> and doesn't cause this issue. Also, I managed to substitute memcpy and
>>> there were no issues, it also has nocapture.
>>>
>>> Are there some papers/citations which talk more about Pointer
>>> Capture/Capture Tracking?
>>>
>>> I saw this blog:
>>> https://jonasdevlieghere.com/escape-analysis-capture-tracking-in-llvm/
>>> And I read the CaptureTracking.h short description but I'd like a clearer
>>> grasp.
>>>
>>> Thanks,
>>>
>>> Ryan
>>>
>>>
>>>
>>> On Thu, Jan 14, 2021 at 2:03 PM Ryan Taylor <ryta1203 at gmail.com> wrote:
>>>
>>>> Ok, thanks. I'll try to come up with a more generic test case.
>>>>
>>>> On Thu, Jan 14, 2021 at 2:00 PM Johannes Doerfert <
>>>> johannesdoerfert at gmail.com> wrote:
>>>>
>>>>> Nothing comes to mind given the information.
>>>>>
>>>>> ~ Johannes
>>>>>
>>>>>
>>>>> On 1/14/21 12:56 PM, Ryan Taylor wrote:
>>>>>> argmemonly, nounwind, writeonly
>>>>>>
>>>>>> -fno-strict-aliasing did not help.
>>>>>>
>>>>>> On Thu, Jan 14, 2021 at 1:29 PM Johannes Doerfert <
>>>>>> johannesdoerfert at gmail.com> wrote:
>>>>>>
>>>>>>> Can you share the attributes of the intrinsic declaration,
>>>>>>> assuming removing `!tbaa` didn't help.
>>>>>>>
>>>>>>>
>>>>>>> On 1/14/21 11:43 AM, Ryan Taylor wrote:
>>>>>>>> Yes, this is for downstream/out of tree target so I'm not sure how
>> you
>>>>>>>> could reproduce it either but I thought the IR might help a bit.
>>>>>>>>
>>>>>>>> My guess is it's not a bug in GVN as much as an issue with the
>>>>> intrinsic
>>>>>>>> properties, or lack thereof. I put this in the first post but the
>>>>> Alias
>>>>>>>> Sets show:
>>>>>>>>
>>>>>>>> AliasSet[0x75fbd0, 9] may alias, Mod/Ref Pointers: (i8* %1, 8),
>>>>> (i16*
>>>>>>>> %arrayidx, 2), (i16* %arrayidx1, 2), (i16* %arrayidx5, 2), (i16*
>> %19,
>>>>> 2),
>>>>>>>> (i16* %arrayidx6, 2), (i16* %arrayidx14, 2), (i16* %arrayidx19, 2)
>>>>>>>> 4 Unknown instructions: void <badref>, <4 x i16>* %6, <4 x
>> i16>*
>>>>>>> %22,
>>>>>>>> void <badref>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jan 14, 2021 at 12:37 PM Johannes Doerfert <
>>>>>>>> johannesdoerfert at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> There is still not enough information here.
>>>>>>>>>
>>>>>>>>> My first guess. The `!tbaa` annotation on the `XXX.intrinsic` and
>> the
>>>>>>>>> `load`
>>>>>>>>> basically encode there is no alias. Easy to verify, remove the ones
>>>>> on
>>>>>>> the
>>>>>>>>> intrinsic.
>>>>>>>>>
>>>>>>>>> ~ Johannes
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> P.S. If this was a bug in GVN, and I assume it is not, a reproducer
>>>>>>>>> would help
>>>>>>>>> a lot. So a small IR sample that shows the problem and
>> which
>>>>> we
>>>>>>>>> can run.
>>>>>>>>> This is a "redacted?" IR fragment in which I don't know
>> what
>>>>>>>>> transformation
>>>>>>>>> is problematic. I also can not run it through GVN, which
>>>>> makes it
>>>>>>>>> impossible
>>>>>>>>> to reproduce.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 1/14/21 11:27 AM, Ryan Taylor via llvm-dev wrote:
>>>>>>>>>> This is right before GVN:
>>>>>>>>>>
>>>>>>>>>> define i32 @foo(<4 x i16> %p, <4 x i16> %p1, i16* nocapture %res)
>>>>>>>>>> local_unnamed_addr #0 !dbg !6 {
>>>>>>>>>> entry:
>>>>>>>>>> %temp = alloca i64, align 8
>>>>>>>>>> %tmpcast = bitcast i64* %temp to [4 x i16]*
>>>>>>>>>> %0 = bitcast i64* %temp to i8*, !dbg !8
>>>>>>>>>> call void @llvm.lifetime.start.p0i8(i64 8, i8* nonnull %0)
>> #3,
>>>>>>> !dbg !8
>>>>>>>>>> store i64 0, i64* %temp, align 8, !dbg !9
>>>>>>>>>> %1 = bitcast i64* %temp to <4 x i16>*, !dbg !10
>>>>>>>>>> %2 = call <4 x i16>* @llvm.XXX.intrinsic(<4 x i16>* nonnull
>>>>> %1, <4
>>>>>>> x
>>>>>>>>> i16>
>>>>>>>>>> %p, i32 0), !dbg !11, !tbaa !12
>>>>>>>>>> %arrayidx = bitcast i64* %temp to i16*, !dbg !16
>>>>>>>>>> %3 = load i16, i16* %arrayidx, align 8, !dbg !16, !tbaa !17
>>>>>>>>>> br label %for.body, !dbg !19
>>>>>>>>>>
>>>>>>>>>> for.body: ; preds = %entry
>>>>>>>>>> %arrayidx1 = getelementptr inbounds [4 x i16], [4 x i16]*
>>>>> %tmpcast,
>>>>>>>>> i32
>>>>>>>>>> 0, i32 1, !dbg !20
>>>>>>>>>> %4 = load i16, i16* %arrayidx1, align 2, !dbg !20, !tbaa !17
>>>>>>>>>> %cmp3 = icmp sgt i16 %3, %4, !dbg !21
>>>>>>>>>> %spec.select = select i1 %cmp3, i16 %4, i16 %3, !dbg !22
>>>>>>>>>> %arrayidx1.1 = getelementptr inbounds [4 x i16], [4 x i16]*
>>>>>>> %tmpcast,
>>>>>>>>> i32
>>>>>>>>>> 0, i32 2, !dbg !20
>>>>>>>>>> %5 = load i16, i16* %arrayidx1.1, align 2, !dbg !20, !tbaa
>> !17
>>>>>>>>>> %cmp3.1 = icmp sgt i16 %spec.select, %5, !dbg !21
>>>>>>>>>> %spec.select.1 = select i1 %cmp3.1, i16 %5, i16
>> %spec.select,
>>>>> !dbg
>>>>>>> !22
>>>>>>>>>> %arrayidx1.2 = getelementptr inbounds [4 x i16], [4 x i16]*
>>>>>>> %tmpcast,
>>>>>>>>> i32
>>>>>>>>>> 0, i32 3, !dbg !20
>>>>>>>>>> %6 = load i16, i16* %arrayidx1.2, align 2, !dbg !20, !tbaa
>> !17
>>>>>>>>>> %cmp3.2 = icmp sgt i16 %spec.select.1, %6, !dbg !21
>>>>>>>>>> %spec.select.2 = select i1 %cmp3.2, i16 %6, i16
>> %spec.select.1,
>>>>>>> !dbg
>>>>>>>>> !22
>>>>>>>>>> store i16 %spec.select.2, i16* %res, align 2, !dbg !23,
>> !tbaa
>>>>> !17
>>>>>>>>>> %7 = tail call <4 x i16>* @llvm.XXX.intrinsic(<4 x i16>* %2,
>>>>> <4 x
>>>>>>> i16>
>>>>>>>>>> %p1, i32 0), !dbg !24, !tbaa !12
>>>>>>>>>> %8 = load i16, i16* %arrayidx, align 8, !dbg !25, !tbaa !17
>>>>>>>>>> br label %for.body12, !dbg !26
>>>>>>>>>>
>>>>>>>>>> for.body12: ; preds =
>>>>> %for.body
>>>>>>>>>> %arrayidx14 = getelementptr inbounds [4 x i16], [4 x i16]*
>>>>>>> %tmpcast,
>>>>>>>>> i32
>>>>>>>>>> 0, i32 1, !dbg !27
>>>>>>>>>> %9 = load i16, i16* %arrayidx14, align 2, !dbg !27, !tbaa
>> !17
>>>>>>>>>> %cmp16 = icmp sgt i16 %8, %9, !dbg !28
>>>>>>>>>> %spec.select39 = select i1 %cmp16, i16 %9, i16 %8, !dbg !29
>>>>>>>>>> %arrayidx14.1 = getelementptr inbounds [4 x i16], [4 x i16]*
>>>>>>> %tmpcast,
>>>>>>>>>> i32 0, i32 2, !dbg !27
>>>>>>>>>> %10 = load i16, i16* %arrayidx14.1, align 2, !dbg !27, !tbaa
>>>>> !17
>>>>>>>>>> %cmp16.1 = icmp sgt i16 %spec.select39, %10, !dbg !28
>>>>>>>>>> %spec.select39.1 = select i1 %cmp16.1, i16 %10, i16
>>>>> %spec.select39,
>>>>>>>>> !dbg
>>>>>>>>>> !29
>>>>>>>>>> %arrayidx14.2 = getelementptr inbounds [4 x i16], [4 x i16]*
>>>>>>> %tmpcast,
>>>>>>>>>> i32 0, i32 3, !dbg !27
>>>>>>>>>> %11 = load i16, i16* %arrayidx14.2, align 2, !dbg !27, !tbaa
>>>>> !17
>>>>>>>>>> %cmp16.2 = icmp sgt i16 %spec.select39.1, %11, !dbg !28
>>>>>>>>>> %spec.select39.2 = select i1 %cmp16.2, i16 %11, i16
>>>>>>> %spec.select39.1,
>>>>>>>>>> !dbg !29
>>>>>>>>>> %conv24 = sext i16 %spec.select39.2 to i32, !dbg !30
>>>>>>>>>> call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %0) #3,
>>>>> !dbg
>>>>>>> !31
>>>>>>>>>> ret i32 %conv24, !dbg !32
>>>>>>>>>>
>>>>>>>>>> On Thu, Jan 14, 2021 at 11:54 AM Roman Lebedev <
>>>>> lebedev.ri at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>> It would be good to have an actual IR reproducer here.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jan 14, 2021 at 7:51 PM Ryan Taylor via llvm-dev
>>>>>>>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>>>>>>>>> So given an intrinsic that has a pointer as in/out and
>>>>> IntrWriteMem
>>>>>>>>>>> property.
>>>>>>>>>>>> call intrinsic(address a, ....);
>>>>>>>>>>>> loop over address a
>>>>>>>>>>>> load from address a + offset
>>>>>>>>>>>> call intrinsic (address a, ...);
>>>>>>>>>>>> loop over address a
>>>>>>>>>>>> load from address a + offset
>>>>>>>>>>>>
>>>>>>>>>>>> GVN is removing the second loads, despite the second call
>>>>> overwriting
>>>>>>>>>>> the memory starting at address a. AA has the intrinsics marked as
>>>>>>>>> unknown
>>>>>>>>>>> instructions but has all of these as mayAlias in a set. I'm not
>>>>> seeing
>>>>>>>>> this
>>>>>>>>>>> issue with -fno-unroll-loops.
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>> _______________________________________________
>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
More information about the llvm-dev
mailing list