[llvm-dev] GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics
Chris Sakalis via llvm-dev
llvm-dev at lists.llvm.org
Wed Aug 31 09:00:13 PDT 2016
Great, thank you!
On Wed, Aug 31, 2016 at 2:07 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
> ------------------------------
>
> *From: *"Chris Sakalis" <chrissakalis at gmail.com>
> *To: *"Daniel Berlin" <dberlin at dberlin.org>
> *Cc: *"Hal Finkel" <hfinkel at anl.gov>, "David Majnemer" <
> david.majnemer at gmail.com>, "llvm-dev" <llvm-dev at lists.llvm.org>, "Philip
> Reames" <listmail at philipreames.com>, "Davide Italiano" <davide at freebsd.org>,
> "Chandler Carruth" <chandlerc at gmail.com>
> *Sent: *Wednesday, August 31, 2016 3:58:56 AM
> *Subject: *Re: [llvm-dev] GVN / Alias Analysis issue with
> llvm.masked.scatter/gather intrinsics
>
> Thank you for the quick fix, I can no longer reproduce the issue. As far a
> releases go, I am guessing that this is going to be in 4.0?
>
> Yes, and we can consider it for 3.9.1 as well.
>
> -Hal
>
>
> Best,
>
> Chris
>
> On Tue, Aug 30, 2016 at 9:26 PM, Daniel Berlin <dberlin at dberlin.org>
> wrote:
>
>> Yeah, i just hope it doesn't regress scatter/gather vector code badly.
>> But at least it's correct now?
>>
>>
>> On Tue, Aug 30, 2016 at 1:11 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>
>>>
>>> ------------------------------
>>>
>>> *From: *"Daniel Berlin" <dberlin at dberlin.org>
>>> *To: *"Philip Reames" <listmail at philipreames.com>, "Davide Italiano" <
>>> davide at freebsd.org>, "Chandler Carruth" <chandlerc at gmail.com>
>>> *Cc: *"Chris Sakalis" <chrissakalis at gmail.com>, "David Majnemer" <
>>> david.majnemer at gmail.com>, "Hal Finkel" <hfinkel at anl.gov>, "llvm-dev" <
>>> llvm-dev at lists.llvm.org>
>>> *Sent: *Tuesday, August 30, 2016 3:07:01 PM
>>> *Subject: *Re: [llvm-dev] GVN / Alias Analysis issue with
>>> llvm.masked.scatter/gather intrinsics
>>>
>>> This is now committed and a test added to GVN.
>>>
>>> Thanks!
>>>
>>> I suspect that, in practice, we'll get little benefit from handling this
>>> until our AA passes learn how to deal with (i.e. look back through) pointer
>>> vectors.
>>>
>>> -Hal
>>>
>>>
>>>
>>> On Mon, Aug 29, 2016 at 1:59 PM, Daniel Berlin <dberlin at dberlin.org>
>>> wrote:
>>>
>>>> Okay, so then it sounds like, for now, the right fix is to stop marking
>>>> masked.gather and masked.scatter with intrarg* options.
>>>>
>>>> On Mon, Aug 29, 2016, 1:26 PM Philip Reames <listmail at philipreames.com>
>>>> wrote:
>>>>
>>>>> We might have specification bug here, but we appear to implement what
>>>>> we specified. argmemonly is specified as only considering pointer typed
>>>>> arguments. It's behavior on vector-of-pointers is unspecified, but would
>>>>> seem to fall into the same case as inttoptr of an integer argument (i.e.
>>>>> explicitly undefined). We could consider changing that.
>>>>>
>>>>>
>>>>>
>>>>> On 08/29/2016 11:59 AM, Daniel Berlin wrote:
>>>>>
>>>>> + a few others.
>>>>>
>>>>> After following this rabbit hole a bit, there are a lot of mutually
>>>>> recursive calls, etc, that may or may not do the right thing with vectors
>>>>> of pointers.
>>>>> I can fix *this* particular bug with the attached patch.
>>>>>
>>>>> However, it's mostly papering over stuff. Nothing seems to know what
>>>>> to do with a memorylocation that is a vector of pointers. They all expect
>>>>> memorylocation to be a single pointer location.
>>>>>
>>>>> I would chalk it up to "luck" that this patch fixes the bug.
>>>>>
>>>>> It's pretty clear that MemoryLocation doesn't fit the needs of a lot
>>>>> of stuff anymore (we hacked AA nodes into it, and lots of stuff now tries
>>>>> to figure out the invariantess of the locations, blah blah blah), but it
>>>>> seems like a big job to figure out what to replace it with that will work
>>>>> for these cases.
>>>>>
>>>>> (I'm pretty positive if we just make it MemoryLocations, and have
>>>>> everything loop over the locations, the compiler will get a lot larger and
>>>>> a lot slower)
>>>>>
>>>>> On Mon, Aug 29, 2016 at 9:10 AM, Daniel Berlin <dberlin at dberlin.org>
>>>>> wrote:
>>>>>
>>>>>> it would also, for that matter, say the same about an array of
>>>>>> pointers.
>>>>>>
>>>>>> It's not clear to me what will break if you change this to
>>>>>> isPtrOrPtrVectorTy.
>>>>>>
>>>>>> In fact, i know it doesn't fix this bug.
>>>>>> It's a pretty deep rabbit hole of things not quite prepared to
>>>>>> understand vectors of pointers.
>>>>>>
>>>>>> (we prepare memorylocations of them, but memory locations expect to
>>>>>> be one thing, not a group of things, etc).
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 29, 2016 at 8:58 AM, Daniel Berlin <dberlin at dberlin.org>
>>>>>> wrote:
>>>>>>
>>>>>>> this is definitely a bug in AA.
>>>>>>>
>>>>>>> 225 for (auto I = CS2.arg_begin(), E = CS2.arg_end(); I !=
>>>>>>> E; ++I) {
>>>>>>> 226 const Value *Arg = *I;
>>>>>>> 227 if (!Arg->getType()->isPointerTy())
>>>>>>> -> 228 continue;
>>>>>>> 229 unsigned CS2ArgIdx = std::distance(CS2.arg_begin(),
>>>>>>> I);
>>>>>>> 230 auto CS2ArgLoc = MemoryLocation::getForArgument(CS2,
>>>>>>> CS2ArgIdx, TLI);
>>>>>>>
>>>>>>> AliasAnalysis.cpp:228
>>>>>>>
>>>>>>> It ignores every argument because they are vectors of pointers, not
>>>>>>> pointers.
>>>>>>>
>>>>>>>
>>>>>>> I'm surprised this has not broken anything before. It will never say
>>>>>>> two intrinsics with vectors of pointers mod/ref each other.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 29, 2016 at 7:36 AM, Davide Italiano <davide at freebsd.org
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> + Daniel Berlin
>>>>>>>>
>>>>>>>> On Mon, Aug 29, 2016 at 3:42 PM, Chris Sakalis via llvm-dev
>>>>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>>>>> > Hello everyone,
>>>>>>>> >
>>>>>>>> > I think I have found an gvn / alias analysis related bug, but
>>>>>>>> before opening
>>>>>>>> > an issue on the tracker I wanted to see if I am missing
>>>>>>>> something. I have
>>>>>>>> > the following testcase:
>>>>>>>> >
>>>>>>>> >> define spir_kernel void @test(<2 x i32*> %in1, <2 x i32*> %in2,
>>>>>>>> i32* %out)
>>>>>>>> >> {
>>>>>>>> >> entry:
>>>>>>>> >> ; Just some temporary storage
>>>>>>>> >> %tmp.0 = alloca i32
>>>>>>>> >> %tmp.1 = alloca i32
>>>>>>>> >> %tmp.i = insertelement <2 x i32*> undef, i32* %tmp.0, i32 0
>>>>>>>> >> %tmp = insertelement <2 x i32*> %tmp.i, i32* %tmp.1, i32 1
>>>>>>>> >> ; Read from in1 and in2
>>>>>>>> >> %in1.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*>
>>>>>>>> %in1, i32
>>>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1
>>>>>>>> >> %in2.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*>
>>>>>>>> %in2, i32
>>>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1
>>>>>>>> >> ; Store in1 to the allocas
>>>>>>>> >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in1.v, <2 x
>>>>>>>> i32*> %tmp,
>>>>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>);
>>>>>>>> >> ; Read in1 from the allocas
>>>>>>>> >> %tmp.v.0 = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*>
>>>>>>>> %tmp, i32
>>>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1
>>>>>>>> >> ; Store in2 to the allocas
>>>>>>>> >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in2.v, <2 x
>>>>>>>> i32*> %tmp,
>>>>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>);
>>>>>>>> >> ; Read in2 from the allocas
>>>>>>>> >> %tmp.v.1 = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*>
>>>>>>>> %tmp, i32
>>>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1
>>>>>>>> >> ; Store in2 to out for good measure
>>>>>>>> >> %tmp.v.1.0 = extractelement <2 x i32> %tmp.v.1, i32 0
>>>>>>>> >> %tmp.v.1.1 = extractelement <2 x i32> %tmp.v.1, i32 1
>>>>>>>> >> store i32 %tmp.v.1.0, i32* %out
>>>>>>>> >> %out.1 = getelementptr i32, i32* %out, i32 1
>>>>>>>> >> store i32 %tmp.v.1.1, i32* %out.1
>>>>>>>> >> ret void
>>>>>>>> >> }
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > It uses a masked scatter operation to store a value to the two
>>>>>>>> allocas and
>>>>>>>> > then uses a masked gather operation to read that same value. This
>>>>>>>> is done
>>>>>>>> > twice in a row, with two different values. If I run this code
>>>>>>>> through the
>>>>>>>> > GVN pass, the second gather (%tmp.v.1) will be deemed to be the
>>>>>>>> same as the
>>>>>>>> > first gather (%tmp.v.0) and it will be removed. After some
>>>>>>>> debugging, I
>>>>>>>> > realized that this is happening because the Memory Dependence
>>>>>>>> Analysis
>>>>>>>> > returns %tmp.v.0 as the Def dependency for %tmp.v.1, even though
>>>>>>>> the scatter
>>>>>>>> > call in between changes the value stored at %tmp. This, in turn,
>>>>>>>> is
>>>>>>>> > happening because the alias analysis is returning NoModRef for
>>>>>>>> the %tmp.v.1
>>>>>>>> > and second scatter callsites. The resulting IR produces the wrong
>>>>>>>> result:
>>>>>>>> >
>>>>>>>> >> define spir_kernel void @test(<2 x i32*> %in1, <2 x i32*> %in2,
>>>>>>>> i32* %out)
>>>>>>>> >> {
>>>>>>>> >> entry:
>>>>>>>> >> %tmp.0 = alloca i32
>>>>>>>> >> %tmp.1 = alloca i32
>>>>>>>> >> %tmp.i = insertelement <2 x i32*> undef, i32* %tmp.0, i32 0
>>>>>>>> >> %tmp = insertelement <2 x i32*> %tmp.i, i32* %tmp.1, i32 1
>>>>>>>> >> %in1.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*>
>>>>>>>> %in1, i32
>>>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1
>>>>>>>> >> %in2.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*>
>>>>>>>> %in2, i32
>>>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1
>>>>>>>> >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in1.v, <2 x
>>>>>>>> i32*> %tmp,
>>>>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>)
>>>>>>>> >> %tmp.v.0 = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*>
>>>>>>>> %tmp, i32
>>>>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1
>>>>>>>> >> call void @llvm.masked.scatter.v2i32(<2 x i32> %in2.v, <2 x
>>>>>>>> i32*> %tmp,
>>>>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>)
>>>>>>>> >> ; The call to masked.gather is gone and now we are using the
>>>>>>>> old value
>>>>>>>> >> (%tmp.v.0)
>>>>>>>> >> %tmp.v.1.0 = extractelement <2 x i32> %tmp.v.0, i32 0
>>>>>>>> >> %tmp.v.1.1 = extractelement <2 x i32> %tmp.v.0, i32 1
>>>>>>>> >> store i32 %tmp.v.1.0, i32* %out
>>>>>>>> >> %out.1 = getelementptr i32, i32* %out, i32 1
>>>>>>>> >> store i32 %tmp.v.1.1, i32* %out.1
>>>>>>>> >> ret void
>>>>>>>> >> }
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > The old value read from %tmp is used, instead of the new one. I
>>>>>>>> tested this
>>>>>>>> > code using `opt -gvn`, with LLVM 3.8.1. I also tried tip
>>>>>>>> (84cb7f4) with the
>>>>>>>> > same result.
>>>>>>>> >
>>>>>>>> > I should mention that if I replace the second scatter with
>>>>>>>> stores, the issue
>>>>>>>> > persists. The only way to avoid it is to replace all
>>>>>>>> scatter/gather
>>>>>>>> > intrinsics with equivalent sets of store/load, in which case the
>>>>>>>> MemDep
>>>>>>>> > returns the correct dependencies and the GVN pass will not remove
>>>>>>>> the second
>>>>>>>> > set of loads.
>>>>>>>> >
>>>>>>>> > So, my question is, is this a bug or am I doing something that I
>>>>>>>> shouldn't
>>>>>>>> > be in the IR? And if it is a bug, is it the AA analyses that
>>>>>>>> return the
>>>>>>>> > wrong result (I presume so) or should GVN handle such cases
>>>>>>>> differently?
>>>>>>>> >
>>>>>>>> > Best regards,
>>>>>>>> >
>>>>>>>> > Chris
>>>>>>>> >
>>>>>>>> > _______________________________________________
>>>>>>>> > LLVM Developers mailing list
>>>>>>>> > llvm-dev at lists.llvm.org
>>>>>>>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Davide
>>>>>>>>
>>>>>>>> "There are no solved problems; there are only problems that are more
>>>>>>>> or less solved" -- Henri Poincare
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>>> --
>>> Hal Finkel
>>> Assistant Computational Scientist
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>>>
>>
>>
>
>
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160831/aff3aa13/attachment.html>
More information about the llvm-dev
mailing list