[llvm-dev] GVN / Alias Analysis issue with llvm.masked.scatter/gather intrinsics

Daniel Berlin via llvm-dev llvm-dev at lists.llvm.org
Tue Aug 30 13:07:01 PDT 2016


This is now committed and a test added to GVN.



On Mon, Aug 29, 2016 at 1:59 PM, Daniel Berlin <dberlin at dberlin.org> wrote:

> Okay, so then it sounds like, for now, the right fix is to stop marking
> masked.gather and masked.scatter with intrarg* options.
>
> On Mon, Aug 29, 2016, 1:26 PM Philip Reames <listmail at philipreames.com>
> wrote:
>
>> We might have specification bug here, but we appear to implement what we
>> specified.  argmemonly is specified as only considering pointer typed
>> arguments.  It's behavior on vector-of-pointers is unspecified, but would
>> seem to fall into the same case as inttoptr of an integer argument (i.e.
>> explicitly undefined).  We could consider changing that.
>>
>>
>>
>> On 08/29/2016 11:59 AM, Daniel Berlin wrote:
>>
>> + a few others.
>>
>> After following this rabbit hole a bit, there are a lot of mutually
>> recursive calls, etc, that may or may not do the right thing with vectors
>> of pointers.
>> I can fix *this* particular bug with the attached patch.
>>
>> However, it's mostly papering over stuff.  Nothing seems to know what to
>> do with a memorylocation that is a vector of pointers. They all expect
>> memorylocation to be a single pointer location.
>>
>> I would chalk it up to "luck" that this patch fixes the bug.
>>
>> It's pretty clear that MemoryLocation doesn't fit the needs of a lot of
>> stuff anymore (we hacked AA nodes into it, and lots of stuff now tries to
>> figure out the invariantess of the locations, blah blah blah), but it seems
>> like a big job to figure out what to replace it with that will work for
>> these cases.
>>
>> (I'm pretty positive if we just make it MemoryLocations, and have
>> everything loop over the locations, the compiler will get a lot larger and
>> a lot slower)
>>
>> On Mon, Aug 29, 2016 at 9:10 AM, Daniel Berlin <dberlin at dberlin.org>
>> wrote:
>>
>>> it would also, for that matter, say the same about an array of pointers.
>>>
>>> It's not clear to me what will break if you change this to
>>> isPtrOrPtrVectorTy.
>>>
>>> In fact, i know it doesn't fix this bug.
>>> It's a pretty deep rabbit hole of things not quite prepared to
>>> understand vectors of pointers.
>>>
>>> (we prepare memorylocations of them, but memory locations expect to be
>>> one thing, not a group of things, etc).
>>>
>>>
>>>
>>>
>>> On Mon, Aug 29, 2016 at 8:58 AM, Daniel Berlin <dberlin at dberlin.org>
>>> wrote:
>>>
>>>> this is definitely a bug in AA.
>>>>
>>>>   225      for (auto I = CS2.arg_begin(), E = CS2.arg_end(); I != E;
>>>> ++I) {
>>>>    226        const Value *Arg = *I;
>>>>    227        if (!Arg->getType()->isPointerTy())
>>>> -> 228          continue;
>>>>    229        unsigned CS2ArgIdx = std::distance(CS2.arg_begin(), I);
>>>>    230        auto CS2ArgLoc = MemoryLocation::getForArgument(CS2,
>>>> CS2ArgIdx, TLI);
>>>>
>>>>  AliasAnalysis.cpp:228
>>>>
>>>> It ignores every argument because they are vectors of pointers, not
>>>> pointers.
>>>>
>>>>
>>>> I'm surprised this has not broken anything before. It will never say
>>>> two intrinsics with vectors of pointers mod/ref each other.
>>>>
>>>>
>>>> On Mon, Aug 29, 2016 at 7:36 AM, Davide Italiano <davide at freebsd.org>
>>>> wrote:
>>>>
>>>>> + Daniel Berlin
>>>>>
>>>>> On Mon, Aug 29, 2016 at 3:42 PM, Chris Sakalis via llvm-dev
>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>> > Hello everyone,
>>>>> >
>>>>> > I think I have found an gvn / alias analysis related bug, but before
>>>>> opening
>>>>> > an issue on the tracker I wanted to see if I am missing something. I
>>>>> have
>>>>> > the following testcase:
>>>>> >
>>>>> >> define spir_kernel void @test(<2 x i32*> %in1, <2 x i32*> %in2,
>>>>> i32* %out)
>>>>> >> {
>>>>> >> entry:
>>>>> >>   ; Just some temporary storage
>>>>> >>   %tmp.0 = alloca i32
>>>>> >>   %tmp.1 = alloca i32
>>>>> >>   %tmp.i = insertelement <2 x i32*> undef, i32* %tmp.0, i32 0
>>>>> >>   %tmp = insertelement <2 x i32*> %tmp.i, i32* %tmp.1, i32 1
>>>>> >>   ; Read from in1 and in2
>>>>> >>   %in1.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*>
>>>>> %in1, i32
>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1
>>>>> >>   %in2.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*>
>>>>> %in2, i32
>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1
>>>>> >>   ; Store in1 to the allocas
>>>>> >>   call void @llvm.masked.scatter.v2i32(<2 x i32> %in1.v, <2 x i32*>
>>>>> %tmp,
>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>);
>>>>> >>   ; Read in1 from the allocas
>>>>> >>   %tmp.v.0 = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*>
>>>>> %tmp, i32
>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1
>>>>> >>   ; Store in2 to the allocas
>>>>> >>   call void @llvm.masked.scatter.v2i32(<2 x i32> %in2.v, <2 x i32*>
>>>>> %tmp,
>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>);
>>>>> >>   ; Read in2 from the allocas
>>>>> >>   %tmp.v.1 = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*>
>>>>> %tmp, i32
>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1
>>>>> >>   ; Store in2 to out for good measure
>>>>> >>   %tmp.v.1.0 = extractelement <2 x i32> %tmp.v.1, i32 0
>>>>> >>   %tmp.v.1.1 = extractelement <2 x i32> %tmp.v.1, i32 1
>>>>> >>   store i32 %tmp.v.1.0, i32* %out
>>>>> >>   %out.1 = getelementptr i32, i32* %out, i32 1
>>>>> >>   store i32 %tmp.v.1.1, i32* %out.1
>>>>> >>   ret void
>>>>> >> }
>>>>> >
>>>>> >
>>>>> > It uses a masked scatter operation to store a value to the two
>>>>> allocas and
>>>>> > then uses a masked gather operation to read that same value. This is
>>>>> done
>>>>> > twice in a row, with two different values. If I run this code
>>>>> through the
>>>>> > GVN pass, the second gather (%tmp.v.1) will be deemed to be the same
>>>>> as the
>>>>> > first gather (%tmp.v.0) and it will be removed. After some
>>>>> debugging, I
>>>>> > realized that this is happening because the Memory Dependence
>>>>> Analysis
>>>>> > returns %tmp.v.0 as the Def dependency for %tmp.v.1, even though the
>>>>> scatter
>>>>> > call in between changes the value stored at %tmp. This, in turn, is
>>>>> > happening because the alias analysis is returning NoModRef for the
>>>>> %tmp.v.1
>>>>> > and second scatter callsites. The resulting IR produces the wrong
>>>>> result:
>>>>> >
>>>>> >> define spir_kernel void @test(<2 x i32*> %in1, <2 x i32*> %in2,
>>>>> i32* %out)
>>>>> >> {
>>>>> >> entry:
>>>>> >>   %tmp.0 = alloca i32
>>>>> >>   %tmp.1 = alloca i32
>>>>> >>   %tmp.i = insertelement <2 x i32*> undef, i32* %tmp.0, i32 0
>>>>> >>   %tmp = insertelement <2 x i32*> %tmp.i, i32* %tmp.1, i32 1
>>>>> >>   %in1.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*>
>>>>> %in1, i32
>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1
>>>>> >>   %in2.v = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*>
>>>>> %in2, i32
>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1
>>>>> >>   call void @llvm.masked.scatter.v2i32(<2 x i32> %in1.v, <2 x i32*>
>>>>> %tmp,
>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>)
>>>>> >>   %tmp.v.0 = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*>
>>>>> %tmp, i32
>>>>> >> 1, <2 x i1> <i1 true, i1 true>, <2 x i32> undef) #1
>>>>> >>   call void @llvm.masked.scatter.v2i32(<2 x i32> %in2.v, <2 x i32*>
>>>>> %tmp,
>>>>> >> i32 1, <2 x i1> <i1 true, i1 true>)
>>>>> >>   ; The call to masked.gather is gone and now we are using the old
>>>>> value
>>>>> >> (%tmp.v.0)
>>>>> >>   %tmp.v.1.0 = extractelement <2 x i32> %tmp.v.0, i32 0
>>>>> >>   %tmp.v.1.1 = extractelement <2 x i32> %tmp.v.0, i32 1
>>>>> >>   store i32 %tmp.v.1.0, i32* %out
>>>>> >>   %out.1 = getelementptr i32, i32* %out, i32 1
>>>>> >>   store i32 %tmp.v.1.1, i32* %out.1
>>>>> >>   ret void
>>>>> >> }
>>>>> >
>>>>> >
>>>>> > The old value read from %tmp is used, instead of the new one. I
>>>>> tested this
>>>>> > code using `opt -gvn`, with LLVM 3.8.1. I also tried tip (84cb7f4)
>>>>> with the
>>>>> > same result.
>>>>> >
>>>>> > I should mention that if I replace the second scatter with stores,
>>>>> the issue
>>>>> > persists. The only way to avoid it is to replace all scatter/gather
>>>>> > intrinsics with equivalent sets of store/load, in which case the
>>>>> MemDep
>>>>> > returns the correct dependencies and the GVN pass will not remove
>>>>> the second
>>>>> > set of loads.
>>>>> >
>>>>> > So, my question is, is this a bug or am I doing something that I
>>>>> shouldn't
>>>>> > be in the IR? And if it is a bug, is it the AA analyses that return
>>>>> the
>>>>> > wrong result (I presume so) or should GVN handle such cases
>>>>> differently?
>>>>> >
>>>>> > Best regards,
>>>>> >
>>>>> > Chris
>>>>> >
>>>>> > _______________________________________________
>>>>> > LLVM Developers mailing list
>>>>> > llvm-dev at lists.llvm.org
>>>>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Davide
>>>>>
>>>>> "There are no solved problems; there are only problems that are more
>>>>> or less solved" -- Henri Poincare
>>>>>
>>>>
>>>>
>>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160830/d5557e8b/attachment.html>


More information about the llvm-dev mailing list