<div dir="ltr">Okay, so then it sounds like, for now, the right fix is to stop marking masked.gather and masked.scatter with intrarg* options.<div><br><div class="gmail_quote"><div dir="ltr">On Mon, Aug 29, 2016, 1:26 PM Philip Reames <<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
    <p>We might have specification bug here, but we appear to implement
      what we specified.  argmemonly is specified as only considering
      pointer typed arguments.  It's behavior on vector-of-pointers is
      unspecified, but would seem to fall into the same case as inttoptr
      of an integer argument (i.e. explicitly undefined).  We could
      consider changing that.<br>
    </p></div><div bgcolor="#FFFFFF" text="#000000">
    <p><br>
    </p>
    <br>
    <div>On 08/29/2016 11:59 AM, Daniel Berlin
      wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr">+ a few others.
        <div><br>
        </div>
        <div>After following this rabbit hole a bit, there are a lot of
          mutually recursive calls, etc, that may or may not do the
          right thing with vectors of pointers.</div>
        <div>I can fix *this* particular bug with the attached patch.</div>
        <div><br>
        </div>
        <div>However, it's mostly papering over stuff.  Nothing seems to
          know what to do with a memorylocation that is a vector of
          pointers. They all expect memorylocation to be a single
          pointer location.</div>
        <div><br>
          I would chalk it up to "luck" that this patch fixes the bug.</div>
        <div><br>
        </div>
        <div>It's pretty clear that MemoryLocation doesn't fit the needs
          of a lot of stuff anymore (we hacked AA nodes into it, and
          lots of stuff now tries to figure out the invariantess of the
          locations, blah blah blah), but it seems like a big job to
          figure out what to replace it with that will work for these
          cases.</div>
        <div><br>
        </div>
        <div>(I'm pretty positive if we just make it MemoryLocations,
          and have everything loop over the locations, the compiler will
          get a lot larger and a lot slower)</div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Mon, Aug 29, 2016 at 9:10 AM, Daniel
          Berlin <span dir="ltr"><<a href="mailto:dberlin@dberlin.org" target="_blank">dberlin@dberlin.org</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div dir="ltr">it would also, for that matter, say the same
              about an array of pointers.
              <div><br>
              </div>
              <div>It's not clear to me what will break if you change
                this to isPtrOrPtrVectorTy.</div>
              <div><br>
              </div>
              <div>In fact, i know it doesn't fix this bug.</div>
              <div>It's a pretty deep rabbit hole of things not quite
                prepared to understand vectors of pointers.</div>
              <div><br>
              </div>
              <div>(we prepare memorylocations of them, but memory
                locations expect to be one thing, not a group of things,
                etc).</div>
              <div><br>
              </div>
              <div><br>
              </div>
              <div><br>
              </div>
            </div>
            <div>
              <div>
                <div class="gmail_extra"><br>
                  <div class="gmail_quote">On Mon, Aug 29, 2016 at 8:58
                    AM, Daniel Berlin <span dir="ltr"><<a href="mailto:dberlin@dberlin.org" target="_blank">dberlin@dberlin.org</a>></span>
                    wrote:<br>
                    <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      <div dir="ltr">this is definitely a bug in AA.
                        <div><br>
                        </div>
                        <div>
                          <div>  225 <span style="white-space:pre-wrap"> </span>
                                 for (auto I = CS2.arg_begin(), E =
                            CS2.arg_end(); I != E; ++I) {</div>
                          <div>   226 <span style="white-space:pre-wrap">       </span>
                                   const Value *Arg = *I;</div>
                          <div>   227 <span style="white-space:pre-wrap">       </span>
                                   if
                            (!Arg->getType()->isPointerTy(<wbr>))</div>
                          <div>-> 228 <span style="white-space:pre-wrap">      </span>
                                     continue;</div>
                          <div>   229 <span style="white-space:pre-wrap">       </span>
                                   unsigned CS2ArgIdx =
                            std::distance(CS2.arg_begin(), I);</div>
                          <div>   230 <span style="white-space:pre-wrap">       </span>
                                   auto CS2ArgLoc =
                            MemoryLocation::<wbr>getForArgument(CS2,
                            CS2ArgIdx, TLI);</div>
                        </div>
                        <div><br>
                        </div>
                        <div> AliasAnalysis.cpp:228<br>
                        </div>
                        <div><br>
                        </div>
                        <div>It ignores every argument because they are
                          vectors of pointers, not pointers.</div>
                        <div><br>
                        </div>
                        <div><br>
                        </div>
                        <div>I'm surprised this has not broken anything
                          before. It will never say two intrinsics with
                          vectors of pointers mod/ref each other.</div>
                        <div><br>
                        </div>
                      </div>
                      <div>
                        <div>
                          <div class="gmail_extra"><br>
                            <div class="gmail_quote">On Mon, Aug 29,
                              2016 at 7:36 AM, Davide Italiano <span dir="ltr"><<a href="mailto:davide@freebsd.org" target="_blank">davide@freebsd.org</a>></span>
                              wrote:<br>
                              <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">+ Daniel
                                Berlin<br>
                                <div>
                                  <div><br>
                                    On Mon, Aug 29, 2016 at 3:42 PM,
                                    Chris Sakalis via llvm-dev<br>
                                    <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>
                                    wrote:<br>
                                    > Hello everyone,<br>
                                    ><br>
                                    > I think I have found an gvn /
                                    alias analysis related bug, but
                                    before opening<br>
                                    > an issue on the tracker I
                                    wanted to see if I am missing
                                    something. I have<br>
                                    > the following testcase:<br>
                                    ><br>
                                    >> define spir_kernel void
                                    @test(<2 x i32*> %in1, <2 x
                                    i32*> %in2, i32* %out)<br>
                                    >> {<br>
                                    >> entry:<br>
                                    >>   ; Just some temporary
                                    storage<br>
                                    >>   %tmp.0 = alloca i32<br>
                                    >>   %tmp.1 = alloca i32<br>
                                    >>   %tmp.i = insertelement
                                    <2 x i32*> undef, i32* %tmp.0,
                                    i32 0<br>
                                    >>   %tmp = insertelement
                                    <2 x i32*> %tmp.i, i32*
                                    %tmp.1, i32 1<br>
                                    >>   ; Read from in1 and in2<br>
                                    >>   %in1.v = call <2 x
                                    i32>
                                    @llvm.masked.gather.v2i32(<2 x
                                    i32*> %in1, i32<br>
                                    >> 1, <2 x i1> <i1
                                    true, i1 true>, <2 x i32>
                                    undef) #1<br>
                                    >>   %in2.v = call <2 x
                                    i32>
                                    @llvm.masked.gather.v2i32(<2 x
                                    i32*> %in2, i32<br>
                                    >> 1, <2 x i1> <i1
                                    true, i1 true>, <2 x i32>
                                    undef) #1<br>
                                    >>   ; Store in1 to the
                                    allocas<br>
                                    >>   call void
                                    @llvm.masked.scatter.v2i32(<2 x
                                    i32> %in1.v, <2 x i32*>
                                    %tmp,<br>
                                    >> i32 1, <2 x i1>
                                    <i1 true, i1 true>);<br>
                                    >>   ; Read in1 from the
                                    allocas<br>
                                    >>   %tmp.v.0 = call <2 x
                                    i32>
                                    @llvm.masked.gather.v2i32(<2 x
                                    i32*> %tmp, i32<br>
                                    >> 1, <2 x i1> <i1
                                    true, i1 true>, <2 x i32>
                                    undef) #1<br>
                                    >>   ; Store in2 to the
                                    allocas<br>
                                    >>   call void
                                    @llvm.masked.scatter.v2i32(<2 x
                                    i32> %in2.v, <2 x i32*>
                                    %tmp,<br>
                                    >> i32 1, <2 x i1>
                                    <i1 true, i1 true>);<br>
                                    >>   ; Read in2 from the
                                    allocas<br>
                                    >>   %tmp.v.1 = call <2 x
                                    i32>
                                    @llvm.masked.gather.v2i32(<2 x
                                    i32*> %tmp, i32<br>
                                    >> 1, <2 x i1> <i1
                                    true, i1 true>, <2 x i32>
                                    undef) #1<br>
                                    >>   ; Store in2 to out for
                                    good measure<br>
                                    >>   %tmp.v.1.0 =
                                    extractelement <2 x i32>
                                    %tmp.v.1, i32 0<br>
                                    >>   %tmp.v.1.1 =
                                    extractelement <2 x i32>
                                    %tmp.v.1, i32 1<br>
                                    >>   store i32 %tmp.v.1.0,
                                    i32* %out<br>
                                    >>   %out.1 = getelementptr
                                    i32, i32* %out, i32 1<br>
                                    >>   store i32 %tmp.v.1.1,
                                    i32* %out.1<br>
                                    >>   ret void<br>
                                    >> }<br>
                                    ><br>
                                    ><br>
                                    > It uses a masked scatter
                                    operation to store a value to the
                                    two allocas and<br>
                                    > then uses a masked gather
                                    operation to read that same value.
                                    This is done<br>
                                    > twice in a row, with two
                                    different values. If I run this code
                                    through the<br>
                                    > GVN pass, the second gather
                                    (%tmp.v.1) will be deemed to be the
                                    same as the<br>
                                    > first gather (%tmp.v.0) and it
                                    will be removed. After some
                                    debugging, I<br>
                                    > realized that this is happening
                                    because the Memory Dependence
                                    Analysis<br>
                                    > returns %tmp.v.0 as the Def
                                    dependency for %tmp.v.1, even though
                                    the scatter<br>
                                    > call in between changes the
                                    value stored at %tmp. This, in turn,
                                    is<br>
                                    > happening because the alias
                                    analysis is returning NoModRef for
                                    the %tmp.v.1<br>
                                    > and second scatter callsites.
                                    The resulting IR produces the wrong
                                    result:<br>
                                    ><br>
                                    >> define spir_kernel void
                                    @test(<2 x i32*> %in1, <2 x
                                    i32*> %in2, i32* %out)<br>
                                    >> {<br>
                                    >> entry:<br>
                                    >>   %tmp.0 = alloca i32<br>
                                    >>   %tmp.1 = alloca i32<br>
                                    >>   %tmp.i = insertelement
                                    <2 x i32*> undef, i32* %tmp.0,
                                    i32 0<br>
                                    >>   %tmp = insertelement
                                    <2 x i32*> %tmp.i, i32*
                                    %tmp.1, i32 1<br>
                                    >>   %in1.v = call <2 x
                                    i32>
                                    @llvm.masked.gather.v2i32(<2 x
                                    i32*> %in1, i32<br>
                                    >> 1, <2 x i1> <i1
                                    true, i1 true>, <2 x i32>
                                    undef) #1<br>
                                    >>   %in2.v = call <2 x
                                    i32>
                                    @llvm.masked.gather.v2i32(<2 x
                                    i32*> %in2, i32<br>
                                    >> 1, <2 x i1> <i1
                                    true, i1 true>, <2 x i32>
                                    undef) #1<br>
                                    >>   call void
                                    @llvm.masked.scatter.v2i32(<2 x
                                    i32> %in1.v, <2 x i32*>
                                    %tmp,<br>
                                    >> i32 1, <2 x i1>
                                    <i1 true, i1 true>)<br>
                                    >>   %tmp.v.0 = call <2 x
                                    i32>
                                    @llvm.masked.gather.v2i32(<2 x
                                    i32*> %tmp, i32<br>
                                    >> 1, <2 x i1> <i1
                                    true, i1 true>, <2 x i32>
                                    undef) #1<br>
                                    >>   call void
                                    @llvm.masked.scatter.v2i32(<2 x
                                    i32> %in2.v, <2 x i32*>
                                    %tmp,<br>
                                    >> i32 1, <2 x i1>
                                    <i1 true, i1 true>)<br>
                                    >>   ; The call to
                                    masked.gather is gone and now we are
                                    using the old value<br>
                                    >> (%tmp.v.0)<br>
                                    >>   %tmp.v.1.0 =
                                    extractelement <2 x i32>
                                    %tmp.v.0, i32 0<br>
                                    >>   %tmp.v.1.1 =
                                    extractelement <2 x i32>
                                    %tmp.v.0, i32 1<br>
                                    >>   store i32 %tmp.v.1.0,
                                    i32* %out<br>
                                    >>   %out.1 = getelementptr
                                    i32, i32* %out, i32 1<br>
                                    >>   store i32 %tmp.v.1.1,
                                    i32* %out.1<br>
                                    >>   ret void<br>
                                    >> }<br>
                                    ><br>
                                    ><br>
                                    > The old value read from %tmp is
                                    used, instead of the new one. I
                                    tested this<br>
                                    > code using `opt -gvn`, with
                                    LLVM 3.8.1. I also tried tip
                                    (84cb7f4) with the<br>
                                    > same result.<br>
                                    ><br>
                                    > I should mention that if I
                                    replace the second scatter with
                                    stores, the issue<br>
                                    > persists. The only way to avoid
                                    it is to replace all scatter/gather<br>
                                    > intrinsics with equivalent sets
                                    of store/load, in which case the
                                    MemDep<br>
                                    > returns the correct
                                    dependencies and the GVN pass will
                                    not remove the second<br>
                                    > set of loads.<br>
                                    ><br>
                                    > So, my question is, is this a
                                    bug or am I doing something that I
                                    shouldn't<br>
                                    > be in the IR? And if it is a
                                    bug, is it the AA analyses that
                                    return the<br>
                                    > wrong result (I presume so) or
                                    should GVN handle such cases
                                    differently?<br>
                                    ><br>
                                    > Best regards,<br>
                                    ><br>
                                    > Chris<br>
                                    ><br>
                                  </div>
                                </div>
                                > ______________________________<wbr>_________________<br>
                                > LLVM Developers mailing list<br>
                                > <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
                                > <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
                                ><br>
                                <span><font color="#888888"><br>
                                    <br>
                                    <br>
                                    --<br>
                                    Davide<br>
                                    <br>
                                    "There are no solved problems; there
                                    are only problems that are more<br>
                                    or less solved" -- Henri Poincare<br>
                                  </font></span></blockquote>
                            </div>
                            <br>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                  </div>
                  <br>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </div></blockquote></div></div></div>