<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p>Sounds good.  I'll go ahead and check in what I have now (since
      it is NFC) and we can proceed from there.</p>
    <p>I'm currently trying out an approach where upon removing a store
      I eagerly re-optimize phis, but just mark uses as dirty, then only
      re-compute the clobber for dirty uses if they come up in a
      isSameMemGeneration query or at the end of the pass, which is sort
      of a hybrid of the two approaches you described.<br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 8/24/2016 7:34 PM, Daniel Berlin
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAF4BwTXACdw-Lnhn6hSab9U5LXeNd1CTgMAtWLci3j4nX7JKkg@mail.gmail.com"
      type="cite">
      <div dir="ltr">(modulo this, btw, the code LGTM.I definitely don't
        think we should wait to resolve this before you commit it).
        <div><br>
        </div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Tue, Aug 23, 2016 at 1:32 PM, Daniel
          Berlin <span dir="ltr"><<a moz-do-not-send="true"
              href="mailto:dberlin@dberlin.org" target="_blank">dberlin@dberlin.org</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div dir="ltr">Ugh.
              <div>Okay, so.</div>
              <div>A couple thoughts:<br>
                <br>
              </div>
              <div>1. We could maintain it by doing batch use opt at the
                end of each pass or something, and then using the faster
                use optimizer to reoptimize affected loads (it's easy to
                tell the set of possibly affected loads).</div>
              <div><br>
              </div>
              <div>2. We could drop it.  We would still build an
                initially optimized form, but whatever happens past
                that, happens.  We would make getClobberingMemoryAccess
                in the default walker modify loads to store the
                optimized result on them, and add a bool to MemoryUse to
                track which loads may need to be reoptimized (that gets
                reset in setDefiningAccess unless you tell it not to).</div>
              <div><br>
                This will let you call getClobberingMemoryAccess on
                everything without it being really expensive for loads
                (right now it does a bunch of search/etc setup, and we
                need a way to avoid this)<br>
              </div>
              <div><br>
              </div>
              <div>The downside to all of this is<br>
                <br>
              </div>
              <div>1. The walker now modifies loads.  Even if you are
                storing the defining accesses somewhere, and we change
                it (not sure why this would happen, but ...), it should
                be okay, it would just mean you have a valid but
                conservative answer.</div>
              <div><br>
              </div>
              <div>2.  It means a stores uses will no longer include all
                possible loads that die there. It will only include
                some, unless you run around and call
                getClobberingMemoryAccess on every load.<br>
              </div>
              <div><br>
              </div>
              <div>I will have to do that in NewGVN to make load/store
                forwarding work sanely, but i'll deal with that.</div>
              <div><br>
              </div>
              <div><br>
              </div>
              <div>+ a few others for thoughts.</div>
              <div><br>
              </div>
            </div>
            <div class="HOEnZb">
              <div class="h5">
                <div class="gmail_extra"><br>
                  <div class="gmail_quote">On Tue, Aug 23, 2016 at 1:04
                    PM, Geoff Berry <span dir="ltr"><<a
                        moz-do-not-send="true"
                        href="mailto:gberry@codeaurora.org"
                        target="_blank">gberry@codeaurora.org</a>></span>
                    wrote:<br>
                    <blockquote class="gmail_quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      <div bgcolor="#FFFFFF" text="#000000">
                        <div>
                          <div>
                            <p><br>
                            </p>
                            <br>
                            <div>On 8/23/2016 2:17 PM, Geoff Berry via
                              llvm-commits wrote:<br>
                            </div>
                            <blockquote type="cite">
                              <p><br>
                              </p>
                              <br>
                              <div>On 8/23/2016 1:47 PM, Daniel Berlin
                                wrote:<br>
                              </div>
                              <blockquote type="cite">
                                <div dir="ltr"><br>
                                  <div class="gmail_extra"><br>
                                    <div class="gmail_quote">On Tue, Aug
                                      23, 2016 at 10:17 AM, Geoff Berry
                                      <span dir="ltr"><<a
                                          moz-do-not-send="true"
                                          href="mailto:gberry@codeaurora.org"
                                          target="_blank">gberry@codeaurora.org</a>></span>
                                      wrote:<br>
                                      <blockquote class="gmail_quote"
                                        style="margin:0 0 0
                                        .8ex;border-left:1px #ccc
                                        solid;padding-left:1ex">gberry
                                        added inline comments.<br>
                                        <span><br>
                                          ================<br>
                                          Comment at:
                                          lib/Transforms/Scalar/EarlyCSE<wbr>.cpp:537<br>
                                          @@ +536,3 @@<br>
                                          +  MemoryAccess *LaterDef =<br>
                                          +     
                                          MSSA->getWalker()->getClobberi<wbr>ngMemoryAccess(LaterInst);<br>
                                          +  return
                                          MSSA->dominates(LaterDef,
                                          MSSA->getMemoryAccess(EarlierI<wbr>nst));<br>
                                          ----------------<br>
                                        </span><span>dberlin wrote:<br>
                                          > gberry wrote:<br>
                                          > > dberlin wrote:<br>
                                          > > > 1. For loads,
                                          you don't have to ask for the
                                          clobbering access. It's
                                          already optimized such that
                                          getDefiningAccess == the
                                          clobbering access<br>
                                          > > ><br>
                                          > > > 2. For stores,
                                          not sure if you realize this,
                                          but<br>
                                          > > ><br>
                                          > > > given<br>
                                          > > > store q 
                                          (lets's call this a)<br>
                                          > > > x = load p<br>
                                          > > > store q (let's
                                          call this b)<br>
                                          > > ><br>
                                          > > ><br>
                                          > > > if you call
                                          getClobberingMemoryAccess on
                                          b, it will return a.<br>
                                          > > ><br>
                                          > > ><br>
                                          > > For 1., I was not
                                          clear on whether this holds
                                          true after store removal.<br>
                                          > ><br>
                                          > > For 2., yeah I get
                                          this, I'm not sure what you're
                                          getting at though.  The
                                          removal of this second store
                                          by EarlyCSE doesn't use
                                          MemorySSA to check for
                                          intervening loads in this
                                          change.  It uses the
                                          'LastStore' tracking to know
                                          when a store made redundant by
                                          a second store can be removed.<br>
                                          > 1. Updates have to make
                                          it hold after store removal :)<br>
                                          ><br>
                                          > The problem is that if we
                                          don't keep this invariant up
                                          to date, it means everyone
                                          uses getClobberingAccess,
                                          which does a bunch of work to
                                          discover the load already
                                          points to the same thing.<br>
                                          ><br>
                                          > Everyone doing that is
                                          much higher than the cost of
                                          one person updating the
                                          dominating def.<br>
                                          ><br>
                                          > (there is one case were
                                          getClobberingAccess will give
                                          you a better answer, and that
                                          is on cases where we gave up
                                          during use optimization. I
                                          only have one testcase this
                                          occurs on.  We only give up on
                                          optimizing a load if it's
                                          going to be super expensive,
                                          and you probably do *not* want
                                          to try to get better answers
                                          in that case).<br>
                                          ><br>
                                          > As for updating when you
                                          remove stores, you should
                                          simply be able to replace any
                                          loads the store uses with
                                          getClobberingAccess(store)
                                          using RAUW.<br>
                                          ><br>
                                          > Under the covers,
                                          removeMemoryAccess calls RAUW
                                          with the DefiningAccess.<br>
                                          > We could change it to use
                                          getClobberingMemoryAccess for
                                          loads, and DefiningAccess for
                                          stores.<br>
                                          ><br>
                                          > 2. ah, okay.<br>
                                          ><br>
                                          ><br>
                                        </span>Okay, I get why just
                                        checking the defining access for
                                        loads is better (we get to skip
                                        the AA check).<br>
                                      </blockquote>
                                      <div> </div>
                                      <blockquote class="gmail_quote"
                                        style="margin:0 0 0
                                        .8ex;border-left:1px #ccc
                                        solid;padding-left:1ex"> For
                                        stores, we may be able to do
                                        something faster than calling
                                        getClobberingAccess(store).  We
                                        could instead walk up the store
                                        defining access chain and stop
                                        if we get to a point that
                                        dominates the earlier load or a
                                        clobbering write.  I'll have to
                                        time this to see if it makes a
                                        difference.  I guess it will
                                        depend on what percentage of the
                                        time the clobber cache has been
                                        thrown away.<br>
                                      </blockquote>
                                      <div><br>
                                      </div>
                                      <div>Yes.  You are basically
                                        designing a custom walker here,
                                        and that's great.</div>
                                      <div>If it matters, I would just
                                        make it a walker class and  </div>
                                      <blockquote class="gmail_quote"
                                        style="margin:0 0 0
                                        .8ex;border-left:1px #ccc
                                        solid;padding-left:1ex"> <br>
                                        As for updating when removing
                                        stores: it seems like doing RAUW
                                        getClobberingAccess(store) is
                                        not optimal in some cases.  </blockquote>
                                      <div><br>
                                        Just out of curiosity do you
                                        have a real example of the
                                        below?  We work really hard to
                                        avoid generating that.</div>
                                      <div><br>
                                      </div>
                                      <div> </div>
                                      <blockquote class="gmail_quote"
                                        style="margin:0 0 0
                                        .8ex;border-left:1px #ccc
                                        solid;padding-left:1ex">For
                                        example:<br>
                                        <br>
                                          store @G1 ; 1 = MD(entry)<br>
                                          store @G2 ; 2 = MD(1)<br>
                                          store %p ; 3 = MD(2)<br>
                                          load @G1 ; MU(3)<br>
                                          load @G2  ; MU(3)<br>
                                      </blockquote>
                                      <div><br>
                                      </div>
                                      <div>These should not be MU(3),
                                        they should be MU(1) and MU(2),
                                        unless the store is *actually* a
                                        clobber for them. If the store
                                        is actually a clobber, i don't
                                        see how you could legally remove
                                        or move the store (since it
                                        conflicts with G1/G2) :)</div>
                                      <div><br>
                                      </div>
                                      <div> </div>
                                      <blockquote class="gmail_quote"
                                        style="margin:0 0 0
                                        .8ex;border-left:1px #ccc
                                        solid;padding-left:1ex"> <br>
                                        If we remove 3 and RUAW
                                        getClobberingAccess(3) (=2) we
                                        get:<br>
                                        <br>
                                          store @G1 ; 1 = MD(entry)<br>
                                          store @G2 ; 2 = MD(1)<br>
                                          load @G1 ; MU(2)<br>
                                          load @G2  ; MU(2)</blockquote>
                                      <div><br>
                                      </div>
                                      <div>I agree that if you can get
                                        memoryssa to generate the first,
                                        you will get the second and it
                                        will be non-optimal.  If you
                                        can, and it's legal to move or
                                        remove the store, that seems
                                        like a super-rare case unless
                                        i'm missing something?<br>
                                      </div>
                                    </div>
                                  </div>
                                </div>
                              </blockquote>
                              <br>
                              Just to reply to this one particular point
                              for now, here's a case where this can
                              happen because of EarlyCSE removing a
                              store that is simply storing back the
                              value just loaded from the same memory
                              location:<br>
                              <br>
                              <br>
                              define void @test3(i32* %p) {<br>
                              entry:<br>
                              ; 1 = MemoryDef(liveOnEntry)<br>
                                store i32 1, i32* @G1<br>
                              ; 2 = MemoryDef(1)<br>
                                store i32 2, i32* @G2<br>
                              ; MemoryUse(2)<br>
                                %xp = load i32, i32* %p<br>
                              ; 3 = MemoryDef(2)<br>
                                store i32 %xp, i32* %p<br>
                              ; MemoryUse(3)<br>
                                %x1 = load i32, i32* @G1<br>
                              ; MemoryUse(3)<br>
                                %x2 = load i32, i32* @G2<br>
                              ; 4 = MemoryDef(3)<br>
                                call void @foo(i32 %x1, i32 %x2, i32
                              %xp)<br>
                                ret void<br>
                              }<br>
                              <br>
                              Removing:   store i32 %xp, i32* %p<br>
                              <br>
                              define void @test3(i32* %p) {<br>
                              entry:<br>
                              ; 1 = MemoryDef(liveOnEntry)<br>
                                store i32 1, i32* @G1<br>
                              ; 2 = MemoryDef(1)<br>
                                store i32 2, i32* @G2<br>
                              ; MemoryUse(2)<br>
                                %xp = load i32, i32* %p<br>
                              ; MemoryUse(2)<br>
                                %x1 = load i32, i32* @G1<br>
                              ; MemoryUse(2)<br>
                                %x2 = load i32, i32* @G2<br>
                              ; 4 = MemoryDef(2)<br>
                                call void @foo(i32 %x1, i32 %x2, i32
                              %xp)<br>
                                ret void<br>
                              }<br>
                              <br>
                            </blockquote>
                            <br>
                          </div>
                        </div>
                        ... and to push this case a bit further, it
                        seems like it wouldn't be too hard to construct
                        a case where removing this store would
                        necessitate looking down through MemoryPhis in
                        order to maintain the optimized-use invariant,
                        possibly even leading to more MemoryPhi
                        optimization?<br>
                        <br>
                        <blockquote type="cite"><span> <br>
                            <blockquote type="cite">
                              <div dir="ltr">
                                <div class="gmail_extra">
                                  <div class="gmail_quote">
                                    <div>If you have a testcase with a
                                      real example, that would be
                                      super-interesting to me.<br>
                                    </div>
                                    <div>Because if we can't get it to
                                      generate precise info in most if
                                      not all cases, it's  possibly not
                                      worth doing the work we are doing
                                      to optimize uses.</div>
                                    <div><br>
                                    </div>
                                    <div>I don't claim we make the
                                      perfect set of tradeoffs right
                                      now. I know for the use cases i
                                      cared about, it makes an IMHO good
                                      set of choices.</div>
                                    <div><br>
                                    </div>
                                    <div><br>
                                    </div>
                                    <blockquote class="gmail_quote"
                                      style="margin:0 0 0
                                      .8ex;border-left:1px #ccc
                                      solid;padding-left:1ex"> <br>
                                    </blockquote>
                                    <blockquote class="gmail_quote"
                                      style="margin:0 0 0
                                      .8ex;border-left:1px #ccc
                                      solid;padding-left:1ex"> but the
                                      load @G1 would be more precise if
                                      it was MU(1) (and the invariant
                                      that defining access == clobbering
                                      access would be broken).  Is this
                                      just a compile-time/precision
                                      trade-off?  </blockquote>
                                    <div><br>
                                    </div>
                                    <div>Kinda. The main problem is it
                                      affects everyone downstream *right
                                      now* (again, we could change this
                                      at the cost of having the walker
                                      mutate loads).</div>
                                    <div><br>
                                    </div>
                                    <div>Imagine something upstream of
                                      you generates imprecise info for
                                      loads.</div>
                                    <div>You use getClobberingAccess to
                                      work around it, because you want
                                      the best answer.</div>
                                    <div><br>
                                      Assume everything preserves
                                      MemorySSA.</div>
                                    <div>We still can't really cache
                                      this sanely with a side-cache:
                                       either it gets blown away, or
                                      takes up too much space.</div>
                                    <div>Caching the result of every
                                      load without rewriting is another
                                      O(N) space factor (we did it
                                      before, and george had to undo it
                                      :P).</div>
                                    <div><br>
                                    </div>
                                    <div>So now you wasted a bunch of
                                      time looking up the real
                                      clobbering result for the load.</div>
                                    <div>So will the next pass, and the
                                      pass after that, and ...</div>
                                    <div><br>
                                    </div>
                                    <div>Stores get pushed *upwards* a
                                      lot less than loads, so calling
                                      getClobberingMemoryAccess on them
                                      is more rare, and caching them is
                                      more sane, and "misses" are lower
                                      cost.</div>
                                    <div><br>
                                    </div>
                                    <div>For loads, we can work around
                                      having to have a separate cache by
                                      rewriting the defining access in
                                      the main walker as a cache (and
                                      marking them as optimized with a
                                      single bit or something, so we
                                      know not to waste time doing it
                                      again until the defining access
                                      gets reset again), but then the
                                      walker is mutating the MemorySSA,
                                      which is "not ideal" if we can
                                      avoid it.</div>
                                    <div><br>
                                    </div>
                                    <div>If we can't, such is life, i'm
                                      basically trying to kick this can
                                      as far down the road as i can
                                      before we use that solution :)</div>
                                    <div><br>
                                    </div>
                                    <div>But y'all are the ones doing a
                                      lot of the work at this point, so
                                      let me know if you think we should
                                      stop trying to maintain these
                                      invariants.</div>
                                    <div><br>
                                    </div>
                                    <div> </div>
                                    <blockquote class="gmail_quote"
                                      style="margin:0 0 0
                                      .8ex;border-left:1px #ccc
                                      solid;padding-left:1ex">Maybe for
                                      that reason it makes more sense to
                                      let the client decide if they want
                                      to do the simple RAUW
                                      getClobberingAccess(Store) or
                                      optimize each use separately?<br>
                                      <br>
                                    </blockquote>
                                    <div><br>
                                      If we can make it come out optimal
                                      for everyone, we should do that.</div>
                                    <div>Otherwise, one client screws up
                                      another :)<br>
                                    </div>
                                    <div><br>
                                    </div>
                                    <div>If we can't, yeah, sure, then
                                      we start making tradeoffs.</div>
                                    <div><br>
                                    </div>
                                    <div><br>
                                    </div>
                                  </div>
                                </div>
                              </div>
                            </blockquote>
                            <br>
                            <pre cols="72">-- 
Geoff Berry
Employee of Qualcomm Datacenter Technologies, Inc.
 Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.</pre>
                            <br>
                            <fieldset></fieldset>
                            <br>
                          </span>
                          <pre>______________________________<wbr>_________________
llvm-commits mailing list
<a moz-do-not-send="true" href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>
<a moz-do-not-send="true" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-commits</a>
</pre>
    </blockquote><span>
    

    <pre cols="72">-- 
Geoff Berry
Employee of Qualcomm Datacenter Technologies, Inc.
 Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.</pre>
  </span></div>

</blockquote></div>
</div>
</div></div></blockquote></div>
</div>



</blockquote>
<pre class="moz-signature" cols="72">-- 
Geoff Berry
Employee of Qualcomm Datacenter Technologies, Inc.
 Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.</pre></body></html>