<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <p>Sounds good.  I'll go ahead and check in what I have now (since

      it is NFC) and we can proceed from there.</p>

    <p>I'm currently trying out an approach where upon removing a store

      I eagerly re-optimize phis, but just mark uses as dirty, then only

      re-compute the clobber for dirty uses if they come up in a

      isSameMemGeneration query or at the end of the pass, which is sort

      of a hybrid of the two approaches you described.<br>

    </p>

    <br>

    <div class="moz-cite-prefix">On 8/24/2016 7:34 PM, Daniel Berlin

      wrote:<br>

    </div>

    <blockquote

cite="mid:CAF4BwTXACdw-Lnhn6hSab9U5LXeNd1CTgMAtWLci3j4nX7JKkg@mail.gmail.com"

      type="cite">

      <div dir="ltr">(modulo this, btw, the code LGTM.I definitely don't

        think we should wait to resolve this before you commit it).

        <div><br>

        </div>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Tue, Aug 23, 2016 at 1:32 PM, Daniel

          Berlin <span dir="ltr"><<a moz-do-not-send="true"

              href="mailto:dberlin@dberlin.org" target="_blank">dberlin@dberlin.org</a>></span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div dir="ltr">Ugh.

              <div>Okay, so.</div>

              <div>A couple thoughts:<br>

                <br>

              </div>

              <div>1. We could maintain it by doing batch use opt at the

                end of each pass or something, and then using the faster

                use optimizer to reoptimize affected loads (it's easy to

                tell the set of possibly affected loads).</div>

              <div><br>

              </div>

              <div>2. We could drop it.  We would still build an

                initially optimized form, but whatever happens past

                that, happens.  We would make getClobberingMemoryAccess

                in the default walker modify loads to store the

                optimized result on them, and add a bool to MemoryUse to

                track which loads may need to be reoptimized (that gets

                reset in setDefiningAccess unless you tell it not to).</div>

              <div><br>

                This will let you call getClobberingMemoryAccess on

                everything without it being really expensive for loads

                (right now it does a bunch of search/etc setup, and we

                need a way to avoid this)<br>

              </div>

              <div><br>

              </div>

              <div>The downside to all of this is<br>

                <br>

              </div>

              <div>1. The walker now modifies loads.  Even if you are

                storing the defining accesses somewhere, and we change

                it (not sure why this would happen, but ...), it should

                be okay, it would just mean you have a valid but

                conservative answer.</div>

              <div><br>

              </div>

              <div>2.  It means a stores uses will no longer include all

                possible loads that die there. It will only include

                some, unless you run around and call

                getClobberingMemoryAccess on every load.<br>

              </div>

              <div><br>

              </div>

              <div>I will have to do that in NewGVN to make load/store

                forwarding work sanely, but i'll deal with that.</div>

              <div><br>

              </div>

              <div><br>

              </div>

              <div>+ a few others for thoughts.</div>

              <div><br>

              </div>

            </div>

            <div class="HOEnZb">

              <div class="h5">

                <div class="gmail_extra"><br>

                  <div class="gmail_quote">On Tue, Aug 23, 2016 at 1:04

                    PM, Geoff Berry <span dir="ltr"><<a

                        moz-do-not-send="true"

                        href="mailto:gberry@codeaurora.org"

                        target="_blank">gberry@codeaurora.org</a>></span>

                    wrote:<br>

                    <blockquote class="gmail_quote" style="margin:0 0 0

                      .8ex;border-left:1px #ccc solid;padding-left:1ex">

                      <div bgcolor="#FFFFFF" text="#000000">

                        <div>

                          <div>

                            <p><br>

                            </p>

                            <br>

                            <div>On 8/23/2016 2:17 PM, Geoff Berry via

                              llvm-commits wrote:<br>

                            </div>

                            <blockquote type="cite">

                              <p><br>

                              </p>

                              <br>

                              <div>On 8/23/2016 1:47 PM, Daniel Berlin

                                wrote:<br>

                              </div>

                              <blockquote type="cite">

                                <div dir="ltr"><br>

                                  <div class="gmail_extra"><br>

                                    <div class="gmail_quote">On Tue, Aug

                                      23, 2016 at 10:17 AM, Geoff Berry

                                      <span dir="ltr"><<a

                                          moz-do-not-send="true"

                                          href="mailto:gberry@codeaurora.org"

                                          target="_blank">gberry@codeaurora.org</a>></span>

                                      wrote:<br>

                                      <blockquote class="gmail_quote"

                                        style="margin:0 0 0

                                        .8ex;border-left:1px #ccc

                                        solid;padding-left:1ex">gberry

                                        added inline comments.<br>

                                        <span><br>

                                          ================<br>

                                          Comment at:

                                          lib/Transforms/Scalar/EarlyCSE<wbr>.cpp:537<br>

                                          @@ +536,3 @@<br>

                                          +  MemoryAccess *LaterDef =<br>

                                          +     

                                          MSSA->getWalker()->getClobberi<wbr>ngMemoryAccess(LaterInst);<br>

                                          +  return

                                          MSSA->dominates(LaterDef,

                                          MSSA->getMemoryAccess(EarlierI<wbr>nst));<br>

                                          ----------------<br>

                                        </span><span>dberlin wrote:<br>

                                          > gberry wrote:<br>

                                          > > dberlin wrote:<br>

                                          > > > 1. For loads,

                                          you don't have to ask for the

                                          clobbering access. It's

                                          already optimized such that

                                          getDefiningAccess == the

                                          clobbering access<br>

                                          > > ><br>

                                          > > > 2. For stores,

                                          not sure if you realize this,

                                          but<br>

                                          > > ><br>

                                          > > > given<br>

                                          > > > store q 

                                          (lets's call this a)<br>

                                          > > > x = load p<br>

                                          > > > store q (let's

                                          call this b)<br>

                                          > > ><br>

                                          > > ><br>

                                          > > > if you call

                                          getClobberingMemoryAccess on

                                          b, it will return a.<br>

                                          > > ><br>

                                          > > ><br>

                                          > > For 1., I was not

                                          clear on whether this holds

                                          true after store removal.<br>

                                          > ><br>

                                          > > For 2., yeah I get

                                          this, I'm not sure what you're

                                          getting at though.  The

                                          removal of this second store

                                          by EarlyCSE doesn't use

                                          MemorySSA to check for

                                          intervening loads in this

                                          change.  It uses the

                                          'LastStore' tracking to know

                                          when a store made redundant by

                                          a second store can be removed.<br>

                                          > 1. Updates have to make

                                          it hold after store removal :)<br>

                                          ><br>

                                          > The problem is that if we

                                          don't keep this invariant up

                                          to date, it means everyone

                                          uses getClobberingAccess,

                                          which does a bunch of work to

                                          discover the load already

                                          points to the same thing.<br>

                                          ><br>

                                          > Everyone doing that is

                                          much higher than the cost of

                                          one person updating the

                                          dominating def.<br>

                                          ><br>

                                          > (there is one case were

                                          getClobberingAccess will give

                                          you a better answer, and that

                                          is on cases where we gave up

                                          during use optimization. I

                                          only have one testcase this

                                          occurs on.  We only give up on

                                          optimizing a load if it's

                                          going to be super expensive,

                                          and you probably do *not* want

                                          to try to get better answers

                                          in that case).<br>

                                          ><br>

                                          > As for updating when you

                                          remove stores, you should

                                          simply be able to replace any

                                          loads the store uses with

                                          getClobberingAccess(store)

                                          using RAUW.<br>

                                          ><br>

                                          > Under the covers,

                                          removeMemoryAccess calls RAUW

                                          with the DefiningAccess.<br>

                                          > We could change it to use

                                          getClobberingMemoryAccess for

                                          loads, and DefiningAccess for

                                          stores.<br>

                                          ><br>

                                          > 2. ah, okay.<br>

                                          ><br>

                                          ><br>

                                        </span>Okay, I get why just

                                        checking the defining access for

                                        loads is better (we get to skip

                                        the AA check).<br>

                                      </blockquote>

                                      <div> </div>

                                      <blockquote class="gmail_quote"

                                        style="margin:0 0 0

                                        .8ex;border-left:1px #ccc

                                        solid;padding-left:1ex"> For

                                        stores, we may be able to do

                                        something faster than calling

                                        getClobberingAccess(store).  We

                                        could instead walk up the store

                                        defining access chain and stop

                                        if we get to a point that

                                        dominates the earlier load or a

                                        clobbering write.  I'll have to

                                        time this to see if it makes a

                                        difference.  I guess it will

                                        depend on what percentage of the

                                        time the clobber cache has been

                                        thrown away.<br>

                                      </blockquote>

                                      <div><br>

                                      </div>

                                      <div>Yes.  You are basically

                                        designing a custom walker here,

                                        and that's great.</div>

                                      <div>If it matters, I would just

                                        make it a walker class and  </div>

                                      <blockquote class="gmail_quote"

                                        style="margin:0 0 0

                                        .8ex;border-left:1px #ccc

                                        solid;padding-left:1ex"> <br>

                                        As for updating when removing

                                        stores: it seems like doing RAUW

                                        getClobberingAccess(store) is

                                        not optimal in some cases.  </blockquote>

                                      <div><br>

                                        Just out of curiosity do you

                                        have a real example of the

                                        below?  We work really hard to

                                        avoid generating that.</div>

                                      <div><br>

                                      </div>

                                      <div> </div>

                                      <blockquote class="gmail_quote"

                                        style="margin:0 0 0

                                        .8ex;border-left:1px #ccc

                                        solid;padding-left:1ex">For

                                        example:<br>

                                        <br>

                                          store @G1 ; 1 = MD(entry)<br>

                                          store @G2 ; 2 = MD(1)<br>

                                          store %p ; 3 = MD(2)<br>

                                          load @G1 ; MU(3)<br>

                                          load @G2  ; MU(3)<br>

                                      </blockquote>

                                      <div><br>

                                      </div>

                                      <div>These should not be MU(3),

                                        they should be MU(1) and MU(2),

                                        unless the store is *actually* a

                                        clobber for them. If the store

                                        is actually a clobber, i don't

                                        see how you could legally remove

                                        or move the store (since it

                                        conflicts with G1/G2) :)</div>

                                      <div><br>

                                      </div>

                                      <div> </div>

                                      <blockquote class="gmail_quote"

                                        style="margin:0 0 0

                                        .8ex;border-left:1px #ccc

                                        solid;padding-left:1ex"> <br>

                                        If we remove 3 and RUAW

                                        getClobberingAccess(3) (=2) we

                                        get:<br>

                                        <br>

                                          store @G1 ; 1 = MD(entry)<br>

                                          store @G2 ; 2 = MD(1)<br>

                                          load @G1 ; MU(2)<br>

                                          load @G2  ; MU(2)</blockquote>

                                      <div><br>

                                      </div>

                                      <div>I agree that if you can get

                                        memoryssa to generate the first,

                                        you will get the second and it

                                        will be non-optimal.  If you

                                        can, and it's legal to move or

                                        remove the store, that seems

                                        like a super-rare case unless

                                        i'm missing something?<br>

                                      </div>

                                    </div>

                                  </div>

                                </div>

                              </blockquote>

                              <br>

                              Just to reply to this one particular point

                              for now, here's a case where this can

                              happen because of EarlyCSE removing a

                              store that is simply storing back the

                              value just loaded from the same memory

                              location:<br>

                              <br>

                              <br>

                              define void @test3(i32* %p) {<br>

                              entry:<br>

                              ; 1 = MemoryDef(liveOnEntry)<br>

                                store i32 1, i32* @G1<br>

                              ; 2 = MemoryDef(1)<br>

                                store i32 2, i32* @G2<br>

                              ; MemoryUse(2)<br>

                                %xp = load i32, i32* %p<br>

                              ; 3 = MemoryDef(2)<br>

                                store i32 %xp, i32* %p<br>

                              ; MemoryUse(3)<br>

                                %x1 = load i32, i32* @G1<br>

                              ; MemoryUse(3)<br>

                                %x2 = load i32, i32* @G2<br>

                              ; 4 = MemoryDef(3)<br>

                                call void @foo(i32 %x1, i32 %x2, i32

                              %xp)<br>

                                ret void<br>

                              }<br>

                              <br>

                              Removing:   store i32 %xp, i32* %p<br>

                              <br>

                              define void @test3(i32* %p) {<br>

                              entry:<br>

                              ; 1 = MemoryDef(liveOnEntry)<br>

                                store i32 1, i32* @G1<br>

                              ; 2 = MemoryDef(1)<br>

                                store i32 2, i32* @G2<br>

                              ; MemoryUse(2)<br>

                                %xp = load i32, i32* %p<br>

                              ; MemoryUse(2)<br>

                                %x1 = load i32, i32* @G1<br>

                              ; MemoryUse(2)<br>

                                %x2 = load i32, i32* @G2<br>

                              ; 4 = MemoryDef(2)<br>

                                call void @foo(i32 %x1, i32 %x2, i32

                              %xp)<br>

                                ret void<br>

                              }<br>

                              <br>

                            </blockquote>

                            <br>

                          </div>

                        </div>

                        ... and to push this case a bit further, it

                        seems like it wouldn't be too hard to construct

                        a case where removing this store would

                        necessitate looking down through MemoryPhis in

                        order to maintain the optimized-use invariant,

                        possibly even leading to more MemoryPhi

                        optimization?<br>

                        <br>

                        <blockquote type="cite"><span> <br>

                            <blockquote type="cite">

                              <div dir="ltr">

                                <div class="gmail_extra">

                                  <div class="gmail_quote">

                                    <div>If you have a testcase with a

                                      real example, that would be

                                      super-interesting to me.<br>

                                    </div>

                                    <div>Because if we can't get it to

                                      generate precise info in most if

                                      not all cases, it's  possibly not

                                      worth doing the work we are doing

                                      to optimize uses.</div>

                                    <div><br>

                                    </div>

                                    <div>I don't claim we make the

                                      perfect set of tradeoffs right

                                      now. I know for the use cases i

                                      cared about, it makes an IMHO good

                                      set of choices.</div>

                                    <div><br>

                                    </div>

                                    <div><br>

                                    </div>

                                    <blockquote class="gmail_quote"

                                      style="margin:0 0 0

                                      .8ex;border-left:1px #ccc

                                      solid;padding-left:1ex"> <br>

                                    </blockquote>

                                    <blockquote class="gmail_quote"

                                      style="margin:0 0 0

                                      .8ex;border-left:1px #ccc

                                      solid;padding-left:1ex"> but the

                                      load @G1 would be more precise if

                                      it was MU(1) (and the invariant

                                      that defining access == clobbering

                                      access would be broken).  Is this

                                      just a compile-time/precision

                                      trade-off?  </blockquote>

                                    <div><br>

                                    </div>

                                    <div>Kinda. The main problem is it

                                      affects everyone downstream *right

                                      now* (again, we could change this

                                      at the cost of having the walker

                                      mutate loads).</div>

                                    <div><br>

                                    </div>

                                    <div>Imagine something upstream of

                                      you generates imprecise info for

                                      loads.</div>

                                    <div>You use getClobberingAccess to

                                      work around it, because you want

                                      the best answer.</div>

                                    <div><br>

                                      Assume everything preserves

                                      MemorySSA.</div>

                                    <div>We still can't really cache

                                      this sanely with a side-cache:

                                       either it gets blown away, or

                                      takes up too much space.</div>

                                    <div>Caching the result of every

                                      load without rewriting is another

                                      O(N) space factor (we did it

                                      before, and george had to undo it

                                      :P).</div>

                                    <div><br>

                                    </div>

                                    <div>So now you wasted a bunch of

                                      time looking up the real

                                      clobbering result for the load.</div>

                                    <div>So will the next pass, and the

                                      pass after that, and ...</div>

                                    <div><br>

                                    </div>

                                    <div>Stores get pushed *upwards* a

                                      lot less than loads, so calling

                                      getClobberingMemoryAccess on them

                                      is more rare, and caching them is

                                      more sane, and "misses" are lower

                                      cost.</div>

                                    <div><br>

                                    </div>

                                    <div>For loads, we can work around

                                      having to have a separate cache by

                                      rewriting the defining access in

                                      the main walker as a cache (and

                                      marking them as optimized with a

                                      single bit or something, so we

                                      know not to waste time doing it

                                      again until the defining access

                                      gets reset again), but then the

                                      walker is mutating the MemorySSA,

                                      which is "not ideal" if we can

                                      avoid it.</div>

                                    <div><br>

                                    </div>

                                    <div>If we can't, such is life, i'm

                                      basically trying to kick this can

                                      as far down the road as i can

                                      before we use that solution :)</div>

                                    <div><br>

                                    </div>

                                    <div>But y'all are the ones doing a

                                      lot of the work at this point, so

                                      let me know if you think we should

                                      stop trying to maintain these

                                      invariants.</div>

                                    <div><br>

                                    </div>

                                    <div> </div>

                                    <blockquote class="gmail_quote"

                                      style="margin:0 0 0

                                      .8ex;border-left:1px #ccc

                                      solid;padding-left:1ex">Maybe for

                                      that reason it makes more sense to

                                      let the client decide if they want

                                      to do the simple RAUW

                                      getClobberingAccess(Store) or

                                      optimize each use separately?<br>

                                      <br>

                                    </blockquote>

                                    <div><br>

                                      If we can make it come out optimal

                                      for everyone, we should do that.</div>

                                    <div>Otherwise, one client screws up

                                      another :)<br>

                                    </div>

                                    <div><br>

                                    </div>

                                    <div>If we can't, yeah, sure, then

                                      we start making tradeoffs.</div>

                                    <div><br>

                                    </div>

                                    <div><br>

                                    </div>

                                  </div>

                                </div>

                              </div>

                            </blockquote>

                            <br>

                            <pre cols="72">-- 

Geoff Berry

Employee of Qualcomm Datacenter Technologies, Inc.

 Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.</pre>

                            <br>

                            <fieldset></fieldset>

                            <br>

                          </span>

                          <pre>______________________________<wbr>_________________

llvm-commits mailing list

<a moz-do-not-send="true" href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>

<a moz-do-not-send="true" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-commits</a>

</pre>

    </blockquote><span>

    <pre cols="72">-- 

Geoff Berry

Employee of Qualcomm Datacenter Technologies, Inc.

 Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.</pre>

  </span></div>

</blockquote></div>

</div>

</div></div></blockquote></div>

</div>

</blockquote>

<pre class="moz-signature" cols="72">-- 

Geoff Berry

Employee of Qualcomm Datacenter Technologies, Inc.

 Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.</pre></body></html>