<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <p><br>

    </p>

    <br>

    <div class="moz-cite-prefix">On 8/23/2016 1:47 PM, Daniel Berlin

      wrote:<br>

    </div>

    <blockquote

cite="mid:CAF4BwTXHQuj4_h1DwFsWqB8e+8pbC9ccbUCx1sZdywTg992DcA@mail.gmail.com"

      type="cite">

      <div dir="ltr"><br>

        <div class="gmail_extra"><br>

          <div class="gmail_quote">On Tue, Aug 23, 2016 at 10:17 AM,

            Geoff Berry <span dir="ltr"><<a moz-do-not-send="true"

                href="mailto:gberry@codeaurora.org" target="_blank">gberry@codeaurora.org</a>></span>

            wrote:<br>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">gberry

              added inline comments.<br>

              <span class=""><br>

                ================<br>

                Comment at: lib/Transforms/Scalar/<wbr>EarlyCSE.cpp:537<br>

                @@ +536,3 @@<br>

                +  MemoryAccess *LaterDef =<br>

                +      MSSA->getWalker()-><wbr>getClobberingMemoryAccess(<wbr>LaterInst);<br>

                +  return MSSA->dominates(LaterDef,

                MSSA->getMemoryAccess(<wbr>EarlierInst));<br>

                ----------------<br>

              </span><span class="">dberlin wrote:<br>

                > gberry wrote:<br>

                > > dberlin wrote:<br>

                > > > 1. For loads, you don't have to ask for

                the clobbering access. It's already optimized such that

                getDefiningAccess == the clobbering access<br>

                > > ><br>

                > > > 2. For stores, not sure if you realize

                this, but<br>

                > > ><br>

                > > > given<br>

                > > > store q  (lets's call this a)<br>

                > > > x = load p<br>

                > > > store q (let's call this b)<br>

                > > ><br>

                > > ><br>

                > > > if you call getClobberingMemoryAccess on

                b, it will return a.<br>

                > > ><br>

                > > ><br>

                > > For 1., I was not clear on whether this holds

                true after store removal.<br>

                > ><br>

                > > For 2., yeah I get this, I'm not sure what

                you're getting at though.  The removal of this second

                store by EarlyCSE doesn't use MemorySSA to check for

                intervening loads in this change.  It uses the

                'LastStore' tracking to know when a store made redundant

                by a second store can be removed.<br>

                > 1. Updates have to make it hold after store removal

                :)<br>

                ><br>

                > The problem is that if we don't keep this invariant

                up to date, it means everyone uses getClobberingAccess,

                which does a bunch of work to discover the load already

                points to the same thing.<br>

                ><br>

                > Everyone doing that is much higher than the cost of

                one person updating the dominating def.<br>

                ><br>

                > (there is one case were getClobberingAccess will

                give you a better answer, and that is on cases where we

                gave up during use optimization. I only have one

                testcase this occurs on.  We only give up on optimizing

                a load if it's going to be super expensive, and you

                probably do *not* want to try to get better answers in

                that case).<br>

                ><br>

                > As for updating when you remove stores, you should

                simply be able to replace any loads the store uses with

                getClobberingAccess(store) using RAUW.<br>

                ><br>

                > Under the covers, removeMemoryAccess calls RAUW

                with the DefiningAccess.<br>

                > We could change it to use getClobberingMemoryAccess

                for loads, and DefiningAccess for stores.<br>

                ><br>

                > 2. ah, okay.<br>

                ><br>

                ><br>

              </span>Okay, I get why just checking the defining access

              for loads is better (we get to skip the AA check).<br>

            </blockquote>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              For stores, we may be able to do something faster than

              calling getClobberingAccess(store).  We could instead walk

              up the store defining access chain and stop if we get to a

              point that dominates the earlier load or a clobbering

              write.  I'll have to time this to see if it makes a

              difference.  I guess it will depend on what percentage of

              the time the clobber cache has been thrown away.<br>

            </blockquote>

            <div><br>

            </div>

            <div>Yes.  You are basically designing a custom walker here,

              and that's great.</div>

            <div>If it matters, I would just make it a walker class and

               </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <br>

              As for updating when removing stores: it seems like doing

              RAUW getClobberingAccess(store) is not optimal in some

              cases.  </blockquote>

            <div><br>

              Just out of curiosity do you have a real example of the

              below?  We work really hard to avoid generating that.</div>

            <div><br>

            </div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">For

              example:<br>

              <br>

                store @G1 ; 1 = MD(entry)<br>

                store @G2 ; 2 = MD(1)<br>

                store %p ; 3 = MD(2)<br>

                load @G1 ; MU(3)<br>

                load @G2  ; MU(3)<br>

            </blockquote>

            <div><br>

            </div>

            <div>These should not be MU(3), they should be MU(1) and

              MU(2), unless the store is *actually* a clobber for them.

              If the store is actually a clobber, i don't see how you

              could legally remove or move the store (since it conflicts

              with G1/G2) :)</div>

            <div><br>

            </div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              <br>

              If we remove 3 and RUAW getClobberingAccess(3) (=2) we

              get:<br>

              <br>

                store @G1 ; 1 = MD(entry)<br>

                store @G2 ; 2 = MD(1)<br>

                load @G1 ; MU(2)<br>

                load @G2  ; MU(2)</blockquote>

            <div><br>

            </div>

            <div>I agree that if you can get memoryssa to generate the

              first, you will get the second and it will be

              non-optimal.  If you can, and it's legal to move or remove

              the store, that seems like a super-rare case unless i'm

              missing something?<br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    Just to reply to this one particular point for now, here's a case

    where this can happen because of EarlyCSE removing a store that is

    simply storing back the value just loaded from the same memory

    location:<br>

    <br>

    <br>

    define void @test3(i32* %p) {<br>

    entry:<br>

    ; 1 = MemoryDef(liveOnEntry)<br>

      store i32 1, i32* @G1<br>

    ; 2 = MemoryDef(1)<br>

      store i32 2, i32* @G2<br>

    ; MemoryUse(2)<br>

      %xp = load i32, i32* %p<br>

    ; 3 = MemoryDef(2)<br>

      store i32 %xp, i32* %p<br>

    ; MemoryUse(3)<br>

      %x1 = load i32, i32* @G1<br>

    ; MemoryUse(3)<br>

      %x2 = load i32, i32* @G2<br>

    ; 4 = MemoryDef(3)<br>

      call void @foo(i32 %x1, i32 %x2, i32 %xp)<br>

      ret void<br>

    }<br>

    <br>

    Removing:   store i32 %xp, i32* %p<br>

    <br>

    define void @test3(i32* %p) {<br>

    entry:<br>

    ; 1 = MemoryDef(liveOnEntry)<br>

      store i32 1, i32* @G1<br>

    ; 2 = MemoryDef(1)<br>

      store i32 2, i32* @G2<br>

    ; MemoryUse(2)<br>

      %xp = load i32, i32* %p<br>

    ; MemoryUse(2)<br>

      %x1 = load i32, i32* @G1<br>

    ; MemoryUse(2)<br>

      %x2 = load i32, i32* @G2<br>

    ; 4 = MemoryDef(2)<br>

      call void @foo(i32 %x1, i32 %x2, i32 %xp)<br>

      ret void<br>

    }<br>

    <br>

    <br>

    <br>

    <blockquote

cite="mid:CAF4BwTXHQuj4_h1DwFsWqB8e+8pbC9ccbUCx1sZdywTg992DcA@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div>If you have a testcase with a real example, that would

              be super-interesting to me.<br>

            </div>

            <div>Because if we can't get it to generate precise info in

              most if not all cases, it's  possibly not worth doing the

              work we are doing to optimize uses.</div>

            <div><br>

            </div>

            <div>I don't claim we make the perfect set of tradeoffs

              right now. I know for the use cases i cared about, it

              makes an IMHO good set of choices.</div>

            <div><br>

            </div>

            <div><br>

            </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>

            </blockquote>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              but the load @G1 would be more precise if it was MU(1)

              (and the invariant that defining access == clobbering

              access would be broken).  Is this just a

              compile-time/precision trade-off?  </blockquote>

            <div><br>

            </div>

            <div>Kinda. The main problem is it affects everyone

              downstream *right now* (again, we could change this at the

              cost of having the walker mutate loads).</div>

            <div><br>

            </div>

            <div>Imagine something upstream of you generates imprecise

              info for loads.</div>

            <div>You use getClobberingAccess to work around it, because

              you want the best answer.</div>

            <div><br>

              Assume everything preserves MemorySSA.</div>

            <div>We still can't really cache this sanely with a

              side-cache:  either it gets blown away, or takes up too

              much space.</div>

            <div>Caching the result of every load without rewriting is

              another O(N) space factor (we did it before, and george

              had to undo it :P).</div>

            <div><br>

            </div>

            <div>So now you wasted a bunch of time looking up the real

              clobbering result for the load.</div>

            <div>So will the next pass, and the pass after that, and ...</div>

            <div><br>

            </div>

            <div>Stores get pushed *upwards* a lot less than loads, so

              calling getClobberingMemoryAccess on them is more rare,

              and caching them is more sane, and "misses" are lower

              cost.</div>

            <div><br>

            </div>

            <div>For loads, we can work around having to have a separate

              cache by rewriting the defining access in the main walker

              as a cache (and marking them as optimized with a single

              bit or something, so we know not to waste time doing it

              again until the defining access gets reset again), but

              then the walker is mutating the MemorySSA, which is "not

              ideal" if we can avoid it.</div>

            <div><br>

            </div>

            <div>If we can't, such is life, i'm basically trying to kick

              this can as far down the road as i can before we use that

              solution :)</div>

            <div><br>

            </div>

            <div>But y'all are the ones doing a lot of the work at this

              point, so let me know if you think we should stop trying

              to maintain these invariants.</div>

            <div><br>

            </div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">Maybe

              for that reason it makes more sense to let the client

              decide if they want to do the simple RAUW

              getClobberingAccess(Store) or optimize each use

              separately?<br>

              <br>

            </blockquote>

            <div><br>

              If we can make it come out optimal for everyone, we should

              do that.</div>

            <div>Otherwise, one client screws up another :)<br>

            </div>

            <div><br>

            </div>

            <div>If we can't, yeah, sure, then we start making

              tradeoffs.</div>

            <div><br>

            </div>

            <div><br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Geoff Berry

Employee of Qualcomm Datacenter Technologies, Inc.

 Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.</pre>

  </body>

</html>