<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <p><br>

    </p>

    <br>

    <div class="moz-cite-prefix">On 8/23/2016 2:17 PM, Geoff Berry via

      llvm-commits wrote:<br>

    </div>

    <blockquote

      cite="mid:0ced3f92-5798-78a2-6101-a60db9f8af20@codeaurora.org"

      type="cite">

      <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

      <p><br>

      </p>

      <br>

      <div class="moz-cite-prefix">On 8/23/2016 1:47 PM, Daniel Berlin

        wrote:<br>

      </div>

      <blockquote

cite="mid:CAF4BwTXHQuj4_h1DwFsWqB8e+8pbC9ccbUCx1sZdywTg992DcA@mail.gmail.com"

        type="cite">

        <div dir="ltr"><br>

          <div class="gmail_extra"><br>

            <div class="gmail_quote">On Tue, Aug 23, 2016 at 10:17 AM,

              Geoff Berry <span dir="ltr"><<a moz-do-not-send="true"

                  href="mailto:gberry@codeaurora.org" target="_blank">gberry@codeaurora.org</a>></span>

              wrote:<br>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex">gberry

                added inline comments.<br>

                <span class=""><br>

                  ================<br>

                  Comment at: lib/Transforms/Scalar/<wbr>EarlyCSE.cpp:537<br>

                  @@ +536,3 @@<br>

                  +  MemoryAccess *LaterDef =<br>

                  +      MSSA->getWalker()-><wbr>getClobberingMemoryAccess(<wbr>LaterInst);<br>

                  +  return MSSA->dominates(LaterDef,

                  MSSA->getMemoryAccess(<wbr>EarlierInst));<br>

                  ----------------<br>

                </span><span class="">dberlin wrote:<br>

                  > gberry wrote:<br>

                  > > dberlin wrote:<br>

                  > > > 1. For loads, you don't have to ask for

                  the clobbering access. It's already optimized such

                  that getDefiningAccess == the clobbering access<br>

                  > > ><br>

                  > > > 2. For stores, not sure if you realize

                  this, but<br>

                  > > ><br>

                  > > > given<br>

                  > > > store q  (lets's call this a)<br>

                  > > > x = load p<br>

                  > > > store q (let's call this b)<br>

                  > > ><br>

                  > > ><br>

                  > > > if you call getClobberingMemoryAccess

                  on b, it will return a.<br>

                  > > ><br>

                  > > ><br>

                  > > For 1., I was not clear on whether this

                  holds true after store removal.<br>

                  > ><br>

                  > > For 2., yeah I get this, I'm not sure what

                  you're getting at though.  The removal of this second

                  store by EarlyCSE doesn't use MemorySSA to check for

                  intervening loads in this change.  It uses the

                  'LastStore' tracking to know when a store made

                  redundant by a second store can be removed.<br>

                  > 1. Updates have to make it hold after store

                  removal :)<br>

                  ><br>

                  > The problem is that if we don't keep this

                  invariant up to date, it means everyone uses

                  getClobberingAccess, which does a bunch of work to

                  discover the load already points to the same thing.<br>

                  ><br>

                  > Everyone doing that is much higher than the cost

                  of one person updating the dominating def.<br>

                  ><br>

                  > (there is one case were getClobberingAccess will

                  give you a better answer, and that is on cases where

                  we gave up during use optimization. I only have one

                  testcase this occurs on.  We only give up on

                  optimizing a load if it's going to be super expensive,

                  and you probably do *not* want to try to get better

                  answers in that case).<br>

                  ><br>

                  > As for updating when you remove stores, you

                  should simply be able to replace any loads the store

                  uses with getClobberingAccess(store) using RAUW.<br>

                  ><br>

                  > Under the covers, removeMemoryAccess calls RAUW

                  with the DefiningAccess.<br>

                  > We could change it to use

                  getClobberingMemoryAccess for loads, and

                  DefiningAccess for stores.<br>

                  ><br>

                  > 2. ah, okay.<br>

                  ><br>

                  ><br>

                </span>Okay, I get why just checking the defining access

                for loads is better (we get to skip the AA check).<br>

              </blockquote>

              <div> </div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex"> For

                stores, we may be able to do something faster than

                calling getClobberingAccess(store).  We could instead

                walk up the store defining access chain and stop if we

                get to a point that dominates the earlier load or a

                clobbering write.  I'll have to time this to see if it

                makes a difference.  I guess it will depend on what

                percentage of the time the clobber cache has been thrown

                away.<br>

              </blockquote>

              <div><br>

              </div>

              <div>Yes.  You are basically designing a custom walker

                here, and that's great.</div>

              <div>If it matters, I would just make it a walker class

                and  </div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>

                As for updating when removing stores: it seems like

                doing RAUW getClobberingAccess(store) is not optimal in

                some cases.  </blockquote>

              <div><br>

                Just out of curiosity do you have a real example of the

                below?  We work really hard to avoid generating that.</div>

              <div><br>

              </div>

              <div> </div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex">For

                example:<br>

                <br>

                  store @G1 ; 1 = MD(entry)<br>

                  store @G2 ; 2 = MD(1)<br>

                  store %p ; 3 = MD(2)<br>

                  load @G1 ; MU(3)<br>

                  load @G2  ; MU(3)<br>

              </blockquote>

              <div><br>

              </div>

              <div>These should not be MU(3), they should be MU(1) and

                MU(2), unless the store is *actually* a clobber for

                them. If the store is actually a clobber, i don't see

                how you could legally remove or move the store (since it

                conflicts with G1/G2) :)</div>

              <div><br>

              </div>

              <div> </div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>

                If we remove 3 and RUAW getClobberingAccess(3) (=2) we

                get:<br>

                <br>

                  store @G1 ; 1 = MD(entry)<br>

                  store @G2 ; 2 = MD(1)<br>

                  load @G1 ; MU(2)<br>

                  load @G2  ; MU(2)</blockquote>

              <div><br>

              </div>

              <div>I agree that if you can get memoryssa to generate the

                first, you will get the second and it will be

                non-optimal.  If you can, and it's legal to move or

                remove the store, that seems like a super-rare case

                unless i'm missing something?<br>

              </div>

            </div>

          </div>

        </div>

      </blockquote>

      <br>

      Just to reply to this one particular point for now, here's a case

      where this can happen because of EarlyCSE removing a store that is

      simply storing back the value just loaded from the same memory

      location:<br>

      <br>

      <br>

      define void @test3(i32* %p) {<br>

      entry:<br>

      ; 1 = MemoryDef(liveOnEntry)<br>

        store i32 1, i32* @G1<br>

      ; 2 = MemoryDef(1)<br>

        store i32 2, i32* @G2<br>

      ; MemoryUse(2)<br>

        %xp = load i32, i32* %p<br>

      ; 3 = MemoryDef(2)<br>

        store i32 %xp, i32* %p<br>

      ; MemoryUse(3)<br>

        %x1 = load i32, i32* @G1<br>

      ; MemoryUse(3)<br>

        %x2 = load i32, i32* @G2<br>

      ; 4 = MemoryDef(3)<br>

        call void @foo(i32 %x1, i32 %x2, i32 %xp)<br>

        ret void<br>

      }<br>

      <br>

      Removing:   store i32 %xp, i32* %p<br>

      <br>

      define void @test3(i32* %p) {<br>

      entry:<br>

      ; 1 = MemoryDef(liveOnEntry)<br>

        store i32 1, i32* @G1<br>

      ; 2 = MemoryDef(1)<br>

        store i32 2, i32* @G2<br>

      ; MemoryUse(2)<br>

        %xp = load i32, i32* %p<br>

      ; MemoryUse(2)<br>

        %x1 = load i32, i32* @G1<br>

      ; MemoryUse(2)<br>

        %x2 = load i32, i32* @G2<br>

      ; 4 = MemoryDef(2)<br>

        call void @foo(i32 %x1, i32 %x2, i32 %xp)<br>

        ret void<br>

      }<br>

      <br>

    </blockquote>

    <br>

    ... and to push this case a bit further, it seems like it wouldn't

    be too hard to construct a case where removing this store would

    necessitate looking down through MemoryPhis in order to maintain the

    optimized-use invariant, possibly even leading to more MemoryPhi

    optimization?<br>

    <br>

    <blockquote

      cite="mid:0ced3f92-5798-78a2-6101-a60db9f8af20@codeaurora.org"

      type="cite"> <br>

      <blockquote

cite="mid:CAF4BwTXHQuj4_h1DwFsWqB8e+8pbC9ccbUCx1sZdywTg992DcA@mail.gmail.com"

        type="cite">

        <div dir="ltr">

          <div class="gmail_extra">

            <div class="gmail_quote">

              <div>If you have a testcase with a real example, that

                would be super-interesting to me.<br>

              </div>

              <div>Because if we can't get it to generate precise info

                in most if not all cases, it's  possibly not worth doing

                the work we are doing to optimize uses.</div>

              <div><br>

              </div>

              <div>I don't claim we make the perfect set of tradeoffs

                right now. I know for the use cases i cared about, it

                makes an IMHO good set of choices.</div>

              <div><br>

              </div>

              <div><br>

              </div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br>

              </blockquote>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex"> but

                the load @G1 would be more precise if it was MU(1) (and

                the invariant that defining access == clobbering access

                would be broken).  Is this just a compile-time/precision

                trade-off?  </blockquote>

              <div><br>

              </div>

              <div>Kinda. The main problem is it affects everyone

                downstream *right now* (again, we could change this at

                the cost of having the walker mutate loads).</div>

              <div><br>

              </div>

              <div>Imagine something upstream of you generates imprecise

                info for loads.</div>

              <div>You use getClobberingAccess to work around it,

                because you want the best answer.</div>

              <div><br>

                Assume everything preserves MemorySSA.</div>

              <div>We still can't really cache this sanely with a

                side-cache:  either it gets blown away, or takes up too

                much space.</div>

              <div>Caching the result of every load without rewriting is

                another O(N) space factor (we did it before, and george

                had to undo it :P).</div>

              <div><br>

              </div>

              <div>So now you wasted a bunch of time looking up the real

                clobbering result for the load.</div>

              <div>So will the next pass, and the pass after that, and

                ...</div>

              <div><br>

              </div>

              <div>Stores get pushed *upwards* a lot less than loads, so

                calling getClobberingMemoryAccess on them is more rare,

                and caching them is more sane, and "misses" are lower

                cost.</div>

              <div><br>

              </div>

              <div>For loads, we can work around having to have a

                separate cache by rewriting the defining access in the

                main walker as a cache (and marking them as optimized

                with a single bit or something, so we know not to waste

                time doing it again until the defining access gets reset

                again), but then the walker is mutating the MemorySSA,

                which is "not ideal" if we can avoid it.</div>

              <div><br>

              </div>

              <div>If we can't, such is life, i'm basically trying to

                kick this can as far down the road as i can before we

                use that solution :)</div>

              <div><br>

              </div>

              <div>But y'all are the ones doing a lot of the work at

                this point, so let me know if you think we should stop

                trying to maintain these invariants.</div>

              <div><br>

              </div>

              <div> </div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex">Maybe

                for that reason it makes more sense to let the client

                decide if they want to do the simple RAUW

                getClobberingAccess(Store) or optimize each use

                separately?<br>

                <br>

              </blockquote>

              <div><br>

                If we can make it come out optimal for everyone, we

                should do that.</div>

              <div>Otherwise, one client screws up another :)<br>

              </div>

              <div><br>

              </div>

              <div>If we can't, yeah, sure, then we start making

                tradeoffs.</div>

              <div><br>

              </div>

              <div><br>

              </div>

            </div>

          </div>

        </div>

      </blockquote>

      <br>

      <pre class="moz-signature" cols="72">-- 

Geoff Berry

Employee of Qualcomm Datacenter Technologies, Inc.

 Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.</pre>

      <br>

      <fieldset class="mimeAttachmentHeader"></fieldset>

      <br>

      <pre wrap="">_______________________________________________

llvm-commits mailing list

<a class="moz-txt-link-abbreviated" href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a>

<a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a>

</pre>

    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Geoff Berry

Employee of Qualcomm Datacenter Technologies, Inc.

 Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.</pre>

  </body>

</html>