<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">Andy - If you're not already following

      this closely, please start.  We've gotten into fairly fundamental

      questions of what a patchpoint does.  <br>

      <br>

      Filip, <br>

      <br>

      I think you've hit the nail on the head.  What I'm thinking of as

      being patchpoints are not what you think they are.  Part of that

      is that I've got a local change which adds a very similar

      construction (called "statepoints" for the moment), but I was

      trying to keep that separate.  That also includes a lot of GC

      semantics which are not under discussion currently.  My apologies

      if that experience bled over into this conversation and made

      things more confusing.  <br>

      <br>

      I will note that the documentation for patchpoint says explicitly

      the following:<br>
      "The ‘<tt class="docutils literal"><span class="pre">llvm.experimental.patchpoint.*</span></tt>’
      intrinsics creates a function
      call to the specified <tt class="docutils literal"><span
          class="pre">&lt;target&gt;</span></tt> and records the

      location of specified

      values in the stack map."<br>

      <br>

      My reading has always been that a patchpoint *that isn't patched*

      is simply a call with a stackmap associated with it.  As I read
      it, this indicated (and still indicates) that my proposed usage
      would be legal.  <br>

      <br>

      I will agree that I've confused the topic badly on the

      optimization front.  My "statepoint" isn't patchable, so a lot

      more optimizations are legal.  Sorry about that.  To restate what

      I think you've been saying all along, the optimizer can't make

      assumptions about what function is called by a patchpoint because

      that might change based on later patching.  Is this the key point

      you've been trying to make?<br>

      <br>

      I'm not objecting to separating "my patchpoint" from "your

      patchpoint".  Let's just hammer out the semantics of each first. 

      :)<br>

      <br>

      Again, longer response to follow in a day or so. :)<br>

      <br>

      Philip<br>

      <br>

      On 04/30/2014 10:09 PM, Filip Pizlo wrote:<br>

    </div>

    <blockquote cite="mid:etPan.5361d71d.1d4ed43b.172db@dethklok.local"

      type="cite">

      <style>body{font-family:Helvetica,Arial;font-size:13px}</style>

      <div id="bloop_customfont"

        style="font-family:Helvetica,Arial;font-size:13px; color:

        rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><br>

      </div>

      <br>

      <p style="color:#000;">On April 30, 2014 at 9:06:20 PM, Philip

        Reames (<a moz-do-not-send="true"

          href="mailto:listmail@philipreames.com">listmail@philipreames.com</a>)

        wrote:</p>

      <div>

        <div>

          <blockquote type="cite" class="clean_bq" style="color: rgb(0,

            0, 0); font-family: Helvetica, Arial; font-size: 13px;

            font-style: normal; font-variant: normal; font-weight:

            normal; letter-spacing: normal; line-height: normal;

            orphans: auto; text-align: start; text-indent: 0px;

            text-transform: none; white-space: normal; widows: auto;

            word-spacing: 0px; -webkit-text-stroke-width: 0px;

            background-color: rgb(255, 255, 255);"><span>

              <div text="#000000" bgcolor="#FFFFFF">

                <div>

                  <div class="moz-cite-prefix">On 04/29/2014 12:39 PM,

                    Filip Pizlo wrote:<br>

                  </div>

                  <blockquote

                    cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"

                    type="cite">On April 29, 2014 at 11:27:06 AM, Philip

                    Reames (<a moz-do-not-send="true"

                      href="mailto:listmail@philipreames.com">listmail@philipreames.com</a>)

                    wrote:

                    <div>

                      <blockquote type="cite" class="clean_bq"

                        style="color: rgb(0, 0, 0); font-family:

                        Helvetica, Arial; font-size: 13px; font-style:

                        normal; font-variant: normal; font-weight:

                        normal; letter-spacing: normal; line-height:

                        normal; orphans: auto; text-align: start;

                        text-indent: 0px; text-transform: none;

                        white-space: normal; widows: auto; word-spacing:

                        0px; -webkit-text-stroke-width: 0px;

                        background-color: rgb(255, 255, 255);">

                        <div bgcolor="#FFFFFF" text="#000000">

                          <div>

                            <div class="moz-cite-prefix"><span>On

                                04/29/2014 10:44 AM, Filip Pizlo wrote:<br>

                              </span></div>

                            <blockquote

                              cite="mid:etPan.535fe4f0.140e0f76.172db@dethklok.local"

                              type="cite">

                              <div id="bloop_customfont"

                                style="font-family: Helvetica, Arial;

                                font-size: 13px; color: rgb(0, 0, 0);

                                margin: 0px;"><span>TL;DR: Your desire

                                  to use trapping on x86 only further

                                  convinces me that Michael's proposed

                                  intrinsics are the best way to go.</span></div>

                            </blockquote>

                            <span>I'm still not convinced, but am not

                              going to actively oppose it either.  I'm

                              leery of designing a solution with major

                              assumptions we don't have data to back up. <br>

                              <br>

                              I worry your assumptions about

                              deoptimization are potentially unsound. 

                              But I don't have data to actually show

                              this (yet).</span></div>

                        </div>

                      </blockquote>

                    </div>

                    <p>I *think* I may have been unclear about my

                      assumptions; in particular, my claims with respect

                      to deoptimization are probably more subtle than

                      they appeared.  WebKit can use LLVM, it has
                      divisions, and we apply every possible
                      deoptimization/profiling/etc. trick to them, so
                      this is grounded in experience.  Forgive me if the

                      rest of this e-mail contains a lecture on things

                      that are obvious - I'll try to err on the side of

                      clarity and completeness since this discussion is

                      sufficiently dense that we run the risk of talking
                      at cross purposes unless some baseline assumptions

                      are established.</p>

                  </blockquote>

                  I think we're using the same terminology, but with

                  slightly different sets of assumptions.  I'll point

                  this out below where relevant. <br>

                  <br>

                  Also, thanks for taking the time to expand.  It helps
                  clarify the discussion quite a bit. </div>

              </div>

            </span></blockquote>

        </div>

        <p>I think we may be converging to an understanding of what you

          want versus what I want, and I think that there are some

          points - possibly unrelated to division - that are worth

          clarifying.  I think that the main difference is that when I

          say "patchpoint", I am referring to a concrete intrinsic with

          specific semantics that cannot change without breaking WebKit,

          while you are using the term to refer to a broad concept, or

          rather, a class of as-yet-unimplemented intrinsics that share

          some of the same features with patchpoints but otherwise have

          incompatible semantics.</p>

        <p>Also, when I say that you wouldn't want to use the existing

          patchpoint to do your trapping deopt, what I mean is that the

          performance of doing this would suck for reasons that are not

          related to deoptimization or trapping.  I'm not claiming that

          deoptimization performs poorly (trust me, I know better) or

          that trapping to deoptimize is bad (I've done this many, many

          times and I know better).  I'm saying that with the current

          patchpoint intrinsics in LLVM, as they are currently specified

          and implemented, doing it would be a bad idea because you'd

          have to compromise a bunch of other optimizations to achieve

          it.</p>

        <p>You have essentially described new intrinsics that would make

          this less of a bad idea and I am interested in your

          intrinsics, so I'll try both to explain why patchpoints
          don't currently give you what you want (and why simply
          changing patchpoint semantics would be evil) and to comment
          on what I think of the intrinsic that you're effectively
          proposing.  Long story short, I think you should

          formally propose your intrinsic even if it's not completely

          fleshed out.  I think that it's an interesting capability and

          in its most basic form, it is a simple variation of the

          current patchpoint/stackmap intrinsics.</p>

        <div>

          <blockquote type="cite" class="clean_bq" style="color: rgb(0,

            0, 0); font-family: Helvetica, Arial; font-size: 13px;

            font-style: normal; font-variant: normal; font-weight:

            normal; letter-spacing: normal; line-height: normal;

            orphans: auto; text-align: start; text-indent: 0px;

            text-transform: none; white-space: normal; widows: auto;

            word-spacing: 0px; -webkit-text-stroke-width: 0px;

            background-color: rgb(255, 255, 255);"><span>

              <div text="#000000" bgcolor="#FFFFFF">

                <div>

                  <blockquote

                    cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"

                    type="cite">

                    <div>

                      <div>

                        <blockquote type="cite" class="clean_bq"

                          style="color: rgb(0, 0, 0); font-family:

                          Helvetica, Arial; font-size: 13px; font-style:

                          normal; font-variant: normal; font-weight:

                          normal; letter-spacing: normal; line-height:

                          normal; orphans: auto; text-align: start;

                          text-indent: 0px; text-transform: none;

                          white-space: normal; widows: auto;

                          word-spacing: 0px; -webkit-text-stroke-width:

                          0px; background-color: rgb(255, 255, 255);">

                          <div bgcolor="#FFFFFF" text="#000000">

                            <div><span><br

                                  class="Apple-interchange-newline">

                                <br>

                                <br>

                              </span>

                              <blockquote

                                cite="mid:etPan.535fe4f0.140e0f76.172db@dethklok.local"

                                type="cite"><span><br>

                                </span>

                                <p style="color: rgb(0, 0, 0);"><span>On

                                    April 29, 2014 at 10:09:49 AM,

                                    Philip Reames (<a

                                      moz-do-not-send="true"

                                      href="mailto:listmail@philipreames.com">listmail@philipreames.com</a>)

                                    wrote:</span></p>

                                <div>

                                  <blockquote type="cite"

                                    class="clean_bq" style="color:

                                    rgb(0, 0, 0); font-family:

                                    Helvetica, Arial; font-size: 13px;

                                    font-style: normal; font-variant:

                                    normal; font-weight: normal;

                                    letter-spacing: normal; line-height:

                                    normal; orphans: auto; text-align:

                                    start; text-indent: 0px;

                                    text-transform: none; white-space:

                                    normal; widows: auto; word-spacing:

                                    0px; -webkit-text-stroke-width: 0px;

                                    background-color: rgb(255, 255,

                                    255);">

                                    <div bgcolor="#FFFFFF"

                                      text="#000000">

                                      <div><span><span>As the discussion

                                            has progressed and I've

                                            spent more time thinking

                                            about the topic, I find

                                            myself less and less

                                            enthused about the current

                                            proposal.  I'm in full

                                            support of having idiomatic

                                            ways to express safe

                                            division, but I'm starting

                                            to doubt that using an

                                            intrinsic is the right way

                                            at the moment.<br>

                                            <br>

                                            One case I find myself

                                            thinking about is how one

                                            would combine profiling

                                            information and implicit

                                            div-by-zero/overflow checks

                                            with this proposal.  I don't

                                            really see a clean way. 

                                            Ideally, for a "safe div"

                                            which never has the

                                            exceptional paths taken,

                                            you'd like to do

                                            away with the control flow

                                            entirely.  (And rely on

                                            hardware traps w/exceptions

                                            instead.)  I don't really

                                            see a way to represent that

                                            type of construct given the

                                            current proposal. </span></span></div>

                                    </div>

                                  </blockquote>

                                </div>

                                <p>This is a deeper problem, and to solve

                                  it you'd need a solution to trapping

                                  in general.  Let's consider the case

                                  of Java.  A Java program may want to

                                  catch the arithmetic exception due to

                                  divide by zero.  How would you do this

                                  with a trap in LLVM IR?  Spill all

                                  state that is live at the catch?  Use

                                  a patchpoint for the entire division

                                  instruction?</p>

                              </blockquote>

                              We'd likely use something similar to a

                              patchpoint.  You'd need the "abstract vm

                              state" (which is not the compiled frame

                              necessarily) available at the div

                              instruction.  You could then re-enter the

                              interpreter at the specified index (part

                              of the vm state).  We have most of

                              these mechanisms in place.  Ideally, you'd

                              trigger a recompile and otherwise ensure

                              re-entry into compiled code at the soonest

                              possible moment. <br>

                              <br>

                              This requires a lot of runtime support,

                              but we already have most of it implemented

                              for another compiler.  From our

                              perspective, the runtime requirements are

                              not a major blocker. </div>

                          </div>

                        </blockquote>

                      </div>

                      <p>Right, you'll use a patchpoint.  That's way

                        more expensive than using a safe division

                        intrinsic with branches, because it's opaque to

                        the optimizer.</p>

                    </div>

                  </blockquote>

                  This statement is true at the moment, but it shouldn't

                  be.  I think this is our fundamental difference in

                  approach. <br>

                  <br>

                  You should be able to write something like:
<pre>%res = invoke i32 patchpoint(..., @x86_trapping_divide, %a, %b)
          to label %normal_dest unwind label %invoke_dest

normal_dest:
  ;; use %res
invoke_dest:
  landingpad
  ;; dispatch edge cases
  ;; this could be unreachable code if you deopt this frame in the
  ;; trap handler and jump directly to an interpreter or other bit of code
</pre></div>

              </div>

            </span></blockquote>

        </div>

      </div>

      <p>I see.  It sounds like you want a generalization of the

        "div.with.stackmap" that I thought you wanted - you want to be

        able to wrap anything in a stackmap.</p>

      <p>The current patchpoint intrinsic does not do this, and you run

        the risk of breaking existing semantics if you change this.

         You'd probably break WebKit, which treats the call target of

        the patchpoint as nothing more than a quirk - we always pass

        null.  Also, the current patchpoint treats the callee as an i8*

        if I remember right and it would be super weird if all LLVM

        phases had to interpret this i8* by unwrapping a possible

        bitcast to get to a declared function that may be an intrinsic.

         Yuck!  Basically, the call target of existing patchpoints is

        meant to be a kind of convenience feature rather than the core

        of the mechanism.</p>

      <p>I agree in principle that the intrinsic that you want would be

        a useful intrinsic.  But let's not call it a patchpoint for the

        purposes of this discussion, and let's not confuse the

        discussion by claiming (incorrectly) that the existing

        patchpoint facility gives you what you want.  It doesn't:

        patchpoints are designed to make the call target opaque (hence

        the use of i8*) and there shouldn't be a correlation between

        what the patchpoint does at run-time and what the called

        function would have done.  You could make the call target be

        null (like WebKit does) and the patchpoint should still mean

        "this code can do anything" because the expectation is that the

        client JIT will patch over it anyway.</p>

      <p>Also, "patchpoint" would probably not be the right term for the

        intrinsic that you want.  I think that what you want is

        "call.with.stackmap".  Or maybe "stackmap.wrapper".  Or just

        "stackmap" - I'd be OK, in principle, with changing the name of

        the current "stackmap" intrinsic to something that reflects the

        fact that it's less of a stackmap than what you want.</p>

      <p>To summarize: a patchpoint's main purpose is that you can

        patch it with arbitrary code.  The current "stackmap" means that

        you can patch it with arbitrary code and that patching may be

        destructive to a shadow of machine code bytes, so it's really

        just like patchpoints - we could change its name to

        "patchpoint.shadow" for example.</p>

      <p>If you were to propose such a stackmap intrinsic, then I think

        there could be some ways of doing it that wouldn't be too

        terrible.  Basically you want something that is like a

        patchpoint in that it reports a stackmap via a side channel, but

        unlike patchpoints, it doesn't allow arbitrary patching -

        instead the optimizer should be allowed to assume that the code

        within the patchpoint will always do the same thing that the

        call target would have done.  There are downsides to truly doing

        this.  For example, to make division efficient with such an

        intrinsic, you'd have to do something that is somewhat worse

        than just recognizing intrinsics in the optimizer - you'd have

        to first recognize a call to your "stackmap wrapper" intrinsic

        and then observe that its call target argument is in turn

        another intrinsic.  To me personally, that's kind of yucky, but

        I won't deny that it could be useful.</p>

      <p>As to the use of invoke: I don't believe that the use of invoke

        versus my suggested "branch on a trap predicate" idea are

        different in any truly meaningful way.  I buy that either would

        work.</p>

      <div>

        <div>

          <blockquote type="cite" class="clean_bq" style="color: rgb(0,

            0, 0); font-family: Helvetica, Arial; font-size: 13px;

            font-style: normal; font-variant: normal; font-weight:

            normal; letter-spacing: normal; line-height: normal;

            orphans: auto; text-align: start; text-indent: 0px;

            text-transform: none; white-space: normal; widows: auto;

            word-spacing: 0px; -webkit-text-stroke-width: 0px;

            background-color: rgb(255, 255, 255);"><span>

              <div text="#000000" bgcolor="#FFFFFF">

                <div><br>

                  <br>

                  A patchpoint should not require any excess spilling. 

                  If values are live in registers, that should be

                  reflected in the stack map.  (I do not know if this is

                  the case for patchpoint at the moment or not.)</div>

              </div>

            </span></blockquote>

        </div>

        <p>Patchpoints do not require spilling.</p>
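        <p>To illustrate why no extra spilling is needed, here is a minimal C sketch of how a runtime might read a live value back out of a stack-map-style record at a patchpoint or trap site. The location kinds are loosely modeled on the kinds LLVM stack maps can report (register, indirect, constant), but the type and function names are hypothetical and this is not LLVM's actual binary record layout. It also assumes the trap or patch handler has saved the register file into an array. A value that stayed in a register is simply described as living in that register.</p>
```c
#include <stdint.h>
#include <string.h>

/* Hypothetical location kinds, loosely modeled on what an LLVM stack map
   can report. This is NOT LLVM's binary stack map format. */
typedef enum {
    LOC_REGISTER, /* value lives in saved register `reg` */
    LOC_INDIRECT, /* value is in memory at address regs[reg] + offset */
    LOC_CONSTANT  /* value is the constant stored in `offset` */
} loc_kind;

typedef struct {
    loc_kind kind;
    int reg;        /* index into the saved register file */
    int32_t offset; /* byte offset (INDIRECT) or constant (CONSTANT) */
} location;

/* Recover one live value at a stack-map site, given the registers that a
   (hypothetical) handler saved. Register-resident values are read
   directly, so nothing had to be spilled just to make them visible. */
static int64_t read_location(const location *loc, const int64_t *regs) {
    switch (loc->kind) {
    case LOC_REGISTER:
        return regs[loc->reg];
    case LOC_INDIRECT: {
        int64_t v;
        const unsigned char *addr =
            (const unsigned char *)(intptr_t)regs[loc->reg] + loc->offset;
        memcpy(&v, addr, sizeof v); /* avoid unaligned/aliasing issues */
        return v;
    }
    case LOC_CONSTANT:
    default:
        return loc->offset;
    }
}
```
        <p>The point of the design is that the record describes where values already are, rather than forcing them to a fixed place; spilling only shows up as an <tt>LOC_INDIRECT</tt> entry when the register allocator chose to spill anyway.</p>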

        <p>My point was that with existing patchpoints, you *either* use

          a patchpoint for the entire division which makes the division

          opaque to the optimizer - because a patchpoint means "this

          code can do anything" - *or* you could spill stuff to the

          stack prior to your trapping division intrinsic, since

          spilling is something that you could do as a workaround if you

          didn't have a patchpoint.</p>

        <p>The reason why I brought up spilling at all is that I suspect

          that spilling all state to the stack might be cheaper - for

          some systems - than turning the division into a patchpoint.

           Turning the division into a patchpoint is horrendously brutal

          - the patchpoint looks like it clobbers the heap (which a

          division doesn't do), has to execute (a division is an obvious

          DCE candidate), cannot be hoisted (hoisting divisions is

          awesome), etc.  Perhaps most importantly, though, a patchpoint

          doesn't tell LLVM that you're *doing a division* - so all

          constant folding, strength reduction, and algebraic reasoning

          flies out the window.  On the other hand, spilling all state

          to the stack is an arguably sound and performant solution to a

          lot of VM problems.  I've seen JVM implementations that ensure

          that there is always a copy of state on the stack at some

          critical points, just because it makes loads of stuff simpler

          (debugging, profiling, GC, and of course deopt).  I personally

          prefer to stay away from such a strategy because it's not

          free.</p>

        <p>On the other hand, branches can be cheap.  A branch on a

          divide is cheaper than not being able to optimize the divide.</p>
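        <p>As a rough illustration of "a branch on a divide", here is a minimal C sketch (not taken from this thread; the names <tt>safe_sdiv32</tt> and <tt>safe_div_result</tt> are hypothetical). It tests for the two exceptional cases of signed 32-bit division before dividing, so the hardware divide can never trap and the compiler still sees an ordinary division that it can fold, strength-reduce, or hoist.</p>
```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: ok == false means the caller must take its
   slow path (throw, deoptimize, etc.). */
typedef struct {
    bool ok;
    int32_t value;
} safe_div_result;

/* Branch before the divide so the hardware division never traps.
   Signed 32-bit division has two exceptional cases: a zero divisor, and
   INT32_MIN / -1, whose quotient does not fit in 32 bits. */
static safe_div_result safe_sdiv32(int32_t a, int32_t b) {
    safe_div_result r = { false, 0 };
    if (b == 0)
        return r;                  /* divide by zero */
    if (a == INT32_MIN && b == -1)
        return r;                  /* quotient overflows */
    r.ok = true;
    r.value = a / b;               /* guaranteed not to trap here */
    return r;
}
```
        <p>A caller branches on <tt>ok</tt> and sends the rare failing case to its slow path; the common path remains a plain division that the optimizer fully understands.</p>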

        <div>

          <div>

            <blockquote type="cite" class="clean_bq" style="color:

              rgb(0, 0, 0); font-family: Helvetica, Arial; font-size:

              13px; font-style: normal; font-variant: normal;

              font-weight: normal; letter-spacing: normal; line-height:

              normal; orphans: auto; text-align: start; text-indent:

              0px; text-transform: none; white-space: normal; widows:

              auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;

              background-color: rgb(255, 255, 255);"><span>

                <div text="#000000" bgcolor="#FFFFFF">

                  <div><br>

                    <br>

                    The Value called by a patchpoint should participate

                    in optimization normally. <span

                      class="Apple-converted-space"> </span></div>

                </div>

              </span></blockquote>

          </div>

          <p>I agree that you could have a different intrinsic that

            behaves like this.</p>

          <div>

            <div>

              <blockquote type="cite" class="clean_bq" style="color:

                rgb(0, 0, 0); font-family: Helvetica, Arial; font-size:

                13px; font-style: normal; font-variant: normal;

                font-weight: normal; letter-spacing: normal;

                line-height: normal; orphans: auto; text-align: start;

                text-indent: 0px; text-transform: none; white-space:

                normal; widows: auto; word-spacing: 0px;

                -webkit-text-stroke-width: 0px; background-color:

                rgb(255, 255, 255);"><span>

                  <div text="#000000" bgcolor="#FFFFFF">

                    <div>We really want the patchpoint part of the call

                      to be supplemental.  It should still be a call. 

                      It should be constant propagated, transformed,

                      etc.  This is not the case currently.  I've got a
                      couple of off-the-wall ideas for improving the

                      current status, but I'll agree this is a hardish

                      problem. <br>

                      <br>

                      It should be legal to use a patchpoint in an

                      invoke.  It's an ABI issue of how the invoke path

                      gets invoked.  (i.e., side tables for the runtime
                      to look up, etc.)  This is not possible today, and

                      probably requires a fair amount of work.  Some of

                      it, I've already done and will be sharing

                      shortly.  Other parts, I haven't even thought

                      about. </div>

                  </div>

                </span></blockquote>

            </div>

            <p>Right, it's significantly more complex than either the

              existing patchpoints or Michael's proposed safe.div.</p>

            <div>

              <div>

                <blockquote type="cite" class="clean_bq" style="color:

                  rgb(0, 0, 0); font-family: Helvetica, Arial;

                  font-size: 13px; font-style: normal; font-variant:

                  normal; font-weight: normal; letter-spacing: normal;

                  line-height: normal; orphans: auto; text-align: start;

                  text-indent: 0px; text-transform: none; white-space:

                  normal; widows: auto; word-spacing: 0px;

                  -webkit-text-stroke-width: 0px; background-color:

                  rgb(255, 255, 255);"><span>

                    <div text="#000000" bgcolor="#FFFFFF">

                      <div><br>

                        <br>

                        If you didn't want to use the trapping

                        semantics, you'd insert dedicated control flow

                        _before_ the divide.  This would allow normal

                        optimization of the control flow. <br>

                        <br>

                        Notes:<br>

                        1) This might require a new PATCHPOINT pseudo op

                        in the backend.  Haven't thought much about that

                        yet.<br>

                        2) I *think* your current intrinsic could be

                        translated into something like this.  (Leaving

                        aside the question of where the deopt state

                        comes from.)  In fact, the more I look at this,

                        the less difference I actually see between the

                        approaches. <br>

                        <br>

                        <br>

                        <blockquote

                          cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"

                          type="cite">

                          <div>

                            <div>

                              <div>

                                <blockquote type="cite" class="clean_bq"

                                  style="color: rgb(0, 0, 0);

                                  font-family: Helvetica, Arial;

                                  font-size: 13px; font-style: normal;

                                  font-variant: normal; font-weight:

                                  normal; letter-spacing: normal;

                                  line-height: normal; orphans: auto;

                                  text-align: start; text-indent: 0px;

                                  text-transform: none; white-space:

                                  normal; widows: auto; word-spacing:

                                  0px; -webkit-text-stroke-width: 0px;

                                  background-color: rgb(255, 255, 255);">

                                  <div bgcolor="#FFFFFF" text="#000000">

                                    <div>

                                      <blockquote

                                        cite="mid:etPan.535fe4f0.140e0f76.172db@dethklok.local"

                                        type="cite">

                                        <p><span><br

                                              class="Apple-interchange-newline">

                                            In a lot of languages, a

                                            divide produces some result

                                            even in the exceptional case

                                            and this result requires

                                            effectively deoptimizing

                                            since the result won't be the

                                            one you would have predicted

                                            (double instead of int, or

                                            BigInt instead of small

                                            int), which sort of means

                                            that if the CPU exception

                                            occurs you have to be able

                                            to reconstruct all state.  A

                                            patchpoint could do this,

                                            and so could spilling all

                                            state to the stack before

                                            the divide - but both are

                                            very heavy hammers that are

                                            sure to be more expensive

                                            than just doing a branch.</span></p>

                                      </blockquote>

                                      <span>This isn't necessarily as

                                        expensive as you might believe. 

                                        I'd recommend reading the Graal

                                        project papers on this topic.<br>

                                        <br>

                                        Whether deopt or branching is

                                        more profitable *in this case*,

                                        I can't easily say.  I'm not yet

                                        to the point of being able to

                                        run that experiment.  We can

                                        argue about what "should" be

                                        better all we want, but real

                                        performance data is the only way

                                        to truly know. </span></div>

                                  </div>

                                </blockquote>

                              </div>

                              <p>My point may have been confusing.  I

                                know that deoptimization is cheap and

                                WebKit uses it everywhere, including

                                division corner cases, if profiling

                                tells us that it's profitable to do so

                                (which it does, in the common case).

                                 WebKit is a heavy user of

                                deoptimization in general, so you don't

                                need to convince me that it's worth it.</p>

                            </div>

                          </div>

                        </blockquote>

                        Acknowledged. <br>

                        <blockquote

                          cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"

                          type="cite">

                          <div>

                            <div>

                              <p>Note that I want *both* deopt *and*

                                branching, because in this case, a

                                branch is the fastest overall way of

                                detecting when to deopt.  In the future,

                                I will want to implement the deopt in

                                terms of branching, and when we do this,

                                I believe that the most sound and

                                performant approach would be using

                                Michael's intrinsics.  This is subtle

                                and I'll try to explain why it's the

                                case.</p>

                              <p>The point is that you wouldn't want to

                                do deoptimization by spilling state on

                                the main path or by using a patchpoint

                                for the main path of the division.</p>

                            </div>

                          </div>

                        </blockquote>

                        This is the main point I disagree with.  I don't

                        believe that having a patchpoint on the main

                        path should be any more expensive than the

                        original call.  (see above)</div>

                    </div>

                  </span></blockquote>

              </div>

              <p>The reason why the patchpoint is expensive is that if

                you use a patchpoint to implement a division then the

                optimizer won't be allowed to assume that it's a

                division, because the whole point of "patchpoint" is to

                tell the optimizer to piss off.</p>

              <div>

                <div>

                  <blockquote type="cite" class="clean_bq" style="color:

                    rgb(0, 0, 0); font-family: Helvetica, Arial;

                    font-size: 13px; font-style: normal; font-variant:

                    normal; font-weight: normal; letter-spacing: normal;

                    line-height: normal; orphans: auto; text-align:

                    start; text-indent: 0px; text-transform: none;

                    white-space: normal; widows: auto; word-spacing:

                    0px; -webkit-text-stroke-width: 0px;

                    background-color: rgb(255, 255, 255);"><span>

                      <div text="#000000" bgcolor="#FFFFFF">

                        <div><br>

                          <br>

                          Worth noting explicitly: I'm assuming that all

                          of your deopt state would already be available

                          for other purposes in nearby code.  It's on

                          the stack or in registers.  I'm assuming that

                          by adding the deopt point, you are not

                          radically changing the set of computations

                          which need to be done.  If that's not the

                          case, you should avoid deopt and instead just

                          inline the slow paths with explicit checks. </div>

                      </div>

                    </span></blockquote>

                </div>

                <p>Yes, of course it is.  That's not the issue.</p>

                <div>

                  <div>

                    <blockquote type="cite" class="clean_bq"

                      style="color: rgb(0, 0, 0); font-family:

                      Helvetica, Arial; font-size: 13px; font-style:

                      normal; font-variant: normal; font-weight: normal;

                      letter-spacing: normal; line-height: normal;

                      orphans: auto; text-align: start; text-indent:

                      0px; text-transform: none; white-space: normal;

                      widows: auto; word-spacing: 0px;

                      -webkit-text-stroke-width: 0px; background-color:

                      rgb(255, 255, 255);"><span>

                        <div text="#000000" bgcolor="#FFFFFF">

                          <div><br>

                            <br>

                            I'll note that given your assumptions about

                            the cost of a patchpoint, the rest of your

                            position makes a lot more sense.  :)  As I

                            spelled out above, I believe this cost is

                            not fundamental. <br>

                            <blockquote

                              cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"

                              type="cite">

                              <div>

                                <div>

                                  <p>You don't want the common path of

                                    executing the division to involve a

                                    patchpoint instruction, although

                                    using a patchpoint or stackmap to

                                    implement deoptimization on the

                                    failing path is great:</p>

                                  <p><b>Good:</b><span

                                      class="Apple-converted-space"> </span>if

                                    (division would fail) { call

                                    @patchpoint(all of my state) } else

                                    { result = a / b }</p>

                                </div>

                              </div>

                            </blockquote>

                            Given your cost assumptions, I'd agree. </div>

                        </div>

                      </span></blockquote>

                  </div>

                  <p>Not my cost assumptions.  The reason why this is

                    better is that the division is expressed in LLVM IR

                    so that LLVM can do useful things to it - like

                    eliminate it, for example.</p>
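A minimal C sketch of the "Good" shape above, assuming int32 operands and a hypothetical <tt>deopt_exit</tt> hook that stands in for the patchpoint or stackmap on the failing path. The corner cases that would trap on x86 (division by zero and INT32_MIN / -1) are detected with an ordinary branch, and the common path remains a plain division that the optimizer can see and reason about.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical deopt hook: in a real JIT this would be the patchpoint or
   stackmap on the failing path that records "all of my state". Here it only
   counts exits so that the sketch is runnable. */
static int deopt_count = 0;

static void deopt_exit(int32_t a, int32_t b) {
    (void)a;
    (void)b;          /* a real VM would record the live values here */
    deopt_count++;
}

/* Returns true and stores the quotient in *out on the fast path.
   Returns false after taking the (hypothetical) deopt exit when the int32
   division would trap on x86: b == 0 or INT32_MIN / -1. */
static bool guarded_sdiv(int32_t a, int32_t b, int32_t *out) {
    if (b == 0 || (a == INT32_MIN && b == -1)) {
        deopt_exit(a, b);   /* cold path: leave optimized code */
        return false;
    }
    *out = a / b;           /* hot path: an ordinary, visible division */
    return true;
}
```

Because the fast path is a normal division, passes such as GVN or constant folding remain free to simplify or eliminate it; only the cold branch carries the deopt state.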

                  <div>

                    <div>

                      <blockquote type="cite" class="clean_bq"

                        style="color: rgb(0, 0, 0); font-family:

                        Helvetica, Arial; font-size: 13px; font-style:

                        normal; font-variant: normal; font-weight:

                        normal; letter-spacing: normal; line-height:

                        normal; orphans: auto; text-align: start;

                        text-indent: 0px; text-transform: none;

                        white-space: normal; widows: auto; word-spacing:

                        0px; -webkit-text-stroke-width: 0px;

                        background-color: rgb(255, 255, 255);"><span>

                          <div text="#000000" bgcolor="#FFFFFF">

                            <div>

                              <blockquote

                                cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"

                                type="cite">

                                <div>

                                  <div>

                                    <p><b><br

                                          class="Apple-interchange-newline">

                                        Bad:</b><span

                                        class="Apple-converted-space"> </span>call

                                      @patchpoint(all of my state) //

                                      patch with a divide instruction -

                                      bad because the optimizer has no

                                      clue what you're doing and assumes

                                      the very worst</p>

                                  </div>

                                </div>

                              </blockquote>

                              Yuck.  Agreed. </div>

                          </div>

                        </span></blockquote>

                    </div>

                    <p>To be clear, this is what you're proposing -

                      except that you're assuming that LLVM will know

                      that you've patched a division because you're

                      expecting the call target to have semantic

                      meaning.  Or, rather, you're expecting that you

                      can make the contents of the patchpoint be a

                      division by having the call target be a division

                      intrinsic.  In the current implementation and as

                      it is currently specified, the call target has no

                      meaning and so you get the yuck that I'm showing.</p>

                    <div>

                      <div>

                        <blockquote type="cite" class="clean_bq"

                          style="color: rgb(0, 0, 0); font-family:

                          Helvetica, Arial; font-size: 13px; font-style:

                          normal; font-variant: normal; font-weight:

                          normal; letter-spacing: normal; line-height:

                          normal; orphans: auto; text-align: start;

                          text-indent: 0px; text-transform: none;

                          white-space: normal; widows: auto;

                          word-spacing: 0px; -webkit-text-stroke-width:

                          0px; background-color: rgb(255, 255, 255);"><span>

                            <div text="#000000" bgcolor="#FFFFFF">

                              <div>

                                <blockquote

                                  cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"

                                  type="cite">

                                  <div>

                                    <div>

                                      <p><b><br

                                            class="Apple-interchange-newline">

                                          Worse:</b><span

                                          class="Apple-converted-space"> </span>spill

                                        all state to the stack; call

                                        @trapping.div(a, b) // the

                                        spills will hurt you far more

                                        than a branch, so this should be

                                        avoided</p>

                                    </div>

                                  </div>

                                </blockquote>

                                I'm assuming this is an explicit spill

                                rather than simply recording a stack map

                                *at the div*.  If so, agreed. <br>

                                <blockquote

                                  cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"

                                  type="cite">

                                  <div>

                                    <div>

                                      <p>I suppose we could imagine a

                                        fourth option that involves a

                                        patchpoint to pick up the state

                                        and a trapping divide

                                        intrinsic.  But a trapping

                                        divide intrinsic alone is not

                                        enough.  Consider this:</p>

                                      <p>result = call @trapping.div(a,

                                        b); call @stackmap(all of my

                                        state)</p>

                                      <p>As soon as these are separate

                                        instructions, you have no

                                        guarantees that the state that

                                        the stackmap reports is sound

                                        for the point at which the div

                                        would trap. <br>

                                      </p>

                                    </div>

                                  </div>

                                </blockquote>

                                This is the closest to what I'd propose,

                                except that the two calls would be

                                merged into a single patchpoint.  Isn't

                                the entire point of a patchpoint to

                                record the stack map for a call? <span

                                  class="Apple-converted-space"> </span></div>

                            </div>

                          </span></blockquote>

                      </div>

                      <p>No.  It would be bad if that's what the

                        documentation says.  That's not at all how

                        WebKit uses it, nor how any IC client would likely
                        use it.</p>

                      <p>Patchpoints are designed to be inline assembly

                        on steroids.  They're there to allow the client

                        JIT to tell LLVM to piss off.</p>

                      <div>

                        <div>

                          <blockquote type="cite" class="clean_bq"

                            style="color: rgb(0, 0, 0); font-family:

                            Helvetica, Arial; font-size: 13px;

                            font-style: normal; font-variant: normal;

                            font-weight: normal; letter-spacing: normal;

                            line-height: normal; orphans: auto;

                            text-align: start; text-indent: 0px;

                            text-transform: none; white-space: normal;

                            widows: auto; word-spacing: 0px;

                            -webkit-text-stroke-width: 0px;

                            background-color: rgb(255, 255, 255);"><span>

                              <div text="#000000" bgcolor="#FFFFFF">

                                <div>(Well, ignoring the actual patching

                                  part.)  Why not write this as:<br>

                                  patchpoint(..., trapping.div, a, b);<br>

                                  <br>

                                  Is there something I'm missing here?<br>

                                  <br>

                                  Just to note: I fully agree that the

                                  two-call proposal is unsound and

                                  should be avoided. <br>

                                  <br>

                                  <blockquote

                                    cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"

                                    type="cite">

                                    <div>

                                      <div>

                                        <p>So, the division itself

                                          shouldn't be a trapping

                                          instruction and instead you

                                          want to detect the bad case

                                          with a branch.</p>

                                        <p>To be clear:</p>

                                        <p>- Whether you use

                                          deoptimization for division or

                                          anything else - like WebKit

                                          has done since before any of

                                          the Graal papers were written

                                          - is mostly unrelated to how

                                          you represent the division,

                                          unless you wanted to add a new

                                          intrinsic that is like a

                                          trapping-division-with-stackmap:</p>

                                        <p>result = call

                                          @trapping.div.with.stackmap(a,

                                          b, ... all of my state ...)</p>

                                        <p>Now, maybe you do want such

                                          an intrinsic, in which case

                                          you should propose it! <br>

                                        </p>

                                      </div>

                                    </div>

                                  </blockquote>

                                  Given what we already have with

                                  patchpoints, I don't think a merged

                                  intrinsic is necessary.  (See above). 

                                  I believe we have all the parts to

                                  build this solution, and that - in

                                  theory - they should compose neatly.<br>

                                  <br>

                                  p.s. The bits I was referring to were

                                  not deopt per se.  They were specifically

                                  which set of deopt state you used at

                                  each deopt point.  That's a bit of a

                                  tangent from the rest of the discussion

                                  now, though. <br>

                                  <blockquote

                                    cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"

                                    type="cite">

                                    <div>

                                      <div>

                                        <p>The reason why I haven't

                                          proposed it is that I think

                                          that long-term, the currently

                                          proposed intrinsics are a

                                          better path to getting the

                                          trapping optimizations.  See

                                          my previous mail, where I show

                                          how we could tell LLVM what

                                          the failing path is (which may

                                          have deoptimization code that

                                          uses a stackmap or whatever),

                                          what the trapping predicate is

                                          (it comes from the safe.div

                                          intrinsic), and the fact that

                                          trapping is wise (branch

                                          weights).</p>

                                        <p>- If you want to do the

                                          deoptimization with a trap,

                                          then your only choice

                                          currently is to use a

                                          patchpoint for the main path

                                          of the division.  This will be

                                          slower than using a branch to

                                          an OSR exit basic block,

                                          because you're making the

                                          division itself opaque to the

                                          optimizer (bad) just to get

                                          rid of a branch (which was

                                          probably cheap to begin with).</p>

                                        <p>Hence, what you want to do -

                                          one way or another, regardless

                                          of whether this proposed

                                          intrinsic is added - is to

                                          branch on the corner case

                                          condition, and have the slow

                                          case of the branch go to a

                                          basic block that deoptimizes.

                                           Unless of course you have

                                          profiling that says that the

                                          case does happen often, in

                                          which case you can have that

                                          basic block handle the corner

                                          case inline without leaving

                                          optimized code (FWIW, we do

                                          have such paths in WebKit and

                                          they are useful).</p>
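For the case where profiling says the corner case is common, here is a rough C sketch of handling it inline instead of deoptimizing. It assumes JavaScript-like number semantics (an assumption of this sketch; the thread only says "double instead of int"): an int32 quotient is produced only when the division is exact and representable, and every other case yields a double without leaving optimized code. The <tt>Number</tt> type and <tt>js_like_div</tt> function are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical boxed number: either a small int or a double. */
typedef struct {
    bool    is_int;
    int32_t i;
    double  d;
} Number;

/* Corner cases handled inline on a slow path that stays in optimized code.
   The && chain short-circuits, so a % b is never evaluated when b == 0 or
   for INT32_MIN % -1 (both undefined in C). */
static Number js_like_div(int32_t a, int32_t b) {
    Number n = { false, 0, 0.0 };
    bool int_ok = b != 0
               && !(a == INT32_MIN && b == -1)  /* would overflow int32 */
               && a % b == 0                     /* exact quotient only */
               && !(a == 0 && b < 0);            /* result would be -0  */
    if (int_ok) {
        n.is_int = true;
        n.i = a / b;                             /* common case */
    } else {
        n.d = (double)a / (double)b;             /* inline slow path */
    }
    return n;
}
```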

                                        <p>So the question for me is

                                          whether the branching involves

                                          explicit control flow or is

                                          hidden inside an intrinsic.  I

                                          prefer for it to be within an

                                          intrinsic because it:</p>

                                        <p>- allows the optimizer to do

                                          more interesting things in the

                                          common cases, like hoisting

                                          the entire division.</p>

                                        <p>- will give us a clearer path

                                          for implementing trapping

                                          optimizations in the future.</p>

                                        <p>- is an immediate win on ARM.</p>
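A hypothetical C model of what "branching hidden inside an intrinsic" could look like: a safe division that never traps and returns the quotient together with a flag indicating the corner case. The struct layout, the names, and the value returned in the flagged case (0 here) are assumptions of this sketch, not the proposed intrinsic's actual definition.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical result of a non-trapping "safe" signed division. */
typedef struct {
    int32_t quotient;  /* meaningful only when bad == false */
    bool    bad;       /* true for b == 0 or INT32_MIN / -1 */
} SafeDivResult;

/* The predicate travels with the operation, so a backend is free to lower
   it however suits the target (compare and branch, or a hardware divide
   that does not trap), while the optimizer can still treat it as a pure
   division and, for example, hoist it, since it can never trap. */
static SafeDivResult safe_sdiv(int32_t a, int32_t b) {
    SafeDivResult r = { 0, true };
    if (b != 0 && !(a == INT32_MIN && b == -1)) {
        r.quotient = a / b;
        r.bad = false;
    }
    return r;
}
```

A client would then branch on <tt>bad</tt> to its failing path (for example, a block that deoptimizes), with branch weights marking that path as cold.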

                                        <p>I'd be curious to hear what

                                          specific idea you have about

                                          how to implement trap-based

                                          deoptimization with your

                                          trapping division intrinsic

                                          for x86 - maybe it's different

                                          from the two "bad" idioms I

                                          showed above.</p>

                                      </div>

                                    </div>

                                  </blockquote>

                                  I hope my explanation above helps.  If

                                  not, ask, and I'll try to explain more

                                  clearly. </div>

                              </div>

                            </span></blockquote>

                        </div>

                        <p>I think I understand it.  I think that the

                          only issue is that:</p>

                        <p>- Patchpoints currently don't do what you

                          want.</p>

                        <p>- If you made patchpoints do what you want

                          then you'd break WebKit - not to mention

                          anyone who wants to use them for inline

                          caches.</p>

                        <p>So it seems like you want a new intrinsic.

                           You should officially propose this new

                          intrinsic, particularly since the core

                          semantic differences are not so great from

                          what we have now.  OTOH, if you truly believe

                          that patchpoints should just be changed to

                          your semantics in a way that does break

                          WebKit, then that's probably also something

                          you should get off your chest. ;-)</p>

                        <div>

                          <div>

                            <blockquote type="cite" class="clean_bq"

                              style="color: rgb(0, 0, 0); font-family:

                              Helvetica, Arial; font-size: 13px;

                              font-style: normal; font-variant: normal;

                              font-weight: normal; letter-spacing:

                              normal; line-height: normal; orphans:

                              auto; text-align: start; text-indent: 0px;

                              text-transform: none; white-space: normal;

                              widows: auto; word-spacing: 0px;

                              -webkit-text-stroke-width: 0px;

                              background-color: rgb(255, 255, 255);"><span>

                                <div text="#000000" bgcolor="#FFFFFF">

                                  <div><br>

                                    <br>

                                    One point just for clarity; I don't

                                    believe this affects the conclusions

                                    of our discussion so far.  I'm also

                                    fairly sure that you (Filip) are

                                    aware of this, but want to spell it

                                    out for other readers. <br>

                                    <br>

                                    You seem to be assuming that

                                    compiled code needs to explicitly

                                    branch to a point where deopt state

                                    is known to exit a compiled frame. <span

                                      class="Apple-converted-space"> </span></div>

                                </div>

                              </span></blockquote>

                          </div>

                          <p>This is a slightly unclear characterization

                            of my assumptions.  Our JIT does

                            deoptimization without explicit branches for

                            many, many things.  You should look at it

                            some time, it's pretty fancy. ;-)</p>

                          <div>

                            <div>

                              <blockquote type="cite" class="clean_bq"

                                style="color: rgb(0, 0, 0); font-family:

                                Helvetica, Arial; font-size: 13px;

                                font-style: normal; font-variant:

                                normal; font-weight: normal;

                                letter-spacing: normal; line-height:

                                normal; orphans: auto; text-align:

                                start; text-indent: 0px; text-transform:

                                none; white-space: normal; widows: auto;

                                word-spacing: 0px;

                                -webkit-text-stroke-width: 0px;

                                background-color: rgb(255, 255, 255);"><span>

                                  <div text="#000000" bgcolor="#FFFFFF">

                                    <div>Worth noting is that you can

                                      also exit a compiled frame on a

                                      trap (without an explicit

                                      condition/branch!) if the deopt

                                      state is known at the point you

                                      take the trap.  This "exit frame

                                      on trap" behavior shows up with

                                      null pointer exceptions as well. 

                                      I'll note that both compilers in

                                      OpenJDK support some combination

                                      of "exit-on-trap" conditions for

                                      division and null dereferences. 

                                      The two differ on exactly what

                                      strategies they use in each case

                                      though.  :)</div>

                                  </div>

                                </span></blockquote>

                            </div>

                            <p>Yeah, and I've also implemented VMs that

                              do this - and I endorse the basic idea.  I

                              know what you want, and my only point is

                              that the existing patchpoints only give

                              you this if you're willing to make a huge

                              compromise: namely, that you're willing to

                              make the division (or heap load for the

                              null case) completely opaque to the

                              compiler to the point that GVN, LICM,

                              SCCP, and all algebraic reasoning have to

                              give up on optimizing it.  The point of

                              using LLVM is that it can optimize code.

                               It can optimize branches and divisions

                              pretty well.  So, eliminating an explicit

                              branch by replacing it with a construct

                              that appears opaque to the optimizer is

                              not a smart trade-off.</p>

                            <p>You could add a new intrinsic that, like

                              patchpoints, records the layout of state

                              in a side-table, but that is used as a

                              kind of wrapper for operations that LLVM

                              understands.  This may or may not be hairy

                              - you seem to have sort of acknowledged

                              that it's got some complexity and I've

                              also pointed out some possible issues.  If

                              this is something that you want, you

                              should propose it so that others know what

                              you're talking about.  One danger of how

                              we're discussing this right now is that

                              you're overloading patchpoints to mean the

                              thing you want them to mean rather than

                              what they actually mean, which makes it

                              seem like we don't need Michael's

                              intrinsics on the grounds that patchpoints

                              already offer a solution.  They don't

                              already offer a solution precisely because

                              patchpoints don't do what your intrinsics

                              would do.</p>

                            <div>

                              <div>

                                <blockquote type="cite" class="clean_bq"

                                  style="color: rgb(0, 0, 0);

                                  font-family: Helvetica, Arial;

                                  font-size: 13px; font-style: normal;

                                  font-variant: normal; font-weight:

                                  normal; letter-spacing: normal;

                                  line-height: normal; orphans: auto;

                                  text-align: start; text-indent: 0px;

                                  text-transform: none; white-space:

                                  normal; widows: auto; word-spacing:

                                  0px; -webkit-text-stroke-width: 0px;

                                  background-color: rgb(255, 255, 255);"><span>

                                    <div text="#000000"

                                      bgcolor="#FFFFFF">

                                      <div><br>

                                        <br>

                                        I'm not really arguing that

                                        either scheme is "better" in all

                                        cases.  I'm simply arguing that

                                        we should support both and allow

                                        optimization and tuning between

                                        them.  As far as I can tell, you

                                        seem to be assuming that an

                                        explicit branch to a known exit

                                        point is always superior.<br>

                                        <br>

                                        <br>

                                        Ok, back to the topic at hand...<br>

                                        <br>

                                        With regards to the current

                                        proposal, I'm going to take a

                                        step back.  You guys seem to

                                        have already looked into this in a

                                        fair amount of depth.  I'm not

                                        necessarily convinced you've

                                        come to the best solution, but

                                        at some point, we need to make

                                        forward progress.  What you have

                                        is clearly better than nothing. <br>

                                        <br>

                                        Please go ahead and submit your

                                        current approach.  We can come

                                        back and revise later if we

                                        really need to. <br>

                                        <br>

                                        I do request the following

                                        changes:<br>

                                        - Mark it clearly as

                                        experimental.</div>

                                    </div>

                                  </span></blockquote>

                              </div>

                              <div>

                                <div>

                                  <blockquote type="cite"

                                    class="clean_bq" style="color:

                                    rgb(0, 0, 0); font-family:

                                    Helvetica, Arial; font-size: 13px;

                                    font-style: normal; font-variant:

                                    normal; font-weight: normal;

                                    letter-spacing: normal; line-height:

                                    normal; orphans: auto; text-align:

                                    start; text-indent: 0px;

                                    text-transform: none; white-space:

                                    normal; widows: auto; word-spacing:

                                    0px; -webkit-text-stroke-width: 0px;

                                    background-color: rgb(255, 255,

                                    255);"><span>

                                      <div text="#000000"

                                        bgcolor="#FFFFFF">

                                        <div><br>

                                          - Either don't specify the

                                          value computed in the edge

                                          cases, or allow those values

                                          to be specified as constant

                                          arguments to the call.  This

                                          allows efficient lowering to

                                          x86's div instruction if you

                                          want to make use of the

                                          trapping semantics. </div>
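                                        <p>A minimal sketch of the edge-case semantics under discussion, written in Python rather than LLVM IR so that it runs directly. It assumes the two edge cases are division by zero and INT32_MIN / -1, the inputs for which LLVM's sdiv is undefined and x86's idiv raises a divide error. It models the second option above: the caller passes the result for each edge case as a constant. The names safe_sdiv32, trunc_div, on_div_by_zero and on_overflow are hypothetical and do not reflect the proposed intrinsic's actual signature.</p>
<pre>
```python
# Hypothetical model of a "safe" 32-bit signed division whose edge-case
# results are supplied by the caller as constants (names are illustrative).
INT32_MIN = -(2**31)


def trunc_div(a: int, b: int) -> int:
    """Signed division rounding toward zero, as sdiv and x86 idiv do."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def safe_sdiv32(a: int, b: int, on_div_by_zero: int, on_overflow: int) -> int:
    # Explicit control flow for the two inputs where a plain sdiv is
    # undefined in LLVM IR (and where x86 idiv would fault).
    if b == 0:
        return on_div_by_zero
    if a == INT32_MIN and b == -1:
        return on_overflow
    return trunc_div(a, b)
```
</pre>
                                        <p>For example, safe_sdiv32(7, -2, 0, INT32_MIN) returns -3, and safe_sdiv32(INT32_MIN, -1, 0, INT32_MIN) returns INT32_MIN instead of faulting.</p>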

                                      </div>

                                    </span></blockquote>

                                </div>

                                <p>Once again: how would you use this to

                                  get trapping semantics without

                                  throwing all of LLVM's optimizations

                                  out the window, in the absence of the

                                  kind of patchpoint-like intrinsic that

                                  you want?  I ask just to make sure

                                  that we're on the same page.</p>

                                <div>

                                  <blockquote type="cite"

                                    class="clean_bq" style="color:

                                    rgb(0, 0, 0); font-family:

                                    Helvetica, Arial; font-size: 13px;

                                    font-style: normal; font-variant:

                                    normal; font-weight: normal;

                                    letter-spacing: normal; line-height:

                                    normal; orphans: auto; text-align:

                                    start; text-indent: 0px;

                                    text-transform: none; white-space:

                                    normal; widows: auto; word-spacing:

                                    0px; -webkit-text-stroke-width: 0px;

                                    background-color: rgb(255, 255,

                                    255);"><span>

                                      <div text="#000000"

                                        bgcolor="#FFFFFF">

                                        <div><br>

                                          <br>

                                          <blockquote

                                            cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"

                                            type="cite">

                                            <div>

                                              <div>

                                                <p>Finally, as for

                                                  performance data,

                                                  which part of this do

                                                  you want performance

                                                  data for?  I concede

                                                  that I don't have

                                                  performance data for

                                                  using Michael's new

                                                  intrinsic.  Part of

                                                  what the intrinsic

                                                  accomplishes is it

                                                  gives a less ugly way

                                                  of doing something

                                                  that is already

                                                  possible with target

                                                  intrinsics on ARM.  I

                                                  think it would be

                                                  great if you could get

                                                  those semantics -

                                                  along with a

                                                  known-good

                                                  implementation - on

                                                  other architectures as

                                                  well.</p>

                                              </div>

                                            </div>

                                          </blockquote>

                                          I would be very interested in

                                          seeing data comparing two

                                          schemes:<br>

                                          - Explicit control flow emitted

                                          by the frontend<br>

                                          - The safe.div intrinsic

                                          emitted by the frontend,

                                          desugared in CodeGenPrep<br>

                                          <br>

                                          My strong suspicion is that

                                          each would perform well in

                                          some cases and not in others. 

                                          At least on x86.  Since the

                                          edge-checks are essentially

                                          free on ARM, the second scheme

                                          would probably be strictly

                                          superior there. <br>

                                          <br>

                                          I am NOT asking that we block

                                          submission on this data

                                          however. <br>

                                          <br>

                                          <blockquote

                                            cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"

                                            type="cite">

                                            <div>

                                              <div>

                                                <p>But this discussion

                                                  has also involved

                                                  suggestions that we

                                                  should use trapping to

                                                  implement

                                                  deoptimization, and

                                                  the main point of my

                                                  message is to strongly

                                                  argue against anything

                                                  like this given the

                                                  current state of

                                                  trapping semantics and

                                                  how patchpoints work.

                                                   My point is that

                                                  using traps for

                                                  division corner cases

                                                  would either be

                                                  unsound (see the

                                                  stackmap after the

                                                  trap, above), or would

                                                  require you to do

                                                  things that are

                                                  obviously inefficient.

                                                   If you truly believe

                                                  that the branch to

                                                  detect division slow

                                                  paths is more

                                                  expensive than

                                                  spilling all bytecode

                                                  state to the stack or

                                                  using a patchpoint for

                                                  the division, then I

                                                  could probably hack

                                                  something up in WebKit

                                                  to show you the

                                                  performance

                                                  implications.  (Or you

                                                  could do it yourself,

                                                  the code is open

                                                  source...)</p>

                                              </div>

                                            </div>

                                          </blockquote>

                                          In a couple of months, I'll

                                          probably have the performance

                                          data to discuss this for

                                          real.  When that happens,

                                          let's pick this up and

                                          continue the debate. 

                                          Alternatively, if you want to

                                          chat this over more with a

                                          beer in hand at the social

                                          next week, let me know.  In

                                          the meantime, let's not stall

                                          the current proposal any

                                          more. <br>

                                          <br>

                                          Philip<br>

                                          <br>

                                        </div>

                                      </div>

                                    </span></blockquote>

                                </div>

                              </div>

                            </div>

                          </div>

                        </div>

                      </div>

                    </div>

                  </div>

                </div>

              </div>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

  </body>

</html>