<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">Andy - If you're not already following
      this closely, please start.  We've gotten into fairly fundamental
      questions of what a patchpoint does.  <br>
      <br>
      Filip, <br>
      <br>
      I think you've hit the nail on the head.  What I'm thinking of as
      being patchpoints are not what you think they are.  Part of that
      is that I've got a local change which adds a very similar
      construction (called "statepoints" for the moment), but I was
      trying to keep that separate.  That also includes a lot of GC
      semantics which are not under discussion currently.  My apologies
      if that experience bled over into this conversation and made
      things more confusing.  <br>
      <br>
      I will note that the documentation for patchpoint say explicitly
      the following:<br>
      "The ‘<tt class="docutils literal"><span class="pre">llvm.experimental.patchpoint.*</span></tt>‘
      intrinsics creates a function
      call to the specified <tt class="docutils literal"><span
          class="pre"><target></span></tt> and records the
      location of specified
      values in the stack map."<br>
      <br>
      My reading has always been that a patchpoint *that isn't patched*
      is simply a call with a stackmap associated with it.  To my
      reading, this can (and did, and does) indicate my proposed usage
      would be legal.  <br>
      <br>
      I will agree that I've confused the topic badly on the
      optimization front.  My "statepoint" isn't patchable, so a lot
      more optimizations are legal.  Sorry about that.  To restate what
      I think you've been saying all along, the optimizer can't make
      assumptions about what function is called by a patchpoint because
      that might change based on later patching.  Is this the key point
      you've been trying to make?<br>
      <br>
      I'm not objecting to separating "my patchpoint" from "your
      patchpoint".  Let's just hammer out the semantics of each first. 
      :)<br>
      <br>
      Again, longer response to follow in a day or so. :)<br>
      <br>
      Philip<br>
      <br>
      On 04/30/2014 10:09 PM, Filip Pizlo wrote:<br>
    </div>
    <blockquote cite="mid:etPan.5361d71d.1d4ed43b.172db@dethklok.local"
      type="cite">
      <style>body{font-family:Helvetica,Arial;font-size:13px}</style>
      <div id="bloop_customfont"
        style="font-family:Helvetica,Arial;font-size:13px; color:
        rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><br>
      </div>
      <br>
      <p style="color:#000;">On April 30, 2014 at 9:06:20 PM, Philip
        Reames (<a moz-do-not-send="true"
          href="mailto:listmail@philipreames.com">listmail@philipreames.com</a>)
        wrote:</p>
      <div>
        <div>
          <blockquote type="cite" class="clean_bq" style="color: rgb(0,
            0, 0); font-family: Helvetica, Arial; font-size: 13px;
            font-style: normal; font-variant: normal; font-weight:
            normal; letter-spacing: normal; line-height: normal;
            orphans: auto; text-align: start; text-indent: 0px;
            text-transform: none; white-space: normal; widows: auto;
            word-spacing: 0px; -webkit-text-stroke-width: 0px;
            background-color: rgb(255, 255, 255);"><span>
              <div text="#000000" bgcolor="#FFFFFF">
                <div>
                  <div class="moz-cite-prefix">On 04/29/2014 12:39 PM,
                    Filip Pizlo wrote:<br>
                  </div>
                  <blockquote
                    cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"
                    type="cite">On April 29, 2014 at 11:27:06 AM, Philip
                    Reames (<a moz-do-not-send="true"
                      href="mailto:listmail@philipreames.com">listmail@philipreames.com</a>)
                    wrote:
                    <div>
                      <blockquote type="cite" class="clean_bq"
                        style="color: rgb(0, 0, 0); font-family:
                        Helvetica, Arial; font-size: 13px; font-style:
                        normal; font-variant: normal; font-weight:
                        normal; letter-spacing: normal; line-height:
                        normal; orphans: auto; text-align: start;
                        text-indent: 0px; text-transform: none;
                        white-space: normal; widows: auto; word-spacing:
                        0px; -webkit-text-stroke-width: 0px;
                        background-color: rgb(255, 255, 255);">
                        <div bgcolor="#FFFFFF" text="#000000">
                          <div>
                            <div class="moz-cite-prefix"><span>On
                                04/29/2014 10:44 AM, Filip Pizlo wrote:<br>
                              </span></div>
                            <blockquote
                              cite="mid:etPan.535fe4f0.140e0f76.172db@dethklok.local"
                              type="cite">
                              <div id="bloop_customfont"
                                style="font-family: Helvetica, Arial;
                                font-size: 13px; color: rgb(0, 0, 0);
                                margin: 0px;"><span>LD;DR: Your desire
                                  to use trapping on x86 only further
                                  convinces me that Michael's proposed
                                  intrinsics are the best way to go.</span></div>
                            </blockquote>
                            <span>I'm still not convinced, but am not
                              going to actively oppose it either.  I'm
                              leery of designing a solution with major
                              assumptions we don't have data to backup. <br>
                              <br>
                              I worry your assumptions about
                              deoptimization are potentially unsound. 
                              But I don't have data to actually show
                              this (yet).</span></div>
                        </div>
                      </blockquote>
                    </div>
                    <p>I *think* I may have been unclear about my
                      assumptions; in particular, my claims with respect
                      to deoptimization are probably more subtle than
                      they appeared.  WebKit can use LLVM and it has
                      divisions and we do all possible
                      deoptimization/profiling/etc tricks for it, so
                      this is grounded in experience.  Forgive me if the
                      rest of this e-mail contains a lecture on things
                      that are obvious - I'll try to err on the side of
                      clarity and completeness since this discussion is
                      sufficiently dense that we run the risk of talking
                      cross-purposes unless some baseline assumptions
                      are established.</p>
                  </blockquote>
                  I think we're using the same terminology, but with
                  slightly different sets of assumptions.  I'll point
                  this out below where relevant. <br>
                  <br>
                  Also, thanks for taking the time to expand.  It help
                  clarify the discussion quite a bit. </div>
              </div>
            </span></blockquote>
        </div>
        <p>I think we may be converging to an understanding of what you
          want versus what I want, and I think that there are some
          points - possibly unrelated to division - that are worth
          clarifying.  I think that the main difference is that when I
          say "patchpoint", I am referring to a concrete intrinsic with
          specific semantics that cannot change without breaking WebKit,
          while you are using the term to refer to a broad concept, or
          rather, a class of as-yet-unimplemented intrinsics that share
          some of the same features with patchpoints but otherwise have
          incompatible semantics.</p>
        <p>Also, when I say that you wouldn't want to use the existing
          patchpoint to do your trapping deopt, what I mean is that the
          performance of doing this would suck for reasons that are not
          related to deoptimization or trapping.  I'm not claiming that
          deoptimization performs poorly (trust me, I know better) or
          that trapping to deoptimize is bad (I've done this many, many
          times and I know better).  I'm saying that with the current
          patchpoint intrinsics in LLVM, as they are currently specified
          and implemented, doing it would be a bad idea because you'd
          have to compromise a bunch of other optimizations to achieve
          it.</p>
        <p>You have essentially described new intrinsics that would make
          this less of a bad idea and I am interested in your
          intrinsics, so I'll try to both respond with why patchpoints
          don't currently give you what you want (and why simply
          changing patchpoint semantics would be evil) and I'll also try
          to comment on what I think of the intrinsic that you're
          effectively proposing.  Long story short, I think you should
          formally propose your intrinsic even if it's not completely
          fleshed out.  I think that it's an interesting capability and
          in its most basic form, it is a simple variation of the
          current patchpoint/stackmap intrinsics.</p>
        <div>
          <blockquote type="cite" class="clean_bq" style="color: rgb(0,
            0, 0); font-family: Helvetica, Arial; font-size: 13px;
            font-style: normal; font-variant: normal; font-weight:
            normal; letter-spacing: normal; line-height: normal;
            orphans: auto; text-align: start; text-indent: 0px;
            text-transform: none; white-space: normal; widows: auto;
            word-spacing: 0px; -webkit-text-stroke-width: 0px;
            background-color: rgb(255, 255, 255);"><span>
              <div text="#000000" bgcolor="#FFFFFF">
                <div>
                  <blockquote
                    cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"
                    type="cite">
                    <div>
                      <div>
                        <blockquote type="cite" class="clean_bq"
                          style="color: rgb(0, 0, 0); font-family:
                          Helvetica, Arial; font-size: 13px; font-style:
                          normal; font-variant: normal; font-weight:
                          normal; letter-spacing: normal; line-height:
                          normal; orphans: auto; text-align: start;
                          text-indent: 0px; text-transform: none;
                          white-space: normal; widows: auto;
                          word-spacing: 0px; -webkit-text-stroke-width:
                          0px; background-color: rgb(255, 255, 255);">
                          <div bgcolor="#FFFFFF" text="#000000">
                            <div><span><br
                                  class="Apple-interchange-newline">
                                <br>
                                <br>
                              </span>
                              <blockquote
                                cite="mid:etPan.535fe4f0.140e0f76.172db@dethklok.local"
                                type="cite"><span><br>
                                </span>
                                <p style="color: rgb(0, 0, 0);"><span>On
                                    April 29, 2014 at 10:09:49 AM,
                                    Philip Reames (<a
                                      moz-do-not-send="true"
                                      href="mailto:listmail@philipreames.com">listmail@philipreames.com</a>)
                                    wrote:</span></p>
                                <div>
                                  <blockquote type="cite"
                                    class="clean_bq" style="color:
                                    rgb(0, 0, 0); font-family:
                                    Helvetica, Arial; font-size: 13px;
                                    font-style: normal; font-variant:
                                    normal; font-weight: normal;
                                    letter-spacing: normal; line-height:
                                    normal; orphans: auto; text-align:
                                    start; text-indent: 0px;
                                    text-transform: none; white-space:
                                    normal; widows: auto; word-spacing:
                                    0px; -webkit-text-stroke-width: 0px;
                                    background-color: rgb(255, 255,
                                    255);">
                                    <div bgcolor="#FFFFFF"
                                      text="#000000">
                                      <div><span><span>As the discussion
                                            has progressed and I've
                                            spent more time thinking
                                            about the topic, I find
                                            myself less and less
                                            enthused about the current
                                            proposal.  I'm in full
                                            support of having idiomatic
                                            ways to express safe
                                            division, but I'm starting
                                            to doubt that using an
                                            intrinsic is the right way
                                            at the moment.<br>
                                            <br>
                                            One case I find myself
                                            thinking about is how one
                                            would combine profiling
                                            information and implicit
                                            div-by-zero/overflow checks
                                            with this proposal.  I don't
                                            really see a clean way. 
                                            Ideally, for a "safe div"
                                            which never has the
                                            exceptional paths taken,
                                            you'd like to completely do
                                            away with the control flow
                                            entirely.  (And rely on
                                            hardware traps w/exceptions
                                            instead.)  I don't really
                                            see a way to represent that
                                            type of construct given the
                                            current proposal. </span></span></div>
                                    </div>
                                  </blockquote>
                                </div>
                                <p>This is a deeper problem and to solve
                                  it you'd need a solution to trapping
                                  in general.  Let's consider the case
                                  of Java.  A Java program may want to
                                  catch the arithmetic exception due to
                                  divide by zero.  How would you do this
                                  with a trap in LLVM IR?  Spill all
                                  state that is live at the catch?  Use
                                  a patchpoint for the entire division
                                  instruction?</p>
                              </blockquote>
                              We'd likely use something similar to a
                              patchpoint.  You'd need the "abstract vm
                              state" (which is not the compiled frame
                              necessarily) available at the div
                              instruction.  You could then re-enter the
                              interpreter at the specified index (part
                              of the vm state).  We have all most of
                              these mechanisms in place.  Ideally, you'd
                              trigger a recompile and otherwise ensure
                              re-entry into compiled code at the soonest
                              possible moment. <br>
                              <br>
                              This requires a lot of runtime support,
                              but we already have most of it implemented
                              for another compiler.  From our
                              perspective, the runtime requirements are
                              not a major blocker. </div>
                          </div>
                        </blockquote>
                      </div>
                      <p>Right, you'll use a patchpoint.  That's way
                        more expensive than using a safe division
                        intrinsic with branches, because it's opaque to
                        the optimizer.</p>
                    </div>
                  </blockquote>
                  This statement is true at the moment, but it shouldn't
                  be.  I think this is our fundamental difference in
                  approach. <br>
                  <br>
                  You should be able to write something like:<br>
                  i32 %res = invoke patchpoint (... x86_trapping_divide,
                  a, b) normal_dest invoke_dest<br>
                  <br>
                  normal_dest:<br>
                    ;; use %res<br>
                  invoke_dest:<br>
                    landingpad<br>
                    ;; dispatch edge cases<br>
                    ;; this could be unreachable code if you deopt this
                  frame in the trap handler and jump directly to an
                  interpreter or other bit of code</div>
              </div>
            </span></blockquote>
        </div>
      </div>
      <p>I see.  It sounds like you want a generalization of the
        "div.with.stackmap" that I thought you wanted - you want to be
        able to wrap anything in a stackmap.</p>
      <p>The current patchpoint intrinsic does not do this, and you run
        the risk of breaking existing semantics if you changed this.
         You'd probably break WebKit, which treats the call target of
        the patchpoint as nothing more than a quirk - we always pass
        null.  Also, the current patchpoint treats the callee as an i8*
        if I remember right and it would be super weird if all LLVM
        phases had to interpret this i8* by unwrapping a possible
        bitcast to get to a declared function that may be an intrinsic.
         Yuck!  Basically, the call target of existing patchpoints is
        meant to be a kind of convenience feature rather than the core
        of the mechanism.</p>
      <p>I agree in principle that the intrinsic that you want would be
        a useful intrinsic.  But let's not call it a patchpoint for the
        purposes of this discussion, and let's not confuse the
        discussion by claiming (incorrectly) that the existing
        patchpoint facility gives you what you want.  It doesn't:
        patchpoints are designed to make the call target opaque (hence
        the use of i8*) and there shouldn't be a correlation between
        what the patchpoint does at run-time and what the called
        function would have done.  You could make the call target be
        null (like WebKit does) and the patchpoint should still mean
        "this code can do anything" because the expectation is that the
        client JIT will patch over it anyway.</p>
      <p>Also, "patchpoint" would probably not be the right term for the
        intrinsic that you want.  I think that what you want is
        "call.with.stackmap".  Or maybe "stackmap.wrapper".  Or just
        "stackmap" - I'd be OK, in principle, with changing the name of
        the current "stackmap" intrinsic to something that reflects the
        fact that it's less of a stackmap than what you want.</p>
      <p>To summarize.  A patchpoint's main purpose is that you can
        patch it with arbitrary code.  The current "stackmap" means that
        you can patch it with arbitrary code and that patching may be
        destructive to a shadow of machine code bytes, so it's really
        just like patchpoints - we could change its name to
        "patchpoint.shadow" for example.</p>
      <p>If you were to propose such a stackmap intrinsic, then I think
        there could be some ways of doing it that wouldn't be too
        terrible.  Basically you want something that is like a
        patchpoint in that it reports a stackmap via a side channel, but
        unlike patchpoints, it doesn't allow arbitrary patching -
        instead the optimizer should be allowed to assume that the code
        within the patchpoint will always do the same thing that the
        call target would have done.  There are downsides to truly doing
        this.  For example, to make division efficient with such an
        intrinsic, you'd have to do something that is somewhat worse
        than just recognizing intrinsics in the optimizer - you'd have
        to first recognize a call to your "stackmap wrapper" intrinsic
        and then observe that its call target argument is in turn
        another intrinsic.  To me personally, that's kind of yucky, but
        I won't deny that it could be useful.</p>
      <p>As to the use of invoke: I don't believe that the use of invoke
        versus my suggested "branch on a trap predicate" idea are
        different in any truly meaningful way.  I buy that either would
        work.</p>
      <div>
        <div>
          <blockquote type="cite" class="clean_bq" style="color: rgb(0,
            0, 0); font-family: Helvetica, Arial; font-size: 13px;
            font-style: normal; font-variant: normal; font-weight:
            normal; letter-spacing: normal; line-height: normal;
            orphans: auto; text-align: start; text-indent: 0px;
            text-transform: none; white-space: normal; widows: auto;
            word-spacing: 0px; -webkit-text-stroke-width: 0px;
            background-color: rgb(255, 255, 255);"><span>
              <div text="#000000" bgcolor="#FFFFFF">
                <div><br>
                  <br>
                  A patchpoint should not require any excess spilling. 
                  If values are live in registers, that should be
                  reflected in the stack map.  (I do not know if this is
                  the case for patchpoint at the moment or not.)</div>
              </div>
            </span></blockquote>
        </div>
        <p>Patchpoints do not require spilling.</p>
        <p>My point was that with existing patchpoints, you *either* use
          a patchpoint for the entire division which makes the division
          opaque to the optimizer - because a patchpoint means "this
          code can do anything" - *or* you could spill stuff to the
          stack prior to your trapping division intrinsic, since
          spilling is something that you could do as a workaround if you
          didn't have a patchpoint.</p>
        <p>The reason why I brought up spilling at all is that I suspect
          that spilling all state to the stack might be cheaper - for
          some systems - than turning the division into a patchpoint.
           Turning the division into a patchpoint is horrendously brutal
          - the patchpoint looks like it clobbers the heap (which a
          division doesn't do), has to execute (a division is an obvious
          DCE candidate), cannot be hoisted (hoisting divisions is
          awesome), etc.  Perhaps most importantly, though, a patchpoint
          doesn't tell LLVM that you're *doing a division* - so all
          constant folding, strenght reduction, and algebraic reasoning
          flies out the window.  On the other hand, spilling all state
          to the stack is an arguably sound and performant solution to a
          lot of VM problems.  I've seen JVM implementations that ensure
          that there is always a copy of state on the stack at some
          critical points, just because it makes loads of stuff simpler
          (debugging, profiling, GC, and of course deopt).  I personally
          prefer to stay away from such a strategy because it's not
          free.</p>
        <p>On the other hand, branches can be cheap.  A branch on a
          divide is cheaper than not being able to optimize the divide.</p>
        <div>
          <div>
            <blockquote type="cite" class="clean_bq" style="color:
              rgb(0, 0, 0); font-family: Helvetica, Arial; font-size:
              13px; font-style: normal; font-variant: normal;
              font-weight: normal; letter-spacing: normal; line-height:
              normal; orphans: auto; text-align: start; text-indent:
              0px; text-transform: none; white-space: normal; widows:
              auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;
              background-color: rgb(255, 255, 255);"><span>
                <div text="#000000" bgcolor="#FFFFFF">
                  <div><br>
                    <br>
                    The Value called by a patchpoint should participate
                    in optimization normally. <span
                      class="Apple-converted-space"> </span></div>
                </div>
              </span></blockquote>
          </div>
          <p>I agree that you could have a different intrinsic that
            behaves like this.</p>
          <div>
            <div>
              <blockquote type="cite" class="clean_bq" style="color:
                rgb(0, 0, 0); font-family: Helvetica, Arial; font-size:
                13px; font-style: normal; font-variant: normal;
                font-weight: normal; letter-spacing: normal;
                line-height: normal; orphans: auto; text-align: start;
                text-indent: 0px; text-transform: none; white-space:
                normal; widows: auto; word-spacing: 0px;
                -webkit-text-stroke-width: 0px; background-color:
                rgb(255, 255, 255);"><span>
                  <div text="#000000" bgcolor="#FFFFFF">
                    <div>We really want the patchpoint part of the call
                      to be supplemental.  It should still be a call. 
                      It should be constant propagated, transformed,
                      etc..  This is not the case currently.  I've got a
                      couple of off the wall ideas for improving the
                      current status, but I'll agree this is a hardish
                      problem. <br>
                      <br>
                      It should be legal to use a patchpoint in an
                      invoke.  It's an ABI issue of how the invoke path
                      gets invoked.  (i.e. side tables for the runtime
                      to lookup, etc..)  This is not possible today, and
                      probably requires a fair amount of work.  Some of
                      it, I've already done and will be sharing
                      shortly.  Other parts, I haven't even thought
                      about. </div>
                  </div>
                </span></blockquote>
            </div>
            <p>Right, it's significantly more complex than either the
              existing patchpoints or Michael's proposed safe.div.</p>
            <div>
              <div>
                <blockquote type="cite" class="clean_bq" style="color:
                  rgb(0, 0, 0); font-family: Helvetica, Arial;
                  font-size: 13px; font-style: normal; font-variant:
                  normal; font-weight: normal; letter-spacing: normal;
                  line-height: normal; orphans: auto; text-align: start;
                  text-indent: 0px; text-transform: none; white-space:
                  normal; widows: auto; word-spacing: 0px;
                  -webkit-text-stroke-width: 0px; background-color:
                  rgb(255, 255, 255);"><span>
                    <div text="#000000" bgcolor="#FFFFFF">
                      <div><br>
                        <br>
                        If you didn't want to use the trapping
                        semantics, you'd insert dedicated control flow
                        _before_ the divide.  This would allow normal
                        optimization of the control flow. <br>
                        <br>
                        Notes:<br>
                        1) This might require a new PATCHPOINT pseudo op
                        in the backend.  Haven't thought much about that
                        yet.<br>
                        2) I *think* your current intrinsic could be
                        translated into something like this.  (Leaving
                        aside the question of where the deopt state
                        comes from.)  In fact, the more I look at this,
                        the less difference I actually see between the
                        approaches. <br>
                        <br>
                        <br>
                        <blockquote
                          cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"
                          type="cite">
                          <div>
                            <div>
                              <div>
                                <blockquote type="cite" class="clean_bq"
                                  style="color: rgb(0, 0, 0);
                                  font-family: Helvetica, Arial;
                                  font-size: 13px; font-style: normal;
                                  font-variant: normal; font-weight:
                                  normal; letter-spacing: normal;
                                  line-height: normal; orphans: auto;
                                  text-align: start; text-indent: 0px;
                                  text-transform: none; white-space:
                                  normal; widows: auto; word-spacing:
                                  0px; -webkit-text-stroke-width: 0px;
                                  background-color: rgb(255, 255, 255);">
                                  <div bgcolor="#FFFFFF" text="#000000">
                                    <div>
                                      <blockquote
                                        cite="mid:etPan.535fe4f0.140e0f76.172db@dethklok.local"
                                        type="cite">
                                        <p><span><br
                                              class="Apple-interchange-newline">
                                            In a lot of languages, a
                                            divide produces some result
                                            even in the exceptional case
                                            and this result requires
                                            effectively deoptimizing
                                            since the resut won't be the
                                            one you would have predicted
                                            (double instead of int, or
                                            BigInt instead of small
                                            int), which sort of means
                                            that if the CPU exception
                                            occurs you have to be able
                                            to reconstruct all state.  A
                                            patchpoint could do this,
                                            and so could spilling all
                                            state to the stack before
                                            the divide - but both are
                                            very heavy hammers that are
                                            sure to be more expensive
                                            than just doing a branch.</span></p>
                                      </blockquote>
                                      <span>This isn't necessarily as
                                        expensive as you might believe. 
                                        I'd recommend reading the Graal
                                        project papers on this topic.<br>
                                        <br>
                                        Whether deopt or branching is
                                        more profitable *in this case*,
                                        I can't easily say.  I'm not yet
                                        to the point of being able to
                                        run that experiment.  We can
                                        argue about what "should" be
                                        better all we want, but real
                                        performance data is the only way
                                        to truly know. </span></div>
                                  </div>
                                </blockquote>
                              </div>
                              <p>My point may have been confusing.  I
                                know that deoptimization is cheap and
                                WebKit uses it everywhere, including
                                division corner cases, if profiling
                                tells us that it's profitable to do so
                                (which it does, in the common case).
                                 WebKit is a heavy user of
                                deoptimization in general, so you don't
                                need to convince me that it's worth it.</p>
                            </div>
                          </div>
                        </blockquote>
                        Acknowledged. <br>
                        <blockquote
                          cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"
                          type="cite">
                          <div>
                            <div>
                              <p>Note that I want *both* deopt *and*
                                branching, because in this case, a
                                branch is the fastest overall way of
                                detecting when to deopt.  In the future,
                                I will want to implement the deopt in
                                terms of branching, and when we do this,
                                I believe that the most sound and
                                performat approach would be using
                                Michael's intrinsics.  This is subtle
                                and I'll try to explain why it's the
                                case.</p>
                              <p>The point is that you wouldn't want to
                                do deoptimization by spilling state on
                                the main path or by using a patchpoint
                                for the main path of the division.</p>
                            </div>
                          </div>
                        </blockquote>
                        This is the main point I disagree with.  I don't
                        believe that having a patchpoint on the main
                        path should be any more expensive then the
                        original call.  (see above)</div>
                    </div>
                  </span></blockquote>
              </div>
              <p>The reason why the patchpoint is expensive is that if
                you use a patchpoint to implement a division then the
                optimizer won't be allowed to assume that it's a
                division, because the whole point of "patchpoint" is to
                tell the optimizer to piss off.</p>
              <div>
                <div>
                  <blockquote type="cite" class="clean_bq" style="color:
                    rgb(0, 0, 0); font-family: Helvetica, Arial;
                    font-size: 13px; font-style: normal; font-variant:
                    normal; font-weight: normal; letter-spacing: normal;
                    line-height: normal; orphans: auto; text-align:
                    start; text-indent: 0px; text-transform: none;
                    white-space: normal; widows: auto; word-spacing:
                    0px; -webkit-text-stroke-width: 0px;
                    background-color: rgb(255, 255, 255);"><span>
                      <div text="#000000" bgcolor="#FFFFFF">
                        <div><br>
                          <br>
                          Worth noting explicitly: I'm assuming that all
                          of your deopt state would already be available
                          for other purposes in nearby code.  It's on
                          the stack or in registers.  I'm assuming that
                          by adding the deopt point, you are not
                          radically changing the set of computations
                          which need to be done.  If that's not the
                          case, you should avoid deopt and instead just
                          inline the slow paths with explicit checks. </div>
                      </div>
                    </span></blockquote>
                </div>
                <p>Yes, of course it is.  That's not the issue.</p>
                <div>
                  <div>
                    <blockquote type="cite" class="clean_bq"
                      style="color: rgb(0, 0, 0); font-family:
                      Helvetica, Arial; font-size: 13px; font-style:
                      normal; font-variant: normal; font-weight: normal;
                      letter-spacing: normal; line-height: normal;
                      orphans: auto; text-align: start; text-indent:
                      0px; text-transform: none; white-space: normal;
                      widows: auto; word-spacing: 0px;
                      -webkit-text-stroke-width: 0px; background-color:
                      rgb(255, 255, 255);"><span>
                        <div text="#000000" bgcolor="#FFFFFF">
                          <div><br>
                            <br>
                            I'll note that given your assumptions about
                            the cost of a patchpoint, the rest of your
                            position makes a lot more sense.  :)  As I
                            spelled out above, I believe this cost is
                            not fundamental. <br>
                            <blockquote
                              cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"
                              type="cite">
                              <div>
                                <div>
                                  <p>You don't want the common path of
                                    executing the division to involve a
                                    patchpoint instruction, although
                                    using a patchpoint or stackmap to
                                    implement deoptimization on the
                                    failing path is great:</p>
                                  <p><b>Good:</b><span
                                      class="Apple-converted-space"> </span>if
                                    (division would fail) { call
                                    @patchpoint(all of my state) } else
                                    { result = a / b }</p>
                                </div>
                              </div>
                            </blockquote>
                            Given your cost assumptions, I'd agree. </div>
                        </div>
                      </span></blockquote>
                  </div>
                  <p>Not my cost assumptions.  The reason why this is
                    better is that the division is expressed in LLVM IR
                    so that LLVM can do useful things to it - like
                    eliminate it, for example.</p>
                  <div>
                    <div>
                      <blockquote type="cite" class="clean_bq"
                        style="color: rgb(0, 0, 0); font-family:
                        Helvetica, Arial; font-size: 13px; font-style:
                        normal; font-variant: normal; font-weight:
                        normal; letter-spacing: normal; line-height:
                        normal; orphans: auto; text-align: start;
                        text-indent: 0px; text-transform: none;
                        white-space: normal; widows: auto; word-spacing:
                        0px; -webkit-text-stroke-width: 0px;
                        background-color: rgb(255, 255, 255);"><span>
                          <div text="#000000" bgcolor="#FFFFFF">
                            <div>
                              <blockquote
                                cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"
                                type="cite">
                                <div>
                                  <div>
                                    <p><b><br
                                          class="Apple-interchange-newline">
                                        Bad:</b><span
                                        class="Apple-converted-space"> </span>call
                                      @patchpoint(all of my state) //
                                      patch with a divide instruction -
                                      bad because the optimizer has no
                                      clue what you're doing and assumes
                                      the very worst</p>
                                  </div>
                                </div>
                              </blockquote>
                              Yuck.  Agreed. </div>
                          </div>
                        </span></blockquote>
                    </div>
                    <p>To be clear, this is what you're proposing -
                      except that you're assuming that LLVM will know
                      that you've patched a division because you're
                      expecting the call target to have semantic
                      meaning.  Or, rather, you're expecting that you
                      can make the contents of the patchpoint be a
                      division by having the call target be a division
                      intrinsic.  In the current implementation and as
                      it is currently specified, the call target has no
                      meaning and so you get the yuck that I'm showing.</p>
                    <div>
                      <div>
                        <blockquote type="cite" class="clean_bq"
                          style="color: rgb(0, 0, 0); font-family:
                          Helvetica, Arial; font-size: 13px; font-style:
                          normal; font-variant: normal; font-weight:
                          normal; letter-spacing: normal; line-height:
                          normal; orphans: auto; text-align: start;
                          text-indent: 0px; text-transform: none;
                          white-space: normal; widows: auto;
                          word-spacing: 0px; -webkit-text-stroke-width:
                          0px; background-color: rgb(255, 255, 255);"><span>
                            <div text="#000000" bgcolor="#FFFFFF">
                              <div>
                                <blockquote
                                  cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"
                                  type="cite">
                                  <div>
                                    <div>
                                      <p><b><br
                                            class="Apple-interchange-newline">
                                          Worse:</b><span
                                          class="Apple-converted-space"> </span>spill
                                        all state to the stack; call
                                        @trapping.div(a, b) // the
                                        spills will hurt you far more
                                        than a branch, so this should be
                                        avoided</p>
                                    </div>
                                  </div>
                                </blockquote>
                                I'm assuming this is an explicit spill
                                rather than simply recording a stack map
                                *at the div*.  If so, agreed. <br>
                                <blockquote
                                  cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"
                                  type="cite">
                                  <div>
                                    <div>
                                      <p>I suppose we could imagine a
                                        fourth option that involves a
                                        patchpoint to pick up the state
                                        and a trapping divide
                                        instrinsic.  But a trapping
                                        divide intrinsic alone is not
                                        enough.  Consider this:</p>
                                      <p>result = call @trapping.div(a,
                                        b); call @stackmap(all of my
                                        state)</p>
                                      <p>As soon as these are separate
                                        instructions, you have no
                                        guarantees that the state that
                                        the stackmap reports is sound
                                        for the point at which the div
                                        would trap. <br>
                                      </p>
                                    </div>
                                  </div>
                                </blockquote>
                                This is the closest to what I'd propose,
                                except that the two calls would be
                                merged into a single patchpoint.  Isn't
                                the entire point of a patchpoint to
                                record the stack map for a call? <span
                                  class="Apple-converted-space"> </span></div>
                            </div>
                          </span></blockquote>
                      </div>
                      <p>No.  It would be bad if that's what the
                        documentation says.  That's not at all how
                        WebKit uses it or probably any IC client would
                        use it.</p>
                      <p>Patchpoints are designed to be inline assembly
                        on steroids.  They're there to allow the client
                        JIT to tell LLVM to piss off.</p>
                      <div>
                        <div>
                          <blockquote type="cite" class="clean_bq"
                            style="color: rgb(0, 0, 0); font-family:
                            Helvetica, Arial; font-size: 13px;
                            font-style: normal; font-variant: normal;
                            font-weight: normal; letter-spacing: normal;
                            line-height: normal; orphans: auto;
                            text-align: start; text-indent: 0px;
                            text-transform: none; white-space: normal;
                            widows: auto; word-spacing: 0px;
                            -webkit-text-stroke-width: 0px;
                            background-color: rgb(255, 255, 255);"><span>
                              <div text="#000000" bgcolor="#FFFFFF">
                                <div>(Well, ignoring the actual patching
                                  part..)  Why not write this as:<br>
                                  patchpoint(..., trapping.div, a, b);<br>
                                  <br>
                                  Is there something I'm missing here?<br>
                                  <br>
                                  Just to note: I fully agree that the
                                  two call proposal is unsound and
                                  should be avoided. <br>
                                  <br>
                                  <blockquote
                                    cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"
                                    type="cite">
                                    <div>
                                      <div>
                                        <p>So, the division itself
                                          shouldn't be a trapping
                                          instruction and instead you
                                          want to detect the bad case
                                          with a branch.</p>
                                        <p>To be clear:</p>
                                        <p>- Whether you use
                                          deoptimization for division or
                                          anything else - like WebKit
                                          has done since before any of
                                          the Graal papers were written
                                          - is mostly unrelated to how
                                          you represent the division,
                                          unless you wanted to add a new
                                          intrinsic that is like a
                                          trapping-division-with-stackmap:</p>
                                        <p>result = call
                                          @trapping.div.with.stackmap(a,
                                          b, ... all of my state ...)</p>
                                        <p>Now, maybe you do want such
                                          an intrinsic, in which case
                                          you should propose it! <br>
                                        </p>
                                      </div>
                                    </div>
                                  </blockquote>
                                  Given what we already have with
                                  patchpoints, I don't think a merged
                                  intrinsic is necessary.  (See above). 
                                  I believe we have all the parts to
                                  build this solution, and that - in
                                  theory - they should compose neatly.<br>
                                  <br>
                                  p.s. The bits I was referring to was
                                  not deopt per se.  It was particularly
                                  which set of deopt state you used for
                                  each deopt point.  That's a bit of
                                  tangent for the rest of the discussion
                                  now though. <br>
                                  <blockquote
                                    cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"
                                    type="cite">
                                    <div>
                                      <div>
                                        <p>The reason why I haven't
                                          proposed it is that I think
                                          that long-term, the currently
                                          proposed intrinsics are a
                                          better path to getting the
                                          trapping optimizations.  See
                                          my previous mail, where I show
                                          how we could tell LLVM what
                                          the failing path is (which may
                                          have deoptimization code that
                                          uses a stackmap or whatever),
                                          what the trapping predicate is
                                          (it comes from the safe.div
                                          intrinsic), and the fact that
                                          trapping is wise (branch
                                          weights).</p>
                                        <p>- If you want to do the
                                          deoptimization with a trap,
                                          then your only choice
                                          currently is to use a
                                          patchpoint for the main path
                                          of the division.  This will be
                                          slower than using a branch to
                                          an OSR exit basic block,
                                          because you're making the
                                          division itself opaque to the
                                          optimizer (bad) just to get
                                          rid of a branch (which was
                                          probably cheap to begin with).</p>
                                        <p>Hence, what you want to do -
                                          one way or another, regardless
                                          of whether this proposed
                                          intrinsic is added - is to
                                          branch on the corner case
                                          condition, and have the slow
                                          case of the branch go to a
                                          basic block that deoptimizes.
                                           Unless of course you have
                                          profiling that says that the
                                          case does happen often, in
                                          which case you can have that
                                          basic block handle the corner
                                          case inline without leaving
                                          optimized code (FWIW, we do
                                          have such paths in WebKit and
                                          they are useful).</p>
                                        <p>So the question for me is
                                          whether the branching involves
                                          explicit control flow or is
                                          hidden inside an intrinsic.  I
                                          prefer for it to be within an
                                          intrinsic because it:</p>
                                        <p>- allows the optimizer to do
                                          more interesting things in the
                                          common cases, like hoisting
                                          the entire division.</p>
                                        <p>- will give us a clearer path
                                          for implementing trapping
                                          optimizations in the future.</p>
                                        <p>- is an immediate win on ARM.</p>
                                        <p>I'd be curious to hear what
                                          specific idea you have about
                                          how to implement trap-based
                                          deoptimization with your
                                          trapping division intrinsic
                                          for x86 - maybe it's different
                                          from the two "bad" idioms I
                                          showed above.</p>
                                      </div>
                                    </div>
                                  </blockquote>
                                  I hope my explanation above helps.  If
                                  not, ask, and I'll try to explain more
                                  clearly. </div>
                              </div>
                            </span></blockquote>
                        </div>
                        <p>I think I understand it.  I think that the
                          only issue is that:</p>
                        <p>- Patchpoints currently don't do what you
                          want.</p>
                        <p>- If you made patchpoints do what you want
                          then you'd break WebKit - not to mention
                          anyone who wants to use them for inline
                          caches.</p>
                        <p>So it seems like you want a new intrinsic.
                           You should officially propose this new
                          intrinsic, particularly since the core
                          semantic differences are not so great from
                          what we have now.  OTOH, if you truly believe
                          that patchpoints should just be changed to
                          your semantics in a way that does break
                          WebKit, then that's probably also something
                          you should get off your chest. ;-)</p>
                        <div>
                          <div>
                            <blockquote type="cite" class="clean_bq"
                              style="color: rgb(0, 0, 0); font-family:
                              Helvetica, Arial; font-size: 13px;
                              font-style: normal; font-variant: normal;
                              font-weight: normal; letter-spacing:
                              normal; line-height: normal; orphans:
                              auto; text-align: start; text-indent: 0px;
                              text-transform: none; white-space: normal;
                              widows: auto; word-spacing: 0px;
                              -webkit-text-stroke-width: 0px;
                              background-color: rgb(255, 255, 255);"><span>
                                <div text="#000000" bgcolor="#FFFFFF">
                                  <div><br>
                                    <br>
                                    One point just for clarity; I don't
                                    believe this effects the conclusions
                                    of our discussion so far.  I'm also
                                    fairly sure that you (Filip) are
                                    aware of this, but want to spell it
                                    out for other readers. <br>
                                    <br>
                                    You seem to be assuming that
                                    compiled code needs to explicitly
                                    branch to a point where deopt state
                                    is known to exit a compiled frame. <span
                                      class="Apple-converted-space"> </span></div>
                                </div>
                              </span></blockquote>
                          </div>
                          <p>This is a slightly unclear characterization
                            of my assumptions.  Our JIT does
                            deoptimization without explicit branches for
                            many, many things.  You should look at it
                            some time, it's pretty fancy. ;-)</p>
                          <div>
                            <div>
                              <blockquote type="cite" class="clean_bq"
                                style="color: rgb(0, 0, 0); font-family:
                                Helvetica, Arial; font-size: 13px;
                                font-style: normal; font-variant:
                                normal; font-weight: normal;
                                letter-spacing: normal; line-height:
                                normal; orphans: auto; text-align:
                                start; text-indent: 0px; text-transform:
                                none; white-space: normal; widows: auto;
                                word-spacing: 0px;
                                -webkit-text-stroke-width: 0px;
                                background-color: rgb(255, 255, 255);"><span>
                                  <div text="#000000" bgcolor="#FFFFFF">
                                    <div>Worth noting is that you can
                                      also exit a compiled frame on a
                                      trap (without an explicitly
                                      condition/branch!) if the deopt
                                      state is known at the point you
                                      take the trap.  This "exit frame
                                      on trap" behavior shows up with
                                      null pointer exceptions as well. 
                                      I'll note that both compilers in
                                      OpenJDK support some combination
                                      of "exit-on-trap" conditions for
                                      division and null dereferences. 
                                      The two differ on exactly what
                                      strategies they use in each case
                                      though.  :)</div>
                                  </div>
                                </span></blockquote>
                            </div>
                            <p>Yeah, and I've also implemented VMs that
                              do this - and I endorse the basic idea.  I
                              know what you want, and my only point is
                              that the existing patchpoints only give
                              you this if you're willing to make a huge
                              compromise: namely, that you're willing to
                              make the division (or heap load for the
                              null case) completely opaque to the
                              compiler to the point that GVN, LICM,
                              SCCP, and all algebraic reasoning have to
                              give up on optimizing it.  The point of
                              using LLVM is that it can optimize code.
                               It can optimize branches and divisions
                              pretty well.  So, eliminating an explicit
                              branch by replacing it with a construct
                              that appears opaque to the optimizer is
                              not a smart trade-off.</p>
                            <p>You could add a new intrinsic that, like
                              patchpoints, records the layout of state
                              in a side-table, but that is used as a
                              kind of wrapper for operations that LLVM
                              understands.  This may or may not be hairy
                              - you seem to have sort of acknowledged
                              that it's got some complexity and I've
                              also pointed out some possible issues.  If
                              this is something that you want, you
                              should propose it so that others know what
                              you're talking about.  One danger of how
                              we're discussing this right now is that
                              you're overloading patchpoints to mean the
                              thing you want them to mean rather than
                              what they actually mean, which makes it
                              seem like we don't need Michael's
                              intrinsics on the grounds that patchpoints
                              already offer a solution.  They don't
                              already offer a solution precisely because
                              patchpoints don't do what your intrinsics
                              would do.</p>
                            <div>
                              <div>
                                <blockquote type="cite" class="clean_bq"
                                  style="color: rgb(0, 0, 0);
                                  font-family: Helvetica, Arial;
                                  font-size: 13px; font-style: normal;
                                  font-variant: normal; font-weight:
                                  normal; letter-spacing: normal;
                                  line-height: normal; orphans: auto;
                                  text-align: start; text-indent: 0px;
                                  text-transform: none; white-space:
                                  normal; widows: auto; word-spacing:
                                  0px; -webkit-text-stroke-width: 0px;
                                  background-color: rgb(255, 255, 255);"><span>
                                    <div text="#000000"
                                      bgcolor="#FFFFFF">
                                      <div><br>
                                        <br>
                                        I'm not really arguing that
                                        either scheme is "better" in all
                                        cases.  I'm simply arguing that
                                        we should support both and allow
                                        optimization and tuning between
                                        them.  As far as I can tell, you
                                        seem to be assuming that an
                                        explicit branch to known exit
                                        point is always superior.<br>
                                        <br>
                                        <br>
                                        Ok, back to the topic at hand...<br>
                                        <br>
                                        With regards to the current
                                        proposal, I'm going to take a
                                        step back.  You guys seem to
                                        have already looked in this in a
                                        fair amount of depth.  I'm not
                                        necessarily convinced you've
                                        come to the best solution, but
                                        at some point, we need to make
                                        forward progress.  What you have
                                        is clearly better than nothing. <br>
                                        <br>
                                        Please go ahead and submit your
                                        current approach.  We can come
                                        back and revise later if we
                                        really need to. <br>
                                        <br>
                                        I do request the following
                                        changes:<br>
                                        - Mark it clearly as
                                        experimental.</div>
                                    </div>
                                  </span></blockquote>
                              </div>
                              <div>
                                <div>
                                  <blockquote type="cite"
                                    class="clean_bq" style="color:
                                    rgb(0, 0, 0); font-family:
                                    Helvetica, Arial; font-size: 13px;
                                    font-style: normal; font-variant:
                                    normal; font-weight: normal;
                                    letter-spacing: normal; line-height:
                                    normal; orphans: auto; text-align:
                                    start; text-indent: 0px;
                                    text-transform: none; white-space:
                                    normal; widows: auto; word-spacing:
                                    0px; -webkit-text-stroke-width: 0px;
                                    background-color: rgb(255, 255,
                                    255);"><span>
                                      <div text="#000000"
                                        bgcolor="#FFFFFF">
                                        <div><br>
                                          - Either don't specify the
                                          value computed in the edge
                                          cases, or allow those values
                                          to be specified as constant
                                          arguments to the call.  This
                                          allows efficient lowering to
                                          x86's div instruction if you
                                          want to make use of the
                                          trapping semantics. </div>
                                      </div>
                                    </span></blockquote>
                                </div>
                                <p>Once again: how would you use this to
                                  get trapping semantics without
                                  throwing all of LLVM's optimizations
                                  out the window, in the absence of the
                                  kind of patchpoint-like intrinsic that
                                  you want?  I ask just to make sure
                                  that we're on the same page.</p>
                                <div>
                                  <blockquote type="cite"
                                    class="clean_bq" style="color:
                                    rgb(0, 0, 0); font-family:
                                    Helvetica, Arial; font-size: 13px;
                                    font-style: normal; font-variant:
                                    normal; font-weight: normal;
                                    letter-spacing: normal; line-height:
                                    normal; orphans: auto; text-align:
                                    start; text-indent: 0px;
                                    text-transform: none; white-space:
                                    normal; widows: auto; word-spacing:
                                    0px; -webkit-text-stroke-width: 0px;
                                    background-color: rgb(255, 255,
                                    255);"><span>
                                      <div text="#000000"
                                        bgcolor="#FFFFFF">
                                        <div><br>
                                          <br>
                                          <blockquote
                                            cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"
                                            type="cite">
                                            <div>
                                              <div>
                                                <p>Finally, as for
                                                  performance data,
                                                  which part of this do
                                                  you want performance
                                                  data for?  I concede
                                                  that I don't have
                                                  performance data for
                                                  using Michael's new
                                                  intrinsic.  Part of
                                                  what the intrinsic
                                                  accomplishes is it
                                                  gives a less ugly way
                                                  of doing something
                                                  that is already
                                                  possible with target
                                                  intrinsics on ARM.  I
                                                  think it would be
                                                  great if you could get
                                                  those semantics -
                                                  along with a
                                                  known-good
                                                  implementation - on
                                                  other architectures as
                                                  well.</p>
                                              </div>
                                            </div>
                                          </blockquote>
                                          I would be very interested in
                                          seeing data comparing two
                                          schemes:<br>
                                          - Explicit control flow emited
                                          by the frontend<br>
                                          - The safe.div intrinsic
                                          emitted by the frontend,
                                          desugared in CodeGenPrep<br>
                                          <br>
                                          My strong suspicion is that
                                          each would preform well in
                                          some cases and not in others. 
                                          At least on x86.  Since the
                                          edge-checks are essentially
                                          free on ARM, the second scheme
                                          would probably be strictly
                                          superior there. <br>
                                          <br>
                                          I am NOT asking that we block
                                          submission on this data
                                          however. <br>
                                          <br>
                                          <blockquote
                                            cite="mid:etPan.5360000d.4e6afb66.172db@dethklok.local"
                                            type="cite">
                                            <div>
                                              <div>
                                                <p>But this discussion
                                                  has also involved
                                                  suggestions that we
                                                  should use trapping to
                                                  implement
                                                  deoptimization, and
                                                  the main point of my
                                                  message is to strongly
                                                  argue against anything
                                                  like this given the
                                                  current state of
                                                  trapping semantics and
                                                  how patchpoints work.
                                                   My point is that
                                                  using traps for
                                                  division corner cases
                                                  would either be
                                                  unsound (see the
                                                  stackmap after the
                                                  trap, above), or would
                                                  require you to do
                                                  things that are
                                                  obviously inefficient.
                                                   If you truly believe
                                                  that the branch to
                                                  detect division slow
                                                  paths is more
                                                  expensive than
                                                  spilling all bytecode
                                                  state to the stack or
                                                  using a patchpoint for
                                                  the division, then I
                                                  could probably hack
                                                  something up in WebKit
                                                  to show you the
                                                  performance
                                                  implications.  (Or you
                                                  could do it yourself,
                                                  the code is open
                                                  source...)</p>
                                              </div>
                                            </div>
                                          </blockquote>
                                          In a couple of months, I'll
                                          probably have the performance
                                          data to discuss this for
                                          real.  When that happens,
                                          let's pick this up and
                                          continue the debate. 
                                          Alternatively, if you want to
                                          chat this over more with a
                                          beer in hand at the social
                                          next week, let me know.  In
                                          the meantime, let's not stall
                                          the current proposal any
                                          more. <br>
                                          <br>
                                          Philip<br>
                                          <br>
                                        </div>
                                      </div>
                                    </span></blockquote>
                                </div>
                              </div>
                            </div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>