    <div class="moz-cite-prefix">On 05/02/2014 11:57 AM, Filip Pizlo
      wrote:<br>
    </div>
> On May 2, 2014 at 11:53:25 AM, Eric Christopher (echristo@gmail.com) wrote:
>> On Wed, Apr 30, 2014 at 10:34 PM, Philip Reames
>> <listmail@philipreames.com> wrote:
>>> Andy - If you're not already following this closely, please start. We've
>>> gotten into fairly fundamental questions of what a patchpoint does.
>>>
>>> Filip,
>>>
>>> I think you've hit the nail on the head. What I'm thinking of as being
>>> patchpoints are not what you think they are. Part of that is that I've got
>>> a local change which adds a very similar construction (called "statepoints"
>>> for the moment), but I was trying to keep that separate. That also includes
>>> a lot of GC semantics which are not under discussion currently. My
>>> apologies if that experience bled over into this conversation and made
>>> things more confusing.
>>>
>>> I will note that the documentation for patchpoint says explicitly the
>>> following:
>>> "The ‘llvm.experimental.patchpoint.*’ intrinsics creates a function call to
>>> the specified <target> and records the location of specified values in the
>>> stack map."
>>>
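>>> For reference, the declarations documented in the StackMaps docs are
>>> roughly:
>>>
>>>   declare void
>>>     @llvm.experimental.patchpoint.void(i64 <id>, i32 <numBytes>,
>>>                                        i8* <target>, i32 <numArgs>, ...)
>>>   declare i64
>>>     @llvm.experimental.patchpoint.i64(i64 <id>, i32 <numBytes>,
>>>                                       i8* <target>, i32 <numArgs>, ...)
>>>
>>> where the first <numArgs> variadic arguments are passed to <target> and
>>> the remaining ones are the "live values" recorded in the stack map.
>>>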
>>> My reading has always been that a patchpoint *that isn't patched* is simply
>>> a call with a stackmap associated with it. To my reading, this can (and
>>> did, and does) indicate my proposed usage would be legal.
>>>
>>
>> I like the idea that the target can be assumed to be called. It makes
>> optimization of the call possible, etc. I think it's definitely worth
>> exploring before we lock down the patchpoint intrinsic.
>
> I will actively oppose this.

I think it sounds like we need to split patchpoint into two cases. I'm
going to send a rough proposal for this later today.

> -Filip
>
>> -eric
>>
>>> I will agree that I've confused the topic badly on the optimization front.
>>> My "statepoint" isn't patchable, so a lot more optimizations are legal.
>>> Sorry about that. To restate what I think you've been saying all along, the
>>> optimizer can't make assumptions about what function is called by a
>>> patchpoint because that might change based on later patching. Is this the
>>> key point you've been trying to make?
>>>
>>> I'm not objecting to separating "my patchpoint" from "your patchpoint".
>>> Let's just hammer out the semantics of each first. :)
>>>
>>> Again, longer response to follow in a day or so. :)
>>>
>>> Philip
>>>
>>> On 04/30/2014 10:09 PM, Filip Pizlo wrote:
>>>
>>> On April 30, 2014 at 9:06:20 PM, Philip Reames (listmail@philipreames.com)
>>> wrote:
>>>
>>> On 04/29/2014 12:39 PM, Filip Pizlo wrote:
>>>
>>> On April 29, 2014 at 11:27:06 AM, Philip Reames (listmail@philipreames.com)
>>> wrote:
>>>
>>> On 04/29/2014 10:44 AM, Filip Pizlo wrote:
>>>
>>> TL;DR: Your desire to use trapping on x86 only further convinces me that
>>> Michael's proposed intrinsics are the best way to go.
>>>
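>>> (As a reminder of the shape under discussion - this is my paraphrase of
>>> Michael's proposal, so the exact names and types may differ:
>>>
>>>   ;; quotient plus a flag saying whether an edge case
>>>   ;; (divide-by-zero or INT_MIN/-1) was hit
>>>   %pair = call { i32, i1 } @llvm.safe.sdiv.i32(i32 %a, i32 %b)
>>>
>>> so the frontend can branch on the i1 to reach its own slow path.)
>>>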
>>> I'm still not convinced, but am not going to actively oppose it either. I'm
>>> leery of designing a solution with major assumptions we don't have data to
>>> back up.
>>>
>>> I worry your assumptions about deoptimization are potentially unsound. But
>>> I don't have data to actually show this (yet).
>>>
>>> I *think* I may have been unclear about my assumptions; in particular, my
>>> claims with respect to deoptimization are probably more subtle than they
>>> appeared. WebKit can use LLVM and it has divisions and we do all possible
>>> deoptimization/profiling/etc tricks for it, so this is grounded in
>>> experience. Forgive me if the rest of this e-mail contains a lecture on
>>> things that are obvious - I'll try to err on the side of clarity and
>>> completeness, since this discussion is sufficiently dense that we run the
>>> risk of talking at cross-purposes unless some baseline assumptions are
>>> established.
>>>
>>> I think we're using the same terminology, but with slightly different sets
>>> of assumptions. I'll point this out below where relevant.
>>>
>>> Also, thanks for taking the time to expand. It helps clarify the discussion
>>> quite a bit.
>>>
>>> I think we may be converging on an understanding of what you want versus
>>> what I want, and I think that there are some points - possibly unrelated to
>>> division - that are worth clarifying. I think that the main difference is
>>> that when I say "patchpoint", I am referring to a concrete intrinsic with
>>> specific semantics that cannot change without breaking WebKit, while you are
>>> using the term to refer to a broad concept, or rather, a class of
>>> as-yet-unimplemented intrinsics that share some of the same features with
>>> patchpoints but otherwise have incompatible semantics.
>>>
>>> Also, when I say that you wouldn't want to use the existing patchpoint to do
>>> your trapping deopt, what I mean is that the performance of doing this would
>>> suck for reasons that are not related to deoptimization or trapping. I'm
>>> not claiming that deoptimization performs poorly (trust me, I know better)
>>> or that trapping to deoptimize is bad (I've done this many, many times and I
>>> know better). I'm saying that with the current patchpoint intrinsics in
>>> LLVM, as they are currently specified and implemented, doing it would be a
>>> bad idea because you'd have to compromise a bunch of other optimizations to
>>> achieve it.
>>>
>>> You have essentially described new intrinsics that would make this less of a
>>> bad idea and I am interested in your intrinsics, so I'll try to both respond
>>> with why patchpoints don't currently give you what you want (and why simply
>>> changing patchpoint semantics would be evil) and I'll also try to comment on
>>> what I think of the intrinsic that you're effectively proposing. Long story
>>> short, I think you should formally propose your intrinsic even if it's not
>>> completely fleshed out. I think that it's an interesting capability and in
>>> its most basic form, it is a simple variation of the current
>>> patchpoint/stackmap intrinsics.
>>>
>>> On April 29, 2014 at 10:09:49 AM, Philip Reames (listmail@philipreames.com)
>>> wrote:
>>>
>>> As the discussion has progressed and I've spent more time thinking about the
>>> topic, I find myself less and less enthused about the current proposal. I'm
>>> in full support of having idiomatic ways to express safe division, but I'm
>>> starting to doubt that using an intrinsic is the right way at the moment.
>>>
>>> One case I find myself thinking about is how one would combine profiling
>>> information and implicit div-by-zero/overflow checks with this proposal. I
>>> don't really see a clean way. Ideally, for a "safe div" which never has the
>>> exceptional paths taken, you'd like to do away with the control flow
>>> entirely. (And rely on hardware traps w/exceptions instead.) I don't
>>> really see a way to represent that type of construct given the current
>>> proposal.
>>>
>>> This is a deeper problem, and to solve it you'd need a solution to trapping
>>> in general. Let's consider the case of Java. A Java program may want to
>>> catch the arithmetic exception due to divide by zero. How would you do this
>>> with a trap in LLVM IR? Spill all state that is live at the catch? Use a
>>> patchpoint for the entire division instruction?
>>>
>>> We'd likely use something similar to a patchpoint. You'd need the "abstract
>>> vm state" (which is not necessarily the compiled frame) available at the div
>>> instruction. You could then re-enter the interpreter at the specified index
>>> (part of the vm state). We have almost all of these mechanisms in place.
>>> Ideally, you'd trigger a recompile and otherwise ensure re-entry into
>>> compiled code at the soonest possible moment.
>>>
>>> This requires a lot of runtime support, but we already have most of it
>>> implemented for another compiler. From our perspective, the runtime
>>> requirements are not a major blocker.
>>>
>>> Right, you'll use a patchpoint. That's way more expensive than using a safe
>>> division intrinsic with branches, because it's opaque to the optimizer.
>>>
>>> This statement is true at the moment, but it shouldn't be. I think this is
>>> our fundamental difference in approach.
>>>
>>> You should be able to write something like:
>>>
>>>   %res = invoke i32 patchpoint(..., x86_trapping_divide, a, b)
>>>             to label %normal_dest unwind label %invoke_dest
>>>
>>> normal_dest:
>>>   ;; use %res
>>>
>>> invoke_dest:
>>>   landingpad ...
>>>   ;; dispatch edge cases
>>>   ;; this could be unreachable code if you deopt this frame in the trap
>>>   ;; handler and jump directly to an interpreter or other bit of code
>>>
>>> I see. It sounds like you want a generalization of the "div.with.stackmap"
>>> that I thought you wanted - you want to be able to wrap anything in a
>>> stackmap.
>>>
>>> The current patchpoint intrinsic does not do this, and you run the risk of
>>> breaking existing semantics if you changed this. You'd probably break
>>> WebKit, which treats the call target of the patchpoint as nothing more than
>>> a quirk - we always pass null. Also, the current patchpoint treats the
>>> callee as an i8* if I remember right, and it would be super weird if all LLVM
>>> phases had to interpret this i8* by unwrapping a possible bitcast to get to
>>> a declared function that may be an intrinsic. Yuck! Basically, the call
>>> target of existing patchpoints is meant to be a kind of convenience feature
>>> rather than the core of the mechanism.
>>>
>>> I agree in principle that the intrinsic that you want would be a useful
>>> intrinsic. But let's not call it a patchpoint for the purposes of this
>>> discussion, and let's not confuse the discussion by claiming (incorrectly)
>>> that the existing patchpoint facility gives you what you want. It doesn't:
>>> patchpoints are designed to make the call target opaque (hence the use of
>>> i8*), and there shouldn't be a correlation between what the patchpoint does
>>> at run-time and what the called function would have done. You could make
>>> the call target be null (like WebKit does) and the patchpoint should still
>>> mean "this code can do anything", because the expectation is that the client
>>> JIT will patch over it anyway.
>>>
>>> Also, "patchpoint" would probably not be the right term for the intrinsic
>>> that you want. I think that what you want is "call.with.stackmap". Or
>>> maybe "stackmap.wrapper". Or just "stackmap" - I'd be OK, in principle,
>>> with changing the name of the current "stackmap" intrinsic to something that
>>> reflects the fact that it's less of a stackmap than what you want.
>>>
>>> To summarize: a patchpoint's main purpose is that you can patch it with
>>> arbitrary code. The current "stackmap" means that you can patch it with
>>> arbitrary code and that patching may be destructive to a shadow of machine
>>> code bytes, so it's really just like patchpoints - we could change its name
>>> to "patchpoint.shadow", for example.
>>>
>>> If you were to propose such a stackmap intrinsic, then I think there could
>>> be some ways of doing it that wouldn't be too terrible. Basically you want
>>> something that is like a patchpoint in that it reports a stackmap via a side
>>> channel, but unlike patchpoints, it doesn't allow arbitrary patching -
>>> instead the optimizer should be allowed to assume that the code within the
>>> patchpoint will always do the same thing that the call target would have
>>> done. There are downsides to truly doing this. For example, to make
>>> division efficient with such an intrinsic, you'd have to do something that
>>> is somewhat worse than just recognizing intrinsics in the optimizer - you'd
>>> have to first recognize a call to your "stackmap wrapper" intrinsic and then
>>> observe that its call target argument is in turn another intrinsic. To me
>>> personally, that's kind of yucky, but I won't deny that it could be useful.
>>>
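>>> (To illustrate the yuck - a hypothetical shape, not an existing intrinsic;
>>> all of the names here are made up:
>>>
>>>   ;; wrapper reports a stackmap but is assumed to behave like its target
>>>   %res = call i32 (i64, i8*, i32, ...)*
>>>     @llvm.call.with.stackmap.i32(
>>>         i64 <id>,
>>>         i8* bitcast (i32 (i32, i32)* @some.div.intrinsic to i8*),
>>>         i32 2, i32 %a, i32 %b, ... live state ...)
>>>
>>> To treat this as a division, the optimizer has to look through both the
>>> wrapper and the bitcast to find the real callee.)
>>>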
>>> As to the use of invoke: I don't believe that the use of invoke versus my
>>> suggested "branch on a trap predicate" idea differs in any truly
>>> meaningful way. I buy that either would work.
>>>
>>> A patchpoint should not require any excess spilling. If values are live in
>>> registers, that should be reflected in the stack map. (I do not know if
>>> this is the case for patchpoint at the moment or not.)
>>>
>>> Patchpoints do not require spilling.
>>>
>>> My point was that with existing patchpoints, you *either* use a patchpoint
>>> for the entire division, which makes the division opaque to the optimizer -
>>> because a patchpoint means "this code can do anything" - *or* you spill
>>> stuff to the stack prior to your trapping division intrinsic, since
>>> spilling is something that you could do as a workaround if you didn't have a
>>> patchpoint.
>>>
>>> The reason why I brought up spilling at all is that I suspect that spilling
>>> all state to the stack might be cheaper - for some systems - than turning
>>> the division into a patchpoint. Turning the division into a patchpoint is
>>> horrendously brutal - the patchpoint looks like it clobbers the heap (which
>>> a division doesn't do), has to execute (a division is an obvious DCE
>>> candidate), cannot be hoisted (hoisting divisions is awesome), etc. Perhaps
>>> most importantly, though, a patchpoint doesn't tell LLVM that you're *doing
>>> a division* - so all constant folding, strength reduction, and algebraic
>>> reasoning flies out the window. On the other hand, spilling all state to
>>> the stack is an arguably sound and performant solution to a lot of VM
>>> problems. I've seen JVM implementations that ensure that there is always a
>>> copy of state on the stack at some critical points, just because it makes
>>> loads of stuff simpler (debugging, profiling, GC, and of course deopt). I
>>> personally prefer to stay away from such a strategy because it's not free.
>>>
>>> On the other hand, branches can be cheap. A branch on a divide is cheaper
>>> than not being able to optimize the divide.
>>>
>>> The Value called by a patchpoint should participate in optimization
>>> normally.
>>>
>>> I agree that you could have a different intrinsic that behaves like this.
>>>
>>> We really want the patchpoint part of the call to be supplemental. It
>>> should still be a call. It should be constant-propagated, transformed,
>>> etc. This is not the case currently. I've got a couple of off-the-wall
>>> ideas for improving the current status, but I'll agree this is a hardish
>>> problem.
>>>
>>> It should be legal to use a patchpoint in an invoke. It's an ABI issue of
>>> how the invoke path gets invoked (i.e., side tables for the runtime to
>>> look up, etc.). This is not possible today, and probably requires a fair
>>> amount of work. Some of it I've already done and will be sharing shortly.
>>> Other parts I haven't even thought about.
>>>
>>> Right, it's significantly more complex than either the existing patchpoints
>>> or Michael's proposed safe.div.
>>>
>>> If you didn't want to use the trapping semantics, you'd insert dedicated
>>> control flow _before_ the divide. This would allow normal optimization of
>>> the control flow.
>>>
>>> Notes:
>>> 1) This might require a new PATCHPOINT pseudo-op in the backend. Haven't
>>> thought much about that yet.
>>> 2) I *think* your current intrinsic could be translated into something like
>>> this. (Leaving aside the question of where the deopt state comes from.) In
>>> fact, the more I look at this, the less difference I actually see between
>>> the approaches.
>>>
>>> In a lot of languages, a divide produces some result even in the exceptional
>>> case, and this result requires effectively deoptimizing since the result
>>> won't be the one you would have predicted (double instead of int, or BigInt
>>> instead of small int), which sort of means that if the CPU exception occurs
>>> you have to be able to reconstruct all state. A patchpoint could do this,
>>> and so could spilling all state to the stack before the divide - but both
>>> are very heavy hammers that are sure to be more expensive than just doing a
>>> branch.
>>>
>>> This isn't necessarily as expensive as you might believe. I'd recommend
>>> reading the Graal project papers on this topic.
>>>
>>> Whether deopt or branching is more profitable *in this case*, I can't easily
>>> say. I'm not yet at the point of being able to run that experiment. We can
>>> argue about what "should" be better all we want, but real performance data
>>> is the only way to truly know.
>>>
>>> My point may have been confusing. I know that deoptimization is cheap and
>>> WebKit uses it everywhere, including division corner cases, if profiling
>>> tells us that it's profitable to do so (which it does, in the common case).
>>> WebKit is a heavy user of deoptimization in general, so you don't need to
>>> convince me that it's worth it.
>>>
>>> Acknowledged.
>>>
>>> Note that I want *both* deopt *and* branching, because in this case, a
>>> branch is the fastest overall way of detecting when to deopt. In the
>>> future, I will want to implement the deopt in terms of branching, and when
>>> we do this, I believe that the most sound and performant approach would be
>>> using Michael's intrinsics. This is subtle and I'll try to explain why it's
>>> the case.
>>>
>>> The point is that you wouldn't want to do deoptimization by spilling state
>>> on the main path or by using a patchpoint for the main path of the division.
>>>
>>> This is the main point I disagree with. I don't believe that having a
>>> patchpoint on the main path should be any more expensive than the original
>>> call. (See above.)
>>>
>>> The reason why the patchpoint is expensive is that if you use a patchpoint
>>> to implement a division then the optimizer won't be allowed to assume that
>>> it's a division, because the whole point of "patchpoint" is to tell the
>>> optimizer to piss off.
>>>
>>> Worth noting explicitly: I'm assuming that all of your deopt state would
>>> already be available for other purposes in nearby code. It's on the stack
>>> or in registers. I'm assuming that by adding the deopt point, you are not
>>> radically changing the set of computations which need to be done. If that's
>>> not the case, you should avoid deopt and instead just inline the slow paths
>>> with explicit checks.
>>>
>>> Yes, of course it is. That's not the issue.
>>>
>>> I'll note that given your assumptions about the cost of a patchpoint, the
>>> rest of your position makes a lot more sense. :) As I spelled out above, I
>>> believe this cost is not fundamental.
>>>
>>> You don't want the common path of executing the division to involve a
>>> patchpoint instruction, although using a patchpoint or stackmap to implement
>>> deoptimization on the failing path is great:
>>>
>>> Good: if (division would fail) { call @patchpoint(all of my state) }
>>>       else { result = a / b }
>>>
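>>> (Spelled out in IR, the good pattern is something like this - a sketch
>>> with the live state elided and only the divide-by-zero edge case shown;
>>> sdiv would also need the INT_MIN/-1 check:
>>>
>>>   %willfail = icmp eq i32 %b, 0
>>>   br i1 %willfail, label %deopt, label %fast
>>>
>>> deopt:                                   ; cold path: deopt via patchpoint
>>>   call void (i64, i32, i8*, i32, ...)*
>>>     @llvm.experimental.patchpoint.void(i64 <id>, i32 <numBytes>, i8* null,
>>>                                        i32 0, ... all of my state ...)
>>>   unreachable
>>>
>>> fast:                                    ; hot path: a division LLVM can
>>>   %result = sdiv i32 %a, %b              ; fold, hoist, or eliminate
>>>
>>> The division stays visible to the optimizer; only the cold path is
>>> opaque.)
>>>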
>>> Given your cost assumptions, I'd agree.
>>>
>>> Not my cost assumptions. The reason why this is better is that the division
>>> is expressed in LLVM IR so that LLVM can do useful things to it - like
>>> eliminate it, for example.
>>>
>>> Bad: call @patchpoint(all of my state) // patch with a divide instruction -
>>> bad because the optimizer has no clue what you're doing and assumes the
>>> very worst
>>>
>>> Yuck. Agreed.
>>>
>>> To be clear, this is what you're proposing - except that you're assuming
>>> that LLVM will know that you've patched a division because you're expecting
>>> the call target to have semantic meaning. Or, rather, you're expecting that
>>> you can make the contents of the patchpoint be a division by having the call
>>> target be a division intrinsic. In the current implementation, and as it is
>>> currently specified, the call target has no meaning, and so you get the yuck
>>> that I'm showing.
>>>
>>> Worse: spill all state to the stack; call @trapping.div(a, b) // the spills
>>> will hurt you far more than a branch, so this should be avoided
>>>
>>> I'm assuming this is an explicit spill rather than simply recording a stack
>>> map *at the div*. If so, agreed.
>>>
>>> I suppose we could imagine a fourth option that involves a patchpoint to
>>> pick up the state and a trapping divide intrinsic. But a trapping divide
>>> intrinsic alone is not enough. Consider this:
>>>
>>> result = call @trapping.div(a, b); call @stackmap(all of my state)
>>>
>>> As soon as these are separate instructions, you have no guarantees that the
>>> state that the stackmap reports is sound for the point at which the div
>>> would trap.
>>>
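>>> (Concretely, with the made-up names from the example above:
>>>
>>>   %result = call i32 @trapping.div(i32 %a, i32 %b)
>>>   ;; nothing pins other code out of this gap, and nothing keeps the
>>>   ;; optimizer from moving, duplicating, or deleting one call
>>>   ;; independently of the other
>>>   call void @stackmap(... all of my state ...)
>>>
>>> The stackmap describes the state at the second call site, not at the
>>> instruction that actually traps.)
>>>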
                > This is the closest to what I'd propose, except
                that the two calls would be<span
                  class="Apple-converted-space"> </span><br>
                > merged into a single patchpoint. Isn't the entire
                point of a patchpoint to<span
                  class="Apple-converted-space"> </span><br>
                > record the stack map for a call?<span
                  class="Apple-converted-space"> </span><br>
                ><span class="Apple-converted-space"> </span><br>
                > No. It would be bad if that's what the
                documentation says. That's not at<span
                  class="Apple-converted-space"> </span><br>
                > all how WebKit uses it or probably any IC client
                would use it.<span class="Apple-converted-space"> </span><br>
                ><span class="Apple-converted-space"> </span><br>
                > Patchpoints are designed to be inline assembly on
                steroids. They're there<span
                  class="Apple-converted-space"> </span><br>
                > to allow the client JIT to tell LLVM to piss off.<span
                  class="Apple-converted-space"> </span><br>
                ><span class="Apple-converted-space"> </span><br>
                > (Well, ignoring the actual patching part..) Why not
                write this as:<span class="Apple-converted-space"> </span><br>
                > patchpoint(..., trapping.div, a, b);<span
                  class="Apple-converted-space"> </span><br>
>
> Is there something I'm missing here?
>
> Just to note: I fully agree that the two-call proposal is unsound and
> should be avoided.
>
> So, the division itself shouldn't be a trapping instruction, and instead
> you want to detect the bad case with a branch.
>
> To be clear:
>
> - Whether you use deoptimization for division or anything else - like
> WebKit has done since before any of the Graal papers were written - is
> mostly unrelated to how you represent the division, unless you wanted to
> add a new intrinsic that is like a trapping-division-with-stackmap:
>
> result = call @trapping.div.with.stackmap(a, b, ... all of my state ...)
>
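> As a declaration, that might look something like the following (entirely
> hypothetical - no such intrinsic exists today):
>
>   declare i64 @trapping.div.with.stackmap(i64 %a, i64 %b, ...)
>
> The point of fusing the two operations is that the varargs deopt state
> would be recorded at exactly the PC of the divide, so the map is
> guaranteed to be sound at the only place the trap can occur.
>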
> Now, maybe you do want such an intrinsic, in which case you should propose
> it!
>
> Given what we already have with patchpoints, I don't think a merged
> intrinsic is necessary. (See above.) I believe we have all the parts to
> build this solution, and that - in theory - they should compose neatly.
>
> p.s. The bits I was referring to were not deopt per se, but rather which
> set of deopt state you used at each deopt point. That's a bit of a tangent
> from the rest of the discussion now, though.
>
> The reason why I haven't proposed it is that I think that, long-term, the
> currently proposed intrinsics are a better path to getting the trapping
> optimizations. See my previous mail, where I show how we could tell LLVM
> what the failing path is (which may have deoptimization code that uses a
> stackmap or whatever), what the trapping predicate is (it comes from the
> safe.div intrinsic), and the fact that trapping is wise (branch weights).
>
> - If you want to do the deoptimization with a trap, then your only choice
> currently is to use a patchpoint for the main path of the division. This
> will be slower than using a branch to an OSR exit basic block, because
> you're making the division itself opaque to the optimizer (bad) just to
> get rid of a branch (which was probably cheap to begin with).
>
> Hence, what you want to do - one way or another, regardless of whether
> this proposed intrinsic is added - is to branch on the corner-case
> condition, and have the slow case of the branch go to a basic block that
> deoptimizes. Unless of course you have profiling that says that the case
> does happen often, in which case you can have that basic block handle the
> corner case inline without leaving optimized code (FWIW, we do have such
> paths in WebKit and they are useful).
>
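> In IR, the shape I'm describing looks roughly like this (a sketch; the
> null-target patchpoint stands in for whatever OSR exit mechanism you use,
> with the JIT patching in the actual exit code):
>
>   define i32 @checked_div(i32 %a, i32 %b) {
>   entry:
>     %den_zero = icmp eq i32 %b, 0
>     %num_min  = icmp eq i32 %a, -2147483648
>     %den_m1   = icmp eq i32 %b, -1
>     %overflow = and i1 %num_min, %den_m1
>     %bad      = or i1 %den_zero, %overflow
>     br i1 %bad, label %deopt, label %fast  ; branch weights would mark %fast hot
>
>   fast:                                    ; hot path: a plain sdiv that GVN,
>     %q = sdiv i32 %a, %b                   ; LICM, and SCCP can reason about
>     ret i32 %q
>
>   deopt:                                   ; OSR exit: record all state in the
>     call void (i64, i32, i8*, i32, ...)*   ; stack map and hand off to the JIT
>         @llvm.experimental.patchpoint.void(i64 1, i32 16, i8* null, i32 0,
>                                            i32 %a, i32 %b)
>     unreachable
>   }
>
>   declare void @llvm.experimental.patchpoint.void(i64, i32, i8*, i32, ...)
>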
> So the question for me is whether the branching involves explicit control
> flow or is hidden inside an intrinsic. I prefer for it to be within an
> intrinsic because it:
>
> - allows the optimizer to do more interesting things in the common cases,
> like hoisting the entire division.
>
> - will give us a clearer path for implementing trapping optimizations in
> the future.
>
> - is an immediate win on ARM.
>
> I'd be curious to hear what specific idea you have about how to implement
> trap-based deoptimization with your trapping division intrinsic for x86 -
> maybe it's different from the two "bad" idioms I showed above.
>
> I hope my explanation above helps. If not, ask, and I'll try to explain
> more clearly.
>
> I think I understand it. I think that the only issue is that:
>
> - Patchpoints currently don't do what you want.
>
> - If you made patchpoints do what you want then you'd break WebKit - not
> to mention anyone who wants to use them for inline caches.
>
> So it seems like you want a new intrinsic. You should officially propose
> this new intrinsic, particularly since the core semantic differences are
> not so great from what we have now. OTOH, if you truly believe that
> patchpoints should just be changed to your semantics in a way that does
> break WebKit, then that's probably also something you should get off your
> chest. ;-)
>
> One point just for clarity; I don't believe this affects the conclusions
> of our discussion so far. I'm also fairly sure that you (Filip) are aware
> of this, but I want to spell it out for other readers.
>
> You seem to be assuming that compiled code needs to explicitly branch to a
> point where deopt state is known in order to exit a compiled frame.
>
> This is a slightly unclear characterization of my assumptions. Our JIT
> does deoptimization without explicit branches for many, many things. You
> should look at it some time, it's pretty fancy. ;-)
>
> Worth noting is that you can also exit a compiled frame on a trap (without
> an explicit condition/branch!) if the deopt state is known at the point
> you take the trap. This "exit frame on trap" behavior shows up with null
> pointer exceptions as well. I'll note that both compilers in OpenJDK
> support some combination of "exit-on-trap" conditions for division and
> null dereferences. The two differ on exactly what strategies they use in
> each case, though. :)
>
> Yeah, and I've also implemented VMs that do this - and I endorse the basic
> idea. I know what you want, and my only point is that the existing
> patchpoints only give you this if you're willing to make a huge
> compromise: namely, that you're willing to make the division (or heap load
> for the null case) completely opaque to the compiler, to the point that
> GVN, LICM, SCCP, and all algebraic reasoning have to give up on optimizing
> it. The point of using LLVM is that it can optimize code. It can optimize
> branches and divisions pretty well. So, eliminating an explicit branch by
> replacing it with a construct that appears opaque to the optimizer is not
> a smart trade-off.
>
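> To see what's lost, compare the two forms of the same loop-invariant
> division (schematic):
>
>   ; explicit form - LICM can hoist the corner-case check and the sdiv
>   ; out of a loop when %a and %b are invariant:
>   %q = sdiv i32 %a, %b
>
>   ; patchpoint form - an opaque call with a stack map attached; it stays
>   ; put, and GVN, LICM, and SCCP all give up on it:
>   %r = call i64 (i64, i32, i8*, i32, ...)*
>            @llvm.experimental.patchpoint.i64(i64 2, i32 16, i8* null,
>                                              i32 2, i32 %a, i32 %b)
>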
> You could add a new intrinsic that, like patchpoints, records the layout
> of state in a side-table, but that is used as a kind of wrapper for
> operations that LLVM understands. This may or may not be hairy - you seem
> to have sort of acknowledged that it's got some complexity, and I've also
> pointed out some possible issues. If this is something that you want, you
> should propose it so that others know what you're talking about. One
> danger of how we're discussing this right now is that you're overloading
> patchpoints to mean the thing you want them to mean rather than what they
> actually mean, which makes it seem like we don't need Michael's intrinsics
> on the grounds that patchpoints already offer a solution. They don't
> already offer a solution precisely because patchpoints don't do what your
> intrinsics would do.
>
> I'm not really arguing that either scheme is "better" in all cases. I'm
> simply arguing that we should support both and allow optimization and
> tuning between them. As far as I can tell, you seem to be assuming that an
> explicit branch to a known exit point is always superior.
>
> Ok, back to the topic at hand...
>
> With regards to the current proposal, I'm going to take a step back. You
> guys seem to have already looked into this in a fair amount of depth. I'm
> not necessarily convinced you've come to the best solution, but at some
> point, we need to make forward progress. What you have is clearly better
> than nothing.
>
> Please go ahead and submit your current approach. We can come back and
> revise later if we really need to.
>
> I do request the following changes:
>
> - Mark it clearly as experimental.
>
> - Either don't specify the value computed in the edge cases, or allow
> those values to be specified as constant arguments to the call. This
> allows efficient lowering to x86's div instruction if you want to make use
> of the trapping semantics.
>
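> For instance (purely illustrative - this is the variant I'm requesting,
> not what's currently proposed):
>
>   ; the last two operands give the results for divide-by-zero and for
>   ; INT_MIN / -1 overflow
>   %q = call i32 @llvm.safe.sdiv.i32(i32 %a, i32 %b, i32 0, i32 -2147483648)
>
> With the edge-case results pinned to constants (or left unspecified), the
> backend can emit a bare x86 div and let a trap handler materialize those
> results, rather than emitting compare-and-branch sequences up front.
>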
> Once again: how would you use this to get trapping semantics without
> throwing all of LLVM's optimizations out the window, in the absence of the
> kind of patchpoint-like intrinsic that you want? I ask just to make sure
> that we're on the same page.
>
> Finally, as for performance data: which part of this do you want
> performance data for? I concede that I don't have performance data for
> using Michael's new intrinsic. Part of what the intrinsic accomplishes is
> that it gives a less ugly way of doing something that is already possible
> with target intrinsics on ARM. I think it would be great if you could get
> those semantics - along with a known-good implementation - on other
> architectures as well.
>
> I would be very interested in seeing data comparing two schemes:
>
> - Explicit control flow emitted by the frontend
> - The safe.div intrinsic emitted by the frontend, desugared in CodeGenPrep
>
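> The first scheme is the explicit-branch shape sketched earlier. The second
> would be a single call in the frontend's output (a sketch - I'm assuming
> the proposed form, which returns the quotient plus a corner-case flag):
>
>   %pair = call { i32, i1 } @llvm.safe.sdiv.i32(i32 %a, i32 %b)
>   %q    = extractvalue { i32, i1 } %pair, 0
>   %bad  = extractvalue { i32, i1 } %pair, 1
>
> CodeGenPrep would then expand the call back into compare-and-branch form
> on targets (like x86) where the hardware doesn't provide the checked
> semantics for free.
>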
> My strong suspicion is that each would perform well in some cases and not
> in others - at least on x86. Since the edge-checks are essentially free on
> ARM, the second scheme would probably be strictly superior there.
>
> I am NOT asking that we block submission on this data, however.
>
> But this discussion has also involved suggestions that we should use
> trapping to implement deoptimization, and the main point of my message is
> to strongly argue against anything like this given the current state of
> trapping semantics and how patchpoints work. My point is that using traps
> for division corner cases would either be unsound (see the stackmap after
> the trap, above), or would require you to do things that are obviously
> inefficient. If you truly believe that the branch to detect division slow
> paths is more expensive than spilling all bytecode state to the stack or
> using a patchpoint for the division, then I could probably hack something
> up in WebKit to show you the performance implications. (Or you could do it
> yourself, the code is open source...)
>
> In a couple of months, I'll probably have the performance data to discuss
> this for real. When that happens, let's pick this up and continue the
> debate. Alternatively, if you want to chat this over more with a beer in
> hand at the social next week, let me know. In the meantime, let's not
> stall the current proposal any more.
>
> Philip