<html>

  <head>

    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <br>

    <div class="moz-cite-prefix">On 05/02/2014 11:57 AM, Filip Pizlo

      wrote:<br>

    </div>

    <blockquote cite="mid:etPan.5363ea85.57fc4fbb.172db@dethklok.local"

      type="cite">

      <style>body{font-family:Helvetica,Arial;font-size:13px}</style>

      <div id="bloop_customfont"

        style="font-family:Helvetica,Arial;font-size:13px; color:

        rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><br>

      </div>

      <p style="color:#000;">On May 2, 2014 at 11:53:25 AM, Eric

        Christopher (<a moz-do-not-send="true"

          href="mailto:echristo@gmail.com">echristo@gmail.com</a>)

        wrote:</p>

      <div>

        <blockquote type="cite" class="clean_bq" style="color: rgb(0, 0,

          0); font-family: Helvetica, Arial; font-size: 13px;

          font-style: normal; font-variant: normal; font-weight: normal;

          letter-spacing: normal; line-height: normal; orphans: auto;

          text-align: start; text-indent: 0px; text-transform: none;

          white-space: normal; widows: auto; word-spacing: 0px;

          -webkit-text-stroke-width: 0px; background-color: rgb(255,

          255, 255);"><span>

            <div>

              <div>On Wed, Apr 30, 2014 at 10:34 PM, Philip Reames<span

                  class="Apple-converted-space"> </span><br>

                <a class="moz-txt-link-rfc2396E" href="mailto:listmail@philipreames.com"><listmail@philipreames.com></a> wrote:<span

                  class="Apple-converted-space"> </span><br>

                > Andy - If you're not already following this

                closely, please start. We've<span

                  class="Apple-converted-space"> </span><br>

                > gotten into fairly fundamental questions of what a

                patchpoint does.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Filip,<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > I think you've hit the nail on the head. What I'm

                thinking of as being<span class="Apple-converted-space"> </span><br>

                > patchpoints are not what you think they are. Part

                of that is that I've got<span

                  class="Apple-converted-space"> </span><br>

                > a local change which adds a very similar

                construction (called "statepoints"<span

                  class="Apple-converted-space"> </span><br>

                > for the moment), but I was trying to keep that

                separate. That also includes<span

                  class="Apple-converted-space"> </span><br>

                > a lot of GC semantics which are not under

                discussion currently. My<span

                  class="Apple-converted-space"> </span><br>

                > apologies if that experience bled over into this

                conversation and made<span class="Apple-converted-space"> </span><br>

                > things more confusing.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > I will note that the documentation for patchpoint

                say explicitly the<span class="Apple-converted-space"> </span><br>

                > following:<span class="Apple-converted-space"> </span><br>

                > "The ‘llvm.experimental.patchpoint.*‘ intrinsics

                creates a function call to<span

                  class="Apple-converted-space"> </span><br>

                > the specified <target> and records the

                location of specified values in the<span

                  class="Apple-converted-space"> </span><br>

                > stack map."<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > My reading has always been that a patchpoint *that

                isn't patched* is simply<span

                  class="Apple-converted-space"> </span><br>

                > a call with a stackmap associated with it. To my

                reading, this can (and<span

                  class="Apple-converted-space"> </span><br>

                > did, and does) indicate my proposed usage would be

                legal.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                <br>

                I like the idea that the target can be assumed to be

                called. It makes<span class="Apple-converted-space"> </span><br>

                optimization of the call possible, etc. I think it's

                definitely worth<span class="Apple-converted-space"> </span><br>

                exploring before we lock down the patchpoint intrinsic.<span

                  class="Apple-converted-space"> </span></div>

            </div>

          </span></blockquote>

      </div>

      <p>I will actively oppose this.</p>

    </blockquote>

    I think it sounds like we need to split patchpoint into two cases. 

    I'm going to send a rough proposal for this later today.<br>

    <blockquote cite="mid:etPan.5363ea85.57fc4fbb.172db@dethklok.local"

      type="cite">

      <p>-Filip</p>

      <p><br>

      </p>

      <div>

        <blockquote type="cite" class="clean_bq" style="color: rgb(0, 0,

          0); font-family: Helvetica, Arial; font-size: 13px;

          font-style: normal; font-variant: normal; font-weight: normal;

          letter-spacing: normal; line-height: normal; orphans: auto;

          text-align: start; text-indent: 0px; text-transform: none;

          white-space: normal; widows: auto; word-spacing: 0px;

          -webkit-text-stroke-width: 0px; background-color: rgb(255,

          255, 255);"><span>

            <div>

              <div><br>

                <br>

                -eric<span class="Apple-converted-space"> </span><br>

                <br>

                > I will agree that I've confused the topic badly on

                the optimization front.<span

                  class="Apple-converted-space"> </span><br>

                > My "statepoint" isn't patchable, so a lot more

                optimizations are legal.<span

                  class="Apple-converted-space"> </span><br>

                > Sorry about that. To restate what I think you've

                been saying all along, the<span

                  class="Apple-converted-space"> </span><br>

                > optimizer can't make assumptions about what

                function is called by a<span

                  class="Apple-converted-space"> </span><br>

                > patchpoint because that might change based on later

                patching. Is this the<span class="Apple-converted-space"> </span><br>

                > key point you've been trying to make?<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > I'm not objecting to separating "my patchpoint"

                from "your patchpoint".<span

                  class="Apple-converted-space"> </span><br>

                > Let's just hammer out the semantics of each first.

                :)<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Again, longer response to follow in a day or so. :)<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Philip<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > On 04/30/2014 10:09 PM, Filip Pizlo wrote:<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > On April 30, 2014 at 9:06:20 PM, Philip Reames

                (<a class="moz-txt-link-abbreviated" href="mailto:listmail@philipreames.com">listmail@philipreames.com</a>)<span

                  class="Apple-converted-space"> </span><br>

                > wrote:<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > On 04/29/2014 12:39 PM, Filip Pizlo wrote:<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > On April 29, 2014 at 11:27:06 AM, Philip Reames

                (<a class="moz-txt-link-abbreviated" href="mailto:listmail@philipreames.com">listmail@philipreames.com</a>)<span

                  class="Apple-converted-space"> </span><br>

                > wrote:<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > On 04/29/2014 10:44 AM, Filip Pizlo wrote:<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > LD;DR: Your desire to use trapping on x86 only

                further convinces me that<span

                  class="Apple-converted-space"> </span><br>

                > Michael's proposed intrinsics are the best way to

                go.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > I'm still not convinced, but am not going to

                actively oppose it either. I'm<span

                  class="Apple-converted-space"> </span><br>

                > leery of designing a solution with major

                assumptions we don't have data to<span

                  class="Apple-converted-space"> </span><br>

                > backup.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > I worry your assumptions about deoptimization are

                potentially unsound. But<span

                  class="Apple-converted-space"> </span><br>

                > I don't have data to actually show this (yet).<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > I *think* I may have been unclear about my

                assumptions; in particular, my<span

                  class="Apple-converted-space"> </span><br>

                > claims with respect to deoptimization are probably

                more subtle than they<span class="Apple-converted-space"> </span><br>

                > appeared. WebKit can use LLVM and it has divisions

                and we do all possible<span

                  class="Apple-converted-space"> </span><br>

                > deoptimization/profiling/etc tricks for it, so this

                is grounded in<span class="Apple-converted-space"> </span><br>

                > experience. Forgive me if the rest of this e-mail

                contains a lecture on<span class="Apple-converted-space"> </span><br>

                > things that are obvious - I'll try to err on the

                side of clarity and<span class="Apple-converted-space"> </span><br>

                > completeness since this discussion is sufficiently

                dense that we run the<span class="Apple-converted-space"> </span><br>

                > risk of talking cross-purposes unless some baseline

                assumptions are<span class="Apple-converted-space"> </span><br>

                > established.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > I think we're using the same terminology, but with

                slightly different sets<span

                  class="Apple-converted-space"> </span><br>

                > of assumptions. I'll point this out below where

                relevant.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Also, thanks for taking the time to expand. It help

                clarify the discussion<span

                  class="Apple-converted-space"> </span><br>

                > quite a bit.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > I think we may be converging to an understanding of

                what you want versus<span class="Apple-converted-space"> </span><br>

                > what I want, and I think that there are some points

                - possibly unrelated to<span

                  class="Apple-converted-space"> </span><br>

                > division - that are worth clarifying. I think that

                the main difference is<span

                  class="Apple-converted-space"> </span><br>

                > that when I say "patchpoint", I am referring to a

                concrete intrinsic with<span

                  class="Apple-converted-space"> </span><br>

                > specific semantics that cannot change without

                breaking WebKit, while you are<span

                  class="Apple-converted-space"> </span><br>

                > using the term to refer to a broad concept, or

                rather, a class of<span class="Apple-converted-space"> </span><br>

                > as-yet-unimplemented intrinsics that share some of

                the same features with<span

                  class="Apple-converted-space"> </span><br>

                > patchpoints but otherwise have incompatible

                semantics.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Also, when I say that you wouldn't want to use the

                existing patchpoint to do<span

                  class="Apple-converted-space"> </span><br>

                > your trapping deopt, what I mean is that the

                performance of doing this would<span

                  class="Apple-converted-space"> </span><br>

                > suck for reasons that are not related to

                deoptimization or trapping. I'm<span

                  class="Apple-converted-space"> </span><br>

                > not claiming that deoptimization performs poorly

                (trust me, I know better)<span

                  class="Apple-converted-space"> </span><br>

                > or that trapping to deoptimize is bad (I've done

                this many, many times and I<span

                  class="Apple-converted-space"> </span><br>

                > know better). I'm saying that with the current

                patchpoint intrinsics in<span

                  class="Apple-converted-space"> </span><br>

                > LLVM, as they are currently specified and

                implemented, doing it would be a<span

                  class="Apple-converted-space"> </span><br>

                > bad idea because you'd have to compromise a bunch

                of other optimizations to<span

                  class="Apple-converted-space"> </span><br>

                > achieve it.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > You have essentially described new intrinsics that

                would make this less of a<span

                  class="Apple-converted-space"> </span><br>

                > bad idea and I am interested in your intrinsics, so

                I'll try to both respond<span

                  class="Apple-converted-space"> </span><br>

                > with why patchpoints don't currently give you what

                you want (and why simply<span

                  class="Apple-converted-space"> </span><br>

                > changing patchpoint semantics would be evil) and

                I'll also try to comment on<span

                  class="Apple-converted-space"> </span><br>

                > what I think of the intrinsic that you're

                effectively proposing. Long story<span

                  class="Apple-converted-space"> </span><br>

                > short, I think you should formally propose your

                intrinsic even if it's not<span

                  class="Apple-converted-space"> </span><br>

                > completely fleshed out. I think that it's an

                interesting capability and in<span

                  class="Apple-converted-space"> </span><br>

                > its most basic form, it is a simple variation of

                the current<span class="Apple-converted-space"> </span><br>

                > patchpoint/stackmap intrinsics.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > On April 29, 2014 at 10:09:49 AM, Philip Reames

                (<a class="moz-txt-link-abbreviated" href="mailto:listmail@philipreames.com">listmail@philipreames.com</a>)<span

                  class="Apple-converted-space"> </span><br>

                > wrote:<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > As the discussion has progressed and I've spent

                more time thinking about the<span

                  class="Apple-converted-space"> </span><br>

                > topic, I find myself less and less enthused about

                the current proposal. I'm<span

                  class="Apple-converted-space"> </span><br>

                > in full support of having idiomatic ways to express

                safe division, but I'm<span

                  class="Apple-converted-space"> </span><br>

                > starting to doubt that using an intrinsic is the

                right way at the moment.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > One case I find myself thinking about is how one

                would combine profiling<span

                  class="Apple-converted-space"> </span><br>

                > information and implicit div-by-zero/overflow

                checks with this proposal. I<span

                  class="Apple-converted-space"> </span><br>

                > don't really see a clean way. Ideally, for a "safe

                div" which never has the<span

                  class="Apple-converted-space"> </span><br>

                > exceptional paths taken, you'd like to completely

                do away with the control<span

                  class="Apple-converted-space"> </span><br>

                > flow entirely. (And rely on hardware traps

                w/exceptions instead.) I don't<span

                  class="Apple-converted-space"> </span><br>

                > really see a way to represent that type of

                construct given the current<span

                  class="Apple-converted-space"> </span><br>

                > proposal.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > This is a deeper problem and to solve it you'd need

                a solution to trapping<span

                  class="Apple-converted-space"> </span><br>

                > in general. Let's consider the case of Java. A Java

                program may want to<span class="Apple-converted-space"> </span><br>

                > catch the arithmetic exception due to divide by

                zero. How would you do this<span

                  class="Apple-converted-space"> </span><br>

                > with a trap in LLVM IR? Spill all state that is

                live at the catch? Use a<span

                  class="Apple-converted-space"> </span><br>

                > patchpoint for the entire division instruction?<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > We'd likely use something similar to a patchpoint.

                You'd need the "abstract<span

                  class="Apple-converted-space"> </span><br>

                > vm state" (which is not the compiled frame

                necessarily) available at the div<span

                  class="Apple-converted-space"> </span><br>

                > instruction. You could then re-enter the

                interpreter at the specified index<span

                  class="Apple-converted-space"> </span><br>

                > (part of the vm state). We have all most of these

                mechanisms in place.<span class="Apple-converted-space"> </span><br>

                > Ideally, you'd trigger a recompile and otherwise

                ensure re-entry into<span class="Apple-converted-space"> </span><br>

                > compiled code at the soonest possible moment.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > This requires a lot of runtime support, but we

                already have most of it<span

                  class="Apple-converted-space"> </span><br>

                > implemented for another compiler. From our

                perspective, the runtime<span

                  class="Apple-converted-space"> </span><br>

                > requirements are not a major blocker.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Right, you'll use a patchpoint. That's way more

                expensive than using a safe<span

                  class="Apple-converted-space"> </span><br>

                > division intrinsic with branches, because it's

                opaque to the optimizer.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > This statement is true at the moment, but it

                shouldn't be. I think this is<span

                  class="Apple-converted-space"> </span><br>

                > our fundamental difference in approach.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > You should be able to write something like:<span

                  class="Apple-converted-space"> </span><br>

                > i32 %res = invoke patchpoint (...

                x86_trapping_divide, a, b) normal_dest<span

                  class="Apple-converted-space"> </span><br>

                > invoke_dest<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > normal_dest:<span class="Apple-converted-space"> </span><br>

                > ;; use %res<span class="Apple-converted-space"> </span><br>

                > invoke_dest:<span class="Apple-converted-space"> </span><br>

                > landingpad<span class="Apple-converted-space"> </span><br>

                > ;; dispatch edge cases<span

                  class="Apple-converted-space"> </span><br>

                > ;; this could be unreachable code if you deopt this

                frame in the trap<span class="Apple-converted-space"> </span><br>

                > handler and jump directly to an interpreter or

                other bit of code<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > I see. It sounds like you want a generalization of

                the "div.with.stackmap"<span

                  class="Apple-converted-space"> </span><br>

                > that I thought you wanted - you want to be able to

                wrap anything in a<span class="Apple-converted-space"> </span><br>

                > stackmap.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > The current patchpoint intrinsic does not do this,

                and you run the risk of<span

                  class="Apple-converted-space"> </span><br>

                > breaking existing semantics if you changed this.

                You'd probably break<span class="Apple-converted-space"> </span><br>

                > WebKit, which treats the call target of the

                patchpoint as nothing more than<span

                  class="Apple-converted-space"> </span><br>

                > a quirk - we always pass null. Also, the current

                patchpoint treats the<span class="Apple-converted-space"> </span><br>

                > callee as an i8* if I remember right and it would

                be super weird if all LLVM<span

                  class="Apple-converted-space"> </span><br>

                > phases had to interpret this i8* by unwrapping a

                possible bitcast to get to<span

                  class="Apple-converted-space"> </span><br>

                > a declared function that may be an intrinsic. Yuck!

                Basically, the call<span class="Apple-converted-space"> </span><br>

                > target of existing patchpoints is meant to be a

                kind of convenience feature<span

                  class="Apple-converted-space"> </span><br>

                > rather than the core of the mechanism.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > I agree in principle that the intrinsic that you

                want would be a useful<span

                  class="Apple-converted-space"> </span><br>

                > intrinsic. But let's not call it a patchpoint for

                the purposes of this<span class="Apple-converted-space"> </span><br>

                > discussion, and let's not confuse the discussion by

                claiming (incorrectly)<span

                  class="Apple-converted-space"> </span><br>

                > that the existing patchpoint facility gives you

                what you want. It doesn't:<span

                  class="Apple-converted-space"> </span><br>

                > patchpoints are designed to make the call target

                opaque (hence the use of<span

                  class="Apple-converted-space"> </span><br>

                > i8*) and there shouldn't be a correlation between

                what the patchpoint does<span

                  class="Apple-converted-space"> </span><br>

                > at run-time and what the called function would have

                done. You could make<span class="Apple-converted-space"> </span><br>

                > the call target be null (like WebKit does) and the

                patchpoint should still<span

                  class="Apple-converted-space"> </span><br>

                > mean "this code can do anything" because the

                expectation is that the client<span

                  class="Apple-converted-space"> </span><br>

                > JIT will patch over it anyway.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Also, "patchpoint" would probably not be the right

                term for the intrinsic<span

                  class="Apple-converted-space"> </span><br>

                > that you want. I think that what you want is

                "call.with.stackmap". Or<span

                  class="Apple-converted-space"> </span><br>

                > maybe "stackmap.wrapper". Or just "stackmap" - I'd

                be OK, in principle,<span class="Apple-converted-space"> </span><br>

                > with changing the name of the current "stackmap"

                intrinsic to something that<span

                  class="Apple-converted-space"> </span><br>

                > reflects the fact that it's less of a stackmap than

                what you want.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > To summarize. A patchpoint's main purpose is that

                you can patch it with<span class="Apple-converted-space"> </span><br>

                > arbitrary code. The current "stackmap" means that

                you can patch it with<span class="Apple-converted-space"> </span><br>

                > arbitrary code and that patching may be destructive

                to a shadow of machine<span

                  class="Apple-converted-space"> </span><br>

                > code bytes, so it's really just like patchpoints -

                we could change its name<span

                  class="Apple-converted-space"> </span><br>

                > to "patchpoint.shadow" for example.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > If you were to propose such a stackmap intrinsic,

                then I think there could<span

                  class="Apple-converted-space"> </span><br>

                > be some ways of doing it that wouldn't be too

                terrible. Basically you want<span

                  class="Apple-converted-space"> </span><br>

                > something that is like a patchpoint in that it

                reports a stackmap via a side<span

                  class="Apple-converted-space"> </span><br>

                > channel, but unlike patchpoints, it doesn't allow

                arbitrary patching -<span class="Apple-converted-space"> </span><br>

                > instead the optimizer should be allowed to assume

                that the code within the<span

                  class="Apple-converted-space"> </span><br>

                > patchpoint will always do the same thing that the

                call target would have<span

                  class="Apple-converted-space"> </span><br>

                > done. There are downsides to truly doing this. For

                example, to make<span class="Apple-converted-space"> </span><br>

                > division efficient with such an intrinsic, you'd

                have to do something that<span

                  class="Apple-converted-space"> </span><br>

                > is somewhat worse than just recognizing intrinsics

                in the optimizer - you'd<span

                  class="Apple-converted-space"> </span><br>

                > have to first recognize a call to your "stackmap

                wrapper" intrinsic and then<span

                  class="Apple-converted-space"> </span><br>

                > observe that its call target argument is in turn

                another intrinsic. To me<span

                  class="Apple-converted-space"> </span><br>

                > personally, that's kind of yucky, but I won't deny

                that it could be useful.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > As to the use of invoke: I don't believe that the

                use of invoke versus my<span

                  class="Apple-converted-space"> </span><br>

                > suggested "branch on a trap predicate" idea are

                different in any truly<span

                  class="Apple-converted-space"> </span><br>

                > meaningful way. I buy that either would work.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > A patchpoint should not require any excess

                spilling. If values are live in<span

                  class="Apple-converted-space"> </span><br>

                > registers, that should be reflected in the stack

                map. (I do not know if<span

                  class="Apple-converted-space"> </span><br>

                > this is the case for patchpoint at the moment or

                not.)<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Patchpoints do not require spilling.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > My point was that with existing patchpoints, you

                *either* use a patchpoint<span

                  class="Apple-converted-space"> </span><br>

                > for the entire division which makes the division

                opaque to the optimizer -<span

                  class="Apple-converted-space"> </span><br>

                > because a patchpoint means "this code can do

                anything" - *or* you could<span

                  class="Apple-converted-space"> </span><br>

                > spill stuff to the stack prior to your trapping

                division intrinsic, since<span

                  class="Apple-converted-space"> </span><br>

                > spilling is something that you could do as a

                workaround if you didn't have a<span

                  class="Apple-converted-space"> </span><br>

                > patchpoint.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > The reason why I brought up spilling at all is that

                I suspect that spilling<span

                  class="Apple-converted-space"> </span><br>

                > all state to the stack might be cheaper - for some

                systems - than turning<span

                  class="Apple-converted-space"> </span><br>

                > the division into a patchpoint. Turning the

                division into a patchpoint is<span

                  class="Apple-converted-space"> </span><br>

                > horrendously brutal - the patchpoint looks like it

                clobbers the heap (which<span

                  class="Apple-converted-space"> </span><br>

                > a division doesn't do), has to execute (a division

                is an obvious DCE<span class="Apple-converted-space"> </span><br>

                > candidate), cannot be hoisted (hoisting divisions

                is awesome), etc. Perhaps<span

                  class="Apple-converted-space"> </span><br>

                > most importantly, though, a patchpoint doesn't tell

                LLVM that you're *doing<span

                  class="Apple-converted-space"> </span><br>

                > a division* - so all constant folding, strenght

                reduction, and algebraic<span

                  class="Apple-converted-space"> </span><br>

                > reasoning flies out the window. On the other hand,

                spilling all state to<span class="Apple-converted-space"> </span><br>

                > the stack is an arguably sound and performant

                solution to a lot of VM<span

                  class="Apple-converted-space"> </span><br>

                > problems. I've seen JVM implementations that ensure

                that there is always a<span

                  class="Apple-converted-space"> </span><br>

                > copy of state on the stack at some critical points,

                just because it makes<span class="Apple-converted-space"> </span><br>

                > loads of stuff simpler (debugging, profiling, GC,

                and of course deopt). I<span

                  class="Apple-converted-space"> </span><br>

                > personally prefer to stay away from such a strategy

                because it's not free.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > On the other hand, branches can be cheap. A branch

                on a divide is cheaper<span

                  class="Apple-converted-space"> </span><br>

                > than not being able to optimize the divide.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > The Value called by a patchpoint should participate

                in optimization<span class="Apple-converted-space"> </span><br>

                > normally.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > I agree that you could have a different intrinsic

                that behaves like this.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > We really want the patchpoint part of the call to

                be supplemental. It<span class="Apple-converted-space"> </span><br>

                > should still be a call. It should be constant

                propagated, transformed,<span

                  class="Apple-converted-space"> </span><br>

                > etc.. This is not the case currently. I've got a

                couple of off the wall<span

                  class="Apple-converted-space"> </span><br>

                > ideas for improving the current status, but I'll

                agree this is a hardish<span

                  class="Apple-converted-space"> </span><br>

                > problem.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > It should be legal to use a patchpoint in an

                invoke. It's an ABI issue of<span

                  class="Apple-converted-space"> </span><br>

                > how the invoke path gets invoked. (i.e. side tables

                for the runtime to<span class="Apple-converted-space"> </span><br>

                > lookup, etc..) This is not possible today, and

                probably requires a fair<span

                  class="Apple-converted-space"> </span><br>

                > amount of work. Some of it, I've already done and

                will be sharing shortly.<span

                  class="Apple-converted-space"> </span><br>

                > Other parts, I haven't even thought about.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Right, it's significantly more complex than either

                the existing patchpoints<span

                  class="Apple-converted-space"> </span><br>

                > or Michael's proposed safe.div.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > If you didn't want to use the trapping semantics,

                you'd insert dedicated<span

                  class="Apple-converted-space"> </span><br>

                > control flow _before_ the divide. This would allow

                normal optimization of<span

                  class="Apple-converted-space"> </span><br>

                > the control flow.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Notes:<span class="Apple-converted-space"> </span><br>

                > 1) This might require a new PATCHPOINT pseudo op in

                the backend. Haven't<span class="Apple-converted-space"> </span><br>

                > thought much about that yet.<span

                  class="Apple-converted-space"> </span><br>

                > 2) I *think* your current intrinsic could be

                translated into something like<span

                  class="Apple-converted-space"> </span><br>

                > this. (Leaving aside the question of where the

                deopt state comes from.) In<span

                  class="Apple-converted-space"> </span><br>

                > fact, the more I look at this, the less difference

                I actually see between<span

                  class="Apple-converted-space"> </span><br>

                > the approaches.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > In a lot of languages, a divide produces some

                result even in the exceptional<span

                  class="Apple-converted-space"> </span><br>

                > case and this result requires effectively

                deoptimizing since the resut won't<span

                  class="Apple-converted-space"> </span><br>

                > be the one you would have predicted (double instead

                of int, or BigInt<span class="Apple-converted-space"> </span><br>

                > instead of small int), which sort of means that if

                the CPU exception occurs<span

                  class="Apple-converted-space"> </span><br>

                > you have to be able to reconstruct all state. A

                patchpoint could do this,<span

                  class="Apple-converted-space"> </span><br>

                > and so could spilling all state to the stack before

                the divide - but both<span class="Apple-converted-space"> </span><br>

                > are very heavy hammers that are sure to be more

                expensive than just doing a<span

                  class="Apple-converted-space"> </span><br>

                > branch.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > This isn't necessarily as expensive as you might

                believe. I'd recommend<span

                  class="Apple-converted-space"> </span><br>

                > reading the Graal project papers on this topic.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Whether deopt or branching is more profitable *in

                this case*, I can't easily<span

                  class="Apple-converted-space"> </span><br>

                > say. I'm not yet to the point of being able to run

                that experiment. We can<span

                  class="Apple-converted-space"> </span><br>

                > argue about what "should" be better all we want,

                but real performance data<span

                  class="Apple-converted-space"> </span><br>

                > is the only way to truly know.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > My point may have been confusing. I know that

                deoptimization is cheap and<span

                  class="Apple-converted-space"> </span><br>

                > WebKit uses it everywhere, including division

                corner cases, if profiling<span

                  class="Apple-converted-space"> </span><br>

                > tells us that it's profitable to do so (which it

                does, in the common case).<span

                  class="Apple-converted-space"> </span><br>

                > WebKit is a heavy user of deoptimization in

                general, so you don't need to<span

                  class="Apple-converted-space"> </span><br>

                > convince me that it's worth it.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Acknowledged.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Note that I want *both* deopt *and* branching,

                because in this case, a<span

                  class="Apple-converted-space"> </span><br>

                > branch is the fastest overall way of detecting when

                to deopt. In the<span class="Apple-converted-space"> </span><br>

                > future, I will want to implement the deopt in terms

                of branching, and when<span

                  class="Apple-converted-space"> </span><br>

                > we do this, I believe that the most sound and

                performat approach would be<span

                  class="Apple-converted-space"> </span><br>

                > using Michael's intrinsics. This is subtle and I'll

                try to explain why it's<span

                  class="Apple-converted-space"> </span><br>

                > the case.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > The point is that you wouldn't want to do

                deoptimization by spilling state<span

                  class="Apple-converted-space"> </span><br>

                > on the main path or by using a patchpoint for the

                main path of the division.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > This is the main point I disagree with. I don't

                believe that having a<span class="Apple-converted-space"> </span><br>

                > patchpoint on the main path should be any more

                expensive then the original<span

                  class="Apple-converted-space"> </span><br>

                > call. (see above)<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > The reason why the patchpoint is expensive is that

                if you use a patchpoint<span

                  class="Apple-converted-space"> </span><br>

                > to implement a division then the optimizer won't be

                allowed to assume that<span

                  class="Apple-converted-space"> </span><br>

                > it's a division, because the whole point of

                "patchpoint" is to tell the<span

                  class="Apple-converted-space"> </span><br>

                > optimizer to piss off.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Worth noting explicitly: I'm assuming that all of

                your deopt state would<span

                  class="Apple-converted-space"> </span><br>

                > already be available for other purposes in nearby

                code. It's on the stack<span

                  class="Apple-converted-space"> </span><br>

                > or in registers. I'm assuming that by adding the

                deopt point, you are not<span

                  class="Apple-converted-space"> </span><br>

                > radically changing the set of computations which

                need to be done. If that's<span

                  class="Apple-converted-space"> </span><br>

                > not the case, you should avoid deopt and instead

                just inline the slow paths<span

                  class="Apple-converted-space"> </span><br>

                > with explicit checks.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Yes, of course it is. That's not the issue.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > I'll note that given your assumptions about the

                cost of a patchpoint, the<span

                  class="Apple-converted-space"> </span><br>

                > rest of your position makes a lot more sense. :) As

                I spelled out above, I<span

                  class="Apple-converted-space"> </span><br>

                > believe this cost is not fundamental.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > You don't want the common path of executing the

                division to involve a<span class="Apple-converted-space"> </span><br>

                > patchpoint instruction, although using a patchpoint

                or stackmap to implement<span

                  class="Apple-converted-space"> </span><br>

                > deoptimization on the failing path is great:<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Good: if (division would fail) { call

                @patchpoint(all of my state) } else {<span

                  class="Apple-converted-space"> </span><br>

                > result = a / b }<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Given your cost assumptions, I'd agree.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Not my cost assumptions. The reason why this is

                better is that the division<span

                  class="Apple-converted-space"> </span><br>

                > is expressed in LLVM IR so that LLVM can do useful

                things to it - like<span class="Apple-converted-space"> </span><br>

                > eliminate it, for example.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Bad: call @patchpoint(all of my state) // patch

                with a divide instruction -<span

                  class="Apple-converted-space"> </span><br>

                > bad because the optimizer has no clue what you're

                doing and assumes the very<span

                  class="Apple-converted-space"> </span><br>

                > worst<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Yuck. Agreed.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > To be clear, this is what you're proposing - except

                that you're assuming<span class="Apple-converted-space"> </span><br>

                > that LLVM will know that you've patched a division

                because you're expecting<span

                  class="Apple-converted-space"> </span><br>

                > the call target to have semantic meaning. Or,

                rather, you're expecting that<span

                  class="Apple-converted-space"> </span><br>

                > you can make the contents of the patchpoint be a

                division by having the call<span

                  class="Apple-converted-space"> </span><br>

                > target be a division intrinsic. In the current

                implementation and as it is<span

                  class="Apple-converted-space"> </span><br>

                > currently specified, the call target has no meaning

                and so you get the yuck<span

                  class="Apple-converted-space"> </span><br>

                > that I'm showing.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Worse: spill all state to the stack; call

                @trapping.div(a, b) // the spills<span

                  class="Apple-converted-space"> </span><br>

                > will hurt you far more than a branch, so this

                should be avoided<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > I'm assuming this is an explicit spill rather than

                simply recording a stack<span

                  class="Apple-converted-space"> </span><br>

                > map *at the div*. If so, agreed.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > I suppose we could imagine a fourth option that

                involves a patchpoint to<span

                  class="Apple-converted-space"> </span><br>

                > pick up the state and a trapping divide instrinsic.

                But a trapping divide<span class="Apple-converted-space"> </span><br>

                > intrinsic alone is not enough. Consider this:<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > result = call @trapping.div(a, b); call

                @stackmap(all of my state)<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > As soon as these are separate instructions, you

                have no guarantees that the<span

                  class="Apple-converted-space"> </span><br>

                > state that the stackmap reports is sound for the

                point at which the div<span

                  class="Apple-converted-space"> </span><br>

                > would trap.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > This is the closest to what I'd propose, except

                that the two calls would be<span

                  class="Apple-converted-space"> </span><br>

                > merged into a single patchpoint. Isn't the entire

                point of a patchpoint to<span

                  class="Apple-converted-space"> </span><br>

                > record the stack map for a call?<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > No. It would be bad if that's what the

                documentation says. That's not at<span

                  class="Apple-converted-space"> </span><br>

                > all how WebKit uses it or probably any IC client

                would use it.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Patchpoints are designed to be inline assembly on

                steroids. They're there<span

                  class="Apple-converted-space"> </span><br>

                > to allow the client JIT to tell LLVM to piss off.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > (Well, ignoring the actual patching part..) Why not

                write this as:<span class="Apple-converted-space"> </span><br>

                > patchpoint(..., trapping.div, a, b);<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Is there something I'm missing here?<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Just to note: I fully agree that the two call

                proposal is unsound and should<span

                  class="Apple-converted-space"> </span><br>

                > be avoided.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > So, the division itself shouldn't be a trapping

                instruction and instead you<span

                  class="Apple-converted-space"> </span><br>

                > want to detect the bad case with a branch.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > To be clear:<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > - Whether you use deoptimization for division or

                anything else - like WebKit<span

                  class="Apple-converted-space"> </span><br>

                > has done since before any of the Graal papers were

                written - is mostly<span class="Apple-converted-space"> </span><br>

                > unrelated to how you represent the division, unless

                you wanted to add a new<span

                  class="Apple-converted-space"> </span><br>

                > intrinsic that is like a

                trapping-division-with-stackmap:<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > result = call @trapping.div.with.stackmap(a, b, ...

                all of my state ...)<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Now, maybe you do want such an intrinsic, in which

                case you should propose<span

                  class="Apple-converted-space"> </span><br>

                > it!<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Given what we already have with patchpoints, I

                don't think a merged<span class="Apple-converted-space"> </span><br>

                > intrinsic is necessary. (See above). I believe we

                have all the parts to<span class="Apple-converted-space"> </span><br>

                > build this solution, and that - in theory - they

                should compose neatly.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > p.s. The bits I was referring to was not deopt per

                se. It was particularly<span

                  class="Apple-converted-space"> </span><br>

                > which set of deopt state you used for each deopt

                point. That's a bit of<span

                  class="Apple-converted-space"> </span><br>

                > tangent for the rest of the discussion now though.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > The reason why I haven't proposed it is that I

                think that long-term, the<span

                  class="Apple-converted-space"> </span><br>

                > currently proposed intrinsics are a better path to

                getting the trapping<span class="Apple-converted-space"> </span><br>

                > optimizations. See my previous mail, where I show

                how we could tell LLVM<span

                  class="Apple-converted-space"> </span><br>

                > what the failing path is (which may have

                deoptimization code that uses a<span

                  class="Apple-converted-space"> </span><br>

                > stackmap or whatever), what the trapping predicate

                is (it comes from the<span class="Apple-converted-space"> </span><br>

                > safe.div intrinsic), and the fact that trapping is

                wise (branch weights).<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > - If you want to do the deoptimization with a trap,

                then your only choice<span class="Apple-converted-space"> </span><br>

                > currently is to use a patchpoint for the main path

                of the division. This<span class="Apple-converted-space"> </span><br>

                > will be slower than using a branch to an OSR exit

                basic block, because<span class="Apple-converted-space"> </span><br>

                > you're making the division itself opaque to the

                optimizer (bad) just to get<span

                  class="Apple-converted-space"> </span><br>

                > rid of a branch (which was probably cheap to begin

                with).<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Hence, what you want to do - one way or another,

                regardless of whether this<span

                  class="Apple-converted-space"> </span><br>

                > proposed intrinsic is added - is to branch on the

                corner case condition, and<span

                  class="Apple-converted-space"> </span><br>

                > have the slow case of the branch go to a basic

                block that deoptimizes.<span

                  class="Apple-converted-space"> </span><br>

                > Unless of course you have profiling that says that

                the case does happen<span class="Apple-converted-space"> </span><br>

                > often, in which case you can have that basic block

                handle the corner case<span

                  class="Apple-converted-space"> </span><br>

                > inline without leaving optimized code (FWIW, we do

                have such paths in WebKit<span

                  class="Apple-converted-space"> </span><br>

                > and they are useful).<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > So the question for me is whether the branching

                involves explicit control<span

                  class="Apple-converted-space"> </span><br>

                > flow or is hidden inside an intrinsic. I prefer for

                it to be within an<span class="Apple-converted-space"> </span><br>

                > intrinsic because it:<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > - allows the optimizer to do more interesting

                things in the common cases,<span

                  class="Apple-converted-space"> </span><br>

                > like hoisting the entire division.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > - will give us a clearer path for implementing

                trapping optimizations in the<span

                  class="Apple-converted-space"> </span><br>

                > future.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > - is an immediate win on ARM.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > I'd be curious to hear what specific idea you have

                about how to implement<span

                  class="Apple-converted-space"> </span><br>

                > trap-based deoptimization with your trapping

                division intrinsic for x86 -<span

                  class="Apple-converted-space"> </span><br>

                > maybe it's different from the two "bad" idioms I

                showed above.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > I hope my explanation above helps. If not, ask, and

                I'll try to explain<span class="Apple-converted-space"> </span><br>

                > more clearly.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > I think I understand it. I think that the only

                issue is that:<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > - Patchpoints currently don't do what you want.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > - If you made patchpoints do what you want then

                you'd break WebKit - not to<span

                  class="Apple-converted-space"> </span><br>

                > mention anyone who wants to use them for inline

                caches.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > So it seems like you want a new intrinsic. You

                should officially propose<span

                  class="Apple-converted-space"> </span><br>

                > this new intrinsic, particularly since the core

                semantic differences are not<span

                  class="Apple-converted-space"> </span><br>

                > so great from what we have now. OTOH, if you truly

                believe that patchpoints<span

                  class="Apple-converted-space"> </span><br>

                > should just be changed to your semantics in a way

                that does break WebKit,<span

                  class="Apple-converted-space"> </span><br>

                > then that's probably also something you should get

                off your chest. ;-)<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > One point just for clarity; I don't believe this

                effects the conclusions of<span

                  class="Apple-converted-space"> </span><br>

                > our discussion so far. I'm also fairly sure that

                you (Filip) are aware of<span

                  class="Apple-converted-space"> </span><br>

                > this, but want to spell it out for other readers.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > You seem to be assuming that compiled code needs to

                explicitly branch to a<span

                  class="Apple-converted-space"> </span><br>

                > point where deopt state is known to exit a compiled

                frame.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > This is a slightly unclear characterization of my

                assumptions. Our JIT does<span

                  class="Apple-converted-space"> </span><br>

                > deoptimization without explicit branches for many,

                many things. You should<span

                  class="Apple-converted-space"> </span><br>

                > look at it some time, it's pretty fancy. ;-)<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Worth noting is that you can also exit a compiled

                frame on a trap (without<span

                  class="Apple-converted-space"> </span><br>

                > an explicitly condition/branch!) if the deopt state

                is known at the point<span class="Apple-converted-space"> </span><br>

                > you take the trap. This "exit frame on trap"

                behavior shows up with null<span

                  class="Apple-converted-space"> </span><br>

                > pointer exceptions as well. I'll note that both

                compilers in OpenJDK<span class="Apple-converted-space"> </span><br>

                > support some combination of "exit-on-trap"

                conditions for division and null<span

                  class="Apple-converted-space"> </span><br>

                > dereferences. The two differ on exactly what

                strategies they use in each<span

                  class="Apple-converted-space"> </span><br>

                > case though. :)<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Yeah, and I've also implemented VMs that do this -

                and I endorse the basic<span

                  class="Apple-converted-space"> </span><br>

                > idea. I know what you want, and my only point is

                that the existing<span class="Apple-converted-space"> </span><br>

                > patchpoints only give you this if you're willing to

                make a huge compromise:<span

                  class="Apple-converted-space"> </span><br>

                > namely, that you're willing to make the division

                (or heap load for the null<span

                  class="Apple-converted-space"> </span><br>

                > case) completely opaque to the compiler to the

                point that GVN, LICM, SCCP,<span

                  class="Apple-converted-space"> </span><br>

                > and all algebraic reasoning have to give up on

                optimizing it. The point of<span

                  class="Apple-converted-space"> </span><br>

                > using LLVM is that it can optimize code. It can

                optimize branches and<span class="Apple-converted-space"> </span><br>

                > divisions pretty well. So, eliminating an explicit

                branch by replacing it<span

                  class="Apple-converted-space"> </span><br>

                > with a construct that appears opaque to the

                optimizer is not a smart<span

                  class="Apple-converted-space"> </span><br>

                > trade-off.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > You could add a new intrinsic that, like

                patchpoints, records the layout of<span

                  class="Apple-converted-space"> </span><br>

                > state in a side-table, but that is used as a kind

                of wrapper for operations<span

                  class="Apple-converted-space"> </span><br>

                > that LLVM understands. This may or may not be hairy

                - you seem to have sort<span

                  class="Apple-converted-space"> </span><br>

                > of acknowledged that it's got some complexity and

                I've also pointed out some<span

                  class="Apple-converted-space"> </span><br>

                > possible issues. If this is something that you

                want, you should propose it<span

                  class="Apple-converted-space"> </span><br>

                > so that others know what you're talking about. One

                danger of how we're<span class="Apple-converted-space"> </span><br>

                > discussing this right now is that you're

                overloading patchpoints to mean the<span

                  class="Apple-converted-space"> </span><br>

                > thing you want them to mean rather than what they

                actually mean, which makes<span

                  class="Apple-converted-space"> </span><br>

                > it seem like we don't need Michael's intrinsics on

                the grounds that<span class="Apple-converted-space"> </span><br>

                > patchpoints already offer a solution. They don't

                already offer a solution<span

                  class="Apple-converted-space"> </span><br>

                > precisely because patchpoints don't do what your

                intrinsics would do.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > I'm not really arguing that either scheme is

                "better" in all cases. I'm<span

                  class="Apple-converted-space"> </span><br>

                > simply arguing that we should support both and

                allow optimization and tuning<span

                  class="Apple-converted-space"> </span><br>

                > between them. As far as I can tell, you seem to be

                assuming that an<span class="Apple-converted-space"> </span><br>

                > explicit branch to known exit point is always

                superior.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Ok, back to the topic at hand...<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > With regards to the current proposal, I'm going to

                take a step back. You<span class="Apple-converted-space"> </span><br>

                > guys seem to have already looked in this in a fair

                amount of depth. I'm not<span

                  class="Apple-converted-space"> </span><br>

                > necessarily convinced you've come to the best

                solution, but at some point,<span

                  class="Apple-converted-space"> </span><br>

                > we need to make forward progress. What you have is

                clearly better than<span class="Apple-converted-space"> </span><br>

                > nothing.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Please go ahead and submit your current approach.

                We can come back and<span class="Apple-converted-space"> </span><br>

                > revise later if we really need to.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > I do request the following changes:<span

                  class="Apple-converted-space"> </span><br>

                > - Mark it clearly as experimental.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > - Either don't specify the value computed in the

                edge cases, or allow those<span

                  class="Apple-converted-space"> </span><br>

                > values to be specified as constant arguments to the

                call. This allows<span class="Apple-converted-space"> </span><br>

                > efficient lowering to x86's div instruction if you

                want to make use of the<span

                  class="Apple-converted-space"> </span><br>

                > trapping semantics.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Once again: how would you use this to get trapping

                semantics without<span class="Apple-converted-space"> </span><br>

                > throwing all of LLVM's optimizations out the

                window, in the absence of the<span

                  class="Apple-converted-space"> </span><br>

                > kind of patchpoint-like intrinsic that you want? I

                ask just to make sure<span class="Apple-converted-space"> </span><br>

                > that we're on the same page.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Finally, as for performance data, which part of

                this do you want performance<span

                  class="Apple-converted-space"> </span><br>

                > data for? I concede that I don't have performance

                data for using Michael's<span

                  class="Apple-converted-space"> </span><br>

                > new intrinsic. Part of what the intrinsic

                accomplishes is it gives a less<span

                  class="Apple-converted-space"> </span><br>

                > ugly way of doing something that is already

                possible with target intrinsics<span

                  class="Apple-converted-space"> </span><br>

                > on ARM. I think it would be great if you could get

                those semantics - along<span

                  class="Apple-converted-space"> </span><br>

                > with a known-good implementation - on other

                architectures as well.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > I would be very interested in seeing data comparing

                two schemes:<span class="Apple-converted-space"> </span><br>

                > - Explicit control flow emited by the frontend<span

                  class="Apple-converted-space"> </span><br>

                > - The safe.div intrinsic emitted by the frontend,

                desugared in CodeGenPrep<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > My strong suspicion is that each would preform well

                in some cases and not in<span

                  class="Apple-converted-space"> </span><br>

                > others. At least on x86. Since the edge-checks are

                essentially free on<span class="Apple-converted-space"> </span><br>

                > ARM, the second scheme would probably be strictly

                superior there.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > I am NOT asking that we block submission on this

                data however.<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > But this discussion has also involved suggestions

                that we should use<span class="Apple-converted-space"> </span><br>

                > trapping to implement deoptimization, and the main

                point of my message is to<span

                  class="Apple-converted-space"> </span><br>

                > strongly argue against anything like this given the

                current state of<span class="Apple-converted-space"> </span><br>

                > trapping semantics and how patchpoints work. My

                point is that using traps<span

                  class="Apple-converted-space"> </span><br>

                > for division corner cases would either be unsound

                (see the stackmap after<span

                  class="Apple-converted-space"> </span><br>

                > the trap, above), or would require you to do things

                that are obviously<span class="Apple-converted-space"> </span><br>

                > inefficient. If you truly believe that the branch

                to detect division slow<span

                  class="Apple-converted-space"> </span><br>

                > paths is more expensive than spilling all bytecode

                state to the stack or<span class="Apple-converted-space"> </span><br>

                > using a patchpoint for the division, then I could

                probably hack something up<span

                  class="Apple-converted-space"> </span><br>

                > in WebKit to show you the performance implications.

                (Or you could do it<span class="Apple-converted-space"> </span><br>

                > yourself, the code is open source...)<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > In a couple of months, I'll probably have the

                performance data to discuss<span

                  class="Apple-converted-space"> </span><br>

                > this for real. When that happens, let's pick this

                up and continue the<span class="Apple-converted-space"> </span><br>

                > debate. Alternatively, if you want to chat this

                over more with a beer in<span

                  class="Apple-converted-space"> </span><br>

                > hand at the social next week, let me know. In the

                meantime, let's not stall<span

                  class="Apple-converted-space"> </span><br>

                > the current proposal any more.<span

                  class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                > Philip<span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

                ><span class="Apple-converted-space"> </span><br>

              </div>

            </div>

          </span></blockquote>

      </div>

    </blockquote>

    <br>

  </body>

</html>