<html><head><style>body{font-family:Helvetica,Arial;font-size:13px}</style></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><br></div><p style="color:#000;">On May 2, 2014 at 11:53:25 AM, Eric Christopher (<a href="mailto:echristo@gmail.com">echristo@gmail.com</a>) wrote:</p> <div><blockquote type="cite" class="clean_bq" style="color: rgb(0, 0, 0); font-family: Helvetica, Arial; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);"><span><div><div></div><div>On Wed, Apr 30, 2014 at 10:34 PM, Philip Reames<span class="Apple-converted-space"> </span><br><listmail@philipreames.com> wrote:<span class="Apple-converted-space"> </span><br>> Andy - If you're not already following this closely, please start. We've<span class="Apple-converted-space"> </span><br>> gotten into fairly fundamental questions of what a patchpoint does.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Filip,<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> I think you've hit the nail on the head. What I'm thinking of as being<span class="Apple-converted-space"> </span><br>> patchpoints are not what you think they are. Part of that is that I've got<span class="Apple-converted-space"> </span><br>> a local change which adds a very similar construction (called "statepoints"<span class="Apple-converted-space"> </span><br>> for the moment), but I was trying to keep that separate. That also includes<span class="Apple-converted-space"> </span><br>> a lot of GC semantics which are not under discussion currently. My<span class="Apple-converted-space"> </span><br>> apologies if that experience bled over into this conversation and made<span class="Apple-converted-space"> </span><br>> things more confusing.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> I will note that the documentation for patchpoint say explicitly the<span class="Apple-converted-space"> </span><br>> following:<span class="Apple-converted-space"> </span><br>> "The ‘llvm.experimental.patchpoint.*‘ intrinsics creates a function call to<span class="Apple-converted-space"> </span><br>> the specified <target> and records the location of specified values in the<span class="Apple-converted-space"> </span><br>> stack map."<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> My reading has always been that a patchpoint *that isn't patched* is simply<span class="Apple-converted-space"> </span><br>> a call with a stackmap associated with it. To my reading, this can (and<span class="Apple-converted-space"> </span><br>> did, and does) indicate my proposed usage would be legal.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br><br>I like the idea that the target can be assumed to be called. It makes<span class="Apple-converted-space"> </span><br>optimization of the call possible, etc. I think it's definitely worth<span class="Apple-converted-space"> </span><br>exploring before we lock down the patchpoint intrinsic.<span class="Apple-converted-space"> </span></div></div></span></blockquote></div><p>I will actively oppose this.</p><p>-Filip</p><p><br></p><div><blockquote type="cite" class="clean_bq" style="color: rgb(0, 0, 0); font-family: Helvetica, Arial; font-size: 13px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255);"><span><div><div><br><br>-eric<span class="Apple-converted-space"> </span><br><br>> I will agree that I've confused the topic badly on the optimization front.<span class="Apple-converted-space"> </span><br>> My "statepoint" isn't patchable, so a lot more optimizations are legal.<span class="Apple-converted-space"> </span><br>> Sorry about that. To restate what I think you've been saying all along, the<span class="Apple-converted-space"> </span><br>> optimizer can't make assumptions about what function is called by a<span class="Apple-converted-space"> </span><br>> patchpoint because that might change based on later patching. Is this the<span class="Apple-converted-space"> </span><br>> key point you've been trying to make?<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> I'm not objecting to separating "my patchpoint" from "your patchpoint".<span class="Apple-converted-space"> </span><br>> Let's just hammer out the semantics of each first. :)<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Again, longer response to follow in a day or so. :)<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Philip<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> On 04/30/2014 10:09 PM, Filip Pizlo wrote:<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> On April 30, 2014 at 9:06:20 PM, Philip Reames (listmail@philipreames.com)<span class="Apple-converted-space"> </span><br>> wrote:<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> On 04/29/2014 12:39 PM, Filip Pizlo wrote:<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> On April 29, 2014 at 11:27:06 AM, Philip Reames (listmail@philipreames.com)<span class="Apple-converted-space"> </span><br>> wrote:<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> On 04/29/2014 10:44 AM, Filip Pizlo wrote:<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> LD;DR: Your desire to use trapping on x86 only further convinces me that<span class="Apple-converted-space"> </span><br>> Michael's proposed intrinsics are the best way to go.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> I'm still not convinced, but am not going to actively oppose it either. I'm<span class="Apple-converted-space"> </span><br>> leery of designing a solution with major assumptions we don't have data to<span class="Apple-converted-space"> </span><br>> backup.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> I worry your assumptions about deoptimization are potentially unsound. But<span class="Apple-converted-space"> </span><br>> I don't have data to actually show this (yet).<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> I *think* I may have been unclear about my assumptions; in particular, my<span class="Apple-converted-space"> </span><br>> claims with respect to deoptimization are probably more subtle than they<span class="Apple-converted-space"> </span><br>> appeared. WebKit can use LLVM and it has divisions and we do all possible<span class="Apple-converted-space"> </span><br>> deoptimization/profiling/etc tricks for it, so this is grounded in<span class="Apple-converted-space"> </span><br>> experience. Forgive me if the rest of this e-mail contains a lecture on<span class="Apple-converted-space"> </span><br>> things that are obvious - I'll try to err on the side of clarity and<span class="Apple-converted-space"> </span><br>> completeness since this discussion is sufficiently dense that we run the<span class="Apple-converted-space"> </span><br>> risk of talking cross-purposes unless some baseline assumptions are<span class="Apple-converted-space"> </span><br>> established.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> I think we're using the same terminology, but with slightly different sets<span class="Apple-converted-space"> </span><br>> of assumptions. I'll point this out below where relevant.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Also, thanks for taking the time to expand. It help clarify the discussion<span class="Apple-converted-space"> </span><br>> quite a bit.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> I think we may be converging to an understanding of what you want versus<span class="Apple-converted-space"> </span><br>> what I want, and I think that there are some points - possibly unrelated to<span class="Apple-converted-space"> </span><br>> division - that are worth clarifying. I think that the main difference is<span class="Apple-converted-space"> </span><br>> that when I say "patchpoint", I am referring to a concrete intrinsic with<span class="Apple-converted-space"> </span><br>> specific semantics that cannot change without breaking WebKit, while you are<span class="Apple-converted-space"> </span><br>> using the term to refer to a broad concept, or rather, a class of<span class="Apple-converted-space"> </span><br>> as-yet-unimplemented intrinsics that share some of the same features with<span class="Apple-converted-space"> </span><br>> patchpoints but otherwise have incompatible semantics.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Also, when I say that you wouldn't want to use the existing patchpoint to do<span class="Apple-converted-space"> </span><br>> your trapping deopt, what I mean is that the performance of doing this would<span class="Apple-converted-space"> </span><br>> suck for reasons that are not related to deoptimization or trapping. I'm<span class="Apple-converted-space"> </span><br>> not claiming that deoptimization performs poorly (trust me, I know better)<span class="Apple-converted-space"> </span><br>> or that trapping to deoptimize is bad (I've done this many, many times and I<span class="Apple-converted-space"> </span><br>> know better). I'm saying that with the current patchpoint intrinsics in<span class="Apple-converted-space"> </span><br>> LLVM, as they are currently specified and implemented, doing it would be a<span class="Apple-converted-space"> </span><br>> bad idea because you'd have to compromise a bunch of other optimizations to<span class="Apple-converted-space"> </span><br>> achieve it.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> You have essentially described new intrinsics that would make this less of a<span class="Apple-converted-space"> </span><br>> bad idea and I am interested in your intrinsics, so I'll try to both respond<span class="Apple-converted-space"> </span><br>> with why patchpoints don't currently give you what you want (and why simply<span class="Apple-converted-space"> </span><br>> changing patchpoint semantics would be evil) and I'll also try to comment on<span class="Apple-converted-space"> </span><br>> what I think of the intrinsic that you're effectively proposing. Long story<span class="Apple-converted-space"> </span><br>> short, I think you should formally propose your intrinsic even if it's not<span class="Apple-converted-space"> </span><br>> completely fleshed out. I think that it's an interesting capability and in<span class="Apple-converted-space"> </span><br>> its most basic form, it is a simple variation of the current<span class="Apple-converted-space"> </span><br>> patchpoint/stackmap intrinsics.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> On April 29, 2014 at 10:09:49 AM, Philip Reames (listmail@philipreames.com)<span class="Apple-converted-space"> </span><br>> wrote:<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> As the discussion has progressed and I've spent more time thinking about the<span class="Apple-converted-space"> </span><br>> topic, I find myself less and less enthused about the current proposal. I'm<span class="Apple-converted-space"> </span><br>> in full support of having idiomatic ways to express safe division, but I'm<span class="Apple-converted-space"> </span><br>> starting to doubt that using an intrinsic is the right way at the moment.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> One case I find myself thinking about is how one would combine profiling<span class="Apple-converted-space"> </span><br>> information and implicit div-by-zero/overflow checks with this proposal. I<span class="Apple-converted-space"> </span><br>> don't really see a clean way. Ideally, for a "safe div" which never has the<span class="Apple-converted-space"> </span><br>> exceptional paths taken, you'd like to completely do away with the control<span class="Apple-converted-space"> </span><br>> flow entirely. (And rely on hardware traps w/exceptions instead.) I don't<span class="Apple-converted-space"> </span><br>> really see a way to represent that type of construct given the current<span class="Apple-converted-space"> </span><br>> proposal.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> This is a deeper problem and to solve it you'd need a solution to trapping<span class="Apple-converted-space"> </span><br>> in general. Let's consider the case of Java. A Java program may want to<span class="Apple-converted-space"> </span><br>> catch the arithmetic exception due to divide by zero. How would you do this<span class="Apple-converted-space"> </span><br>> with a trap in LLVM IR? Spill all state that is live at the catch? Use a<span class="Apple-converted-space"> </span><br>> patchpoint for the entire division instruction?<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> We'd likely use something similar to a patchpoint. You'd need the "abstract<span class="Apple-converted-space"> </span><br>> vm state" (which is not the compiled frame necessarily) available at the div<span class="Apple-converted-space"> </span><br>> instruction. You could then re-enter the interpreter at the specified index<span class="Apple-converted-space"> </span><br>> (part of the vm state). We have all most of these mechanisms in place.<span class="Apple-converted-space"> </span><br>> Ideally, you'd trigger a recompile and otherwise ensure re-entry into<span class="Apple-converted-space"> </span><br>> compiled code at the soonest possible moment.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> This requires a lot of runtime support, but we already have most of it<span class="Apple-converted-space"> </span><br>> implemented for another compiler. From our perspective, the runtime<span class="Apple-converted-space"> </span><br>> requirements are not a major blocker.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Right, you'll use a patchpoint. That's way more expensive than using a safe<span class="Apple-converted-space"> </span><br>> division intrinsic with branches, because it's opaque to the optimizer.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> This statement is true at the moment, but it shouldn't be. I think this is<span class="Apple-converted-space"> </span><br>> our fundamental difference in approach.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> You should be able to write something like:<span class="Apple-converted-space"> </span><br>> i32 %res = invoke patchpoint (... x86_trapping_divide, a, b) normal_dest<span class="Apple-converted-space"> </span><br>> invoke_dest<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> normal_dest:<span class="Apple-converted-space"> </span><br>> ;; use %res<span class="Apple-converted-space"> </span><br>> invoke_dest:<span class="Apple-converted-space"> </span><br>> landingpad<span class="Apple-converted-space"> </span><br>> ;; dispatch edge cases<span class="Apple-converted-space"> </span><br>> ;; this could be unreachable code if you deopt this frame in the trap<span class="Apple-converted-space"> </span><br>> handler and jump directly to an interpreter or other bit of code<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> I see. It sounds like you want a generalization of the "div.with.stackmap"<span class="Apple-converted-space"> </span><br>> that I thought you wanted - you want to be able to wrap anything in a<span class="Apple-converted-space"> </span><br>> stackmap.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> The current patchpoint intrinsic does not do this, and you run the risk of<span class="Apple-converted-space"> </span><br>> breaking existing semantics if you changed this. You'd probably break<span class="Apple-converted-space"> </span><br>> WebKit, which treats the call target of the patchpoint as nothing more than<span class="Apple-converted-space"> </span><br>> a quirk - we always pass null. Also, the current patchpoint treats the<span class="Apple-converted-space"> </span><br>> callee as an i8* if I remember right and it would be super weird if all LLVM<span class="Apple-converted-space"> </span><br>> phases had to interpret this i8* by unwrapping a possible bitcast to get to<span class="Apple-converted-space"> </span><br>> a declared function that may be an intrinsic. Yuck! Basically, the call<span class="Apple-converted-space"> </span><br>> target of existing patchpoints is meant to be a kind of convenience feature<span class="Apple-converted-space"> </span><br>> rather than the core of the mechanism.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> I agree in principle that the intrinsic that you want would be a useful<span class="Apple-converted-space"> </span><br>> intrinsic. But let's not call it a patchpoint for the purposes of this<span class="Apple-converted-space"> </span><br>> discussion, and let's not confuse the discussion by claiming (incorrectly)<span class="Apple-converted-space"> </span><br>> that the existing patchpoint facility gives you what you want. It doesn't:<span class="Apple-converted-space"> </span><br>> patchpoints are designed to make the call target opaque (hence the use of<span class="Apple-converted-space"> </span><br>> i8*) and there shouldn't be a correlation between what the patchpoint does<span class="Apple-converted-space"> </span><br>> at run-time and what the called function would have done. You could make<span class="Apple-converted-space"> </span><br>> the call target be null (like WebKit does) and the patchpoint should still<span class="Apple-converted-space"> </span><br>> mean "this code can do anything" because the expectation is that the client<span class="Apple-converted-space"> </span><br>> JIT will patch over it anyway.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Also, "patchpoint" would probably not be the right term for the intrinsic<span class="Apple-converted-space"> </span><br>> that you want. I think that what you want is "call.with.stackmap". Or<span class="Apple-converted-space"> </span><br>> maybe "stackmap.wrapper". Or just "stackmap" - I'd be OK, in principle,<span class="Apple-converted-space"> </span><br>> with changing the name of the current "stackmap" intrinsic to something that<span class="Apple-converted-space"> </span><br>> reflects the fact that it's less of a stackmap than what you want.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> To summarize. A patchpoint's main purpose is that you can patch it with<span class="Apple-converted-space"> </span><br>> arbitrary code. The current "stackmap" means that you can patch it with<span class="Apple-converted-space"> </span><br>> arbitrary code and that patching may be destructive to a shadow of machine<span class="Apple-converted-space"> </span><br>> code bytes, so it's really just like patchpoints - we could change its name<span class="Apple-converted-space"> </span><br>> to "patchpoint.shadow" for example.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> If you were to propose such a stackmap intrinsic, then I think there could<span class="Apple-converted-space"> </span><br>> be some ways of doing it that wouldn't be too terrible. Basically you want<span class="Apple-converted-space"> </span><br>> something that is like a patchpoint in that it reports a stackmap via a side<span class="Apple-converted-space"> </span><br>> channel, but unlike patchpoints, it doesn't allow arbitrary patching -<span class="Apple-converted-space"> </span><br>> instead the optimizer should be allowed to assume that the code within the<span class="Apple-converted-space"> </span><br>> patchpoint will always do the same thing that the call target would have<span class="Apple-converted-space"> </span><br>> done. There are downsides to truly doing this. For example, to make<span class="Apple-converted-space"> </span><br>> division efficient with such an intrinsic, you'd have to do something that<span class="Apple-converted-space"> </span><br>> is somewhat worse than just recognizing intrinsics in the optimizer - you'd<span class="Apple-converted-space"> </span><br>> have to first recognize a call to your "stackmap wrapper" intrinsic and then<span class="Apple-converted-space"> </span><br>> observe that its call target argument is in turn another intrinsic. To me<span class="Apple-converted-space"> </span><br>> personally, that's kind of yucky, but I won't deny that it could be useful.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> As to the use of invoke: I don't believe that the use of invoke versus my<span class="Apple-converted-space"> </span><br>> suggested "branch on a trap predicate" idea are different in any truly<span class="Apple-converted-space"> </span><br>> meaningful way. I buy that either would work.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> A patchpoint should not require any excess spilling. If values are live in<span class="Apple-converted-space"> </span><br>> registers, that should be reflected in the stack map. (I do not know if<span class="Apple-converted-space"> </span><br>> this is the case for patchpoint at the moment or not.)<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Patchpoints do not require spilling.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> My point was that with existing patchpoints, you *either* use a patchpoint<span class="Apple-converted-space"> </span><br>> for the entire division which makes the division opaque to the optimizer -<span class="Apple-converted-space"> </span><br>> because a patchpoint means "this code can do anything" - *or* you could<span class="Apple-converted-space"> </span><br>> spill stuff to the stack prior to your trapping division intrinsic, since<span class="Apple-converted-space"> </span><br>> spilling is something that you could do as a workaround if you didn't have a<span class="Apple-converted-space"> </span><br>> patchpoint.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> The reason why I brought up spilling at all is that I suspect that spilling<span class="Apple-converted-space"> </span><br>> all state to the stack might be cheaper - for some systems - than turning<span class="Apple-converted-space"> </span><br>> the division into a patchpoint. Turning the division into a patchpoint is<span class="Apple-converted-space"> </span><br>> horrendously brutal - the patchpoint looks like it clobbers the heap (which<span class="Apple-converted-space"> </span><br>> a division doesn't do), has to execute (a division is an obvious DCE<span class="Apple-converted-space"> </span><br>> candidate), cannot be hoisted (hoisting divisions is awesome), etc. Perhaps<span class="Apple-converted-space"> </span><br>> most importantly, though, a patchpoint doesn't tell LLVM that you're *doing<span class="Apple-converted-space"> </span><br>> a division* - so all constant folding, strenght reduction, and algebraic<span class="Apple-converted-space"> </span><br>> reasoning flies out the window. On the other hand, spilling all state to<span class="Apple-converted-space"> </span><br>> the stack is an arguably sound and performant solution to a lot of VM<span class="Apple-converted-space"> </span><br>> problems. I've seen JVM implementations that ensure that there is always a<span class="Apple-converted-space"> </span><br>> copy of state on the stack at some critical points, just because it makes<span class="Apple-converted-space"> </span><br>> loads of stuff simpler (debugging, profiling, GC, and of course deopt). I<span class="Apple-converted-space"> </span><br>> personally prefer to stay away from such a strategy because it's not free.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> On the other hand, branches can be cheap. A branch on a divide is cheaper<span class="Apple-converted-space"> </span><br>> than not being able to optimize the divide.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> The Value called by a patchpoint should participate in optimization<span class="Apple-converted-space"> </span><br>> normally.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> I agree that you could have a different intrinsic that behaves like this.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> We really want the patchpoint part of the call to be supplemental. It<span class="Apple-converted-space"> </span><br>> should still be a call. It should be constant propagated, transformed,<span class="Apple-converted-space"> </span><br>> etc.. This is not the case currently. I've got a couple of off the wall<span class="Apple-converted-space"> </span><br>> ideas for improving the current status, but I'll agree this is a hardish<span class="Apple-converted-space"> </span><br>> problem.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> It should be legal to use a patchpoint in an invoke. It's an ABI issue of<span class="Apple-converted-space"> </span><br>> how the invoke path gets invoked. (i.e. side tables for the runtime to<span class="Apple-converted-space"> </span><br>> lookup, etc..) This is not possible today, and probably requires a fair<span class="Apple-converted-space"> </span><br>> amount of work. Some of it, I've already done and will be sharing shortly.<span class="Apple-converted-space"> </span><br>> Other parts, I haven't even thought about.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Right, it's significantly more complex than either the existing patchpoints<span class="Apple-converted-space"> </span><br>> or Michael's proposed safe.div.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> If you didn't want to use the trapping semantics, you'd insert dedicated<span class="Apple-converted-space"> </span><br>> control flow _before_ the divide. This would allow normal optimization of<span class="Apple-converted-space"> </span><br>> the control flow.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Notes:<span class="Apple-converted-space"> </span><br>> 1) This might require a new PATCHPOINT pseudo op in the backend. Haven't<span class="Apple-converted-space"> </span><br>> thought much about that yet.<span class="Apple-converted-space"> </span><br>> 2) I *think* your current intrinsic could be translated into something like<span class="Apple-converted-space"> </span><br>> this. (Leaving aside the question of where the deopt state comes from.) In<span class="Apple-converted-space"> </span><br>> fact, the more I look at this, the less difference I actually see between<span class="Apple-converted-space"> </span><br>> the approaches.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> In a lot of languages, a divide produces some result even in the exceptional<span class="Apple-converted-space"> </span><br>> case and this result requires effectively deoptimizing since the resut won't<span class="Apple-converted-space"> </span><br>> be the one you would have predicted (double instead of int, or BigInt<span class="Apple-converted-space"> </span><br>> instead of small int), which sort of means that if the CPU exception occurs<span class="Apple-converted-space"> </span><br>> you have to be able to reconstruct all state. A patchpoint could do this,<span class="Apple-converted-space"> </span><br>> and so could spilling all state to the stack before the divide - but both<span class="Apple-converted-space"> </span><br>> are very heavy hammers that are sure to be more expensive than just doing a<span class="Apple-converted-space"> </span><br>> branch.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> This isn't necessarily as expensive as you might believe. I'd recommend<span class="Apple-converted-space"> </span><br>> reading the Graal project papers on this topic.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Whether deopt or branching is more profitable *in this case*, I can't easily<span class="Apple-converted-space"> </span><br>> say. I'm not yet to the point of being able to run that experiment. We can<span class="Apple-converted-space"> </span><br>> argue about what "should" be better all we want, but real performance data<span class="Apple-converted-space"> </span><br>> is the only way to truly know.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> My point may have been confusing. I know that deoptimization is cheap and<span class="Apple-converted-space"> </span><br>> WebKit uses it everywhere, including division corner cases, if profiling<span class="Apple-converted-space"> </span><br>> tells us that it's profitable to do so (which it does, in the common case).<span class="Apple-converted-space"> </span><br>> WebKit is a heavy user of deoptimization in general, so you don't need to<span class="Apple-converted-space"> </span><br>> convince me that it's worth it.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Acknowledged.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Note that I want *both* deopt *and* branching, because in this case, a<span class="Apple-converted-space"> </span><br>> branch is the fastest overall way of detecting when to deopt. In the<span class="Apple-converted-space"> </span><br>> future, I will want to implement the deopt in terms of branching, and when<span class="Apple-converted-space"> </span><br>> we do this, I believe that the most sound and performat approach would be<span class="Apple-converted-space"> </span><br>> using Michael's intrinsics. This is subtle and I'll try to explain why it's<span class="Apple-converted-space"> </span><br>> the case.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> The point is that you wouldn't want to do deoptimization by spilling state<span class="Apple-converted-space"> </span><br>> on the main path or by using a patchpoint for the main path of the division.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> This is the main point I disagree with. I don't believe that having a<span class="Apple-converted-space"> </span><br>> patchpoint on the main path should be any more expensive then the original<span class="Apple-converted-space"> </span><br>> call. (see above)<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> The reason why the patchpoint is expensive is that if you use a patchpoint<span class="Apple-converted-space"> </span><br>> to implement a division then the optimizer won't be allowed to assume that<span class="Apple-converted-space"> </span><br>> it's a division, because the whole point of "patchpoint" is to tell the<span class="Apple-converted-space"> </span><br>> optimizer to piss off.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Worth noting explicitly: I'm assuming that all of your deopt state would<span class="Apple-converted-space"> </span><br>> already be available for other purposes in nearby code. It's on the stack<span class="Apple-converted-space"> </span><br>> or in registers. I'm assuming that by adding the deopt point, you are not<span class="Apple-converted-space"> </span><br>> radically changing the set of computations which need to be done. If that's<span class="Apple-converted-space"> </span><br>> not the case, you should avoid deopt and instead just inline the slow paths<span class="Apple-converted-space"> </span><br>> with explicit checks.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Yes, of course it is. That's not the issue.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> I'll note that given your assumptions about the cost of a patchpoint, the<span class="Apple-converted-space"> </span><br>> rest of your position makes a lot more sense. :) As I spelled out above, I<span class="Apple-converted-space"> </span><br>> believe this cost is not fundamental.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> You don't want the common path of executing the division to involve a<span class="Apple-converted-space"> </span><br>> patchpoint instruction, although using a patchpoint or stackmap to implement<span class="Apple-converted-space"> </span><br>> deoptimization on the failing path is great:<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Good: if (division would fail) { call @patchpoint(all of my state) } else {<span class="Apple-converted-space"> </span><br>> result = a / b }<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Given your cost assumptions, I'd agree.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Not my cost assumptions. The reason why this is better is that the division<span class="Apple-converted-space"> </span><br>> is expressed in LLVM IR so that LLVM can do useful things to it - like<span class="Apple-converted-space"> </span><br>> eliminate it, for example.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Bad: call @patchpoint(all of my state) // patch with a divide instruction -<span class="Apple-converted-space"> </span><br>> bad because the optimizer has no clue what you're doing and assumes the very<span class="Apple-converted-space"> </span><br>> worst<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Yuck. Agreed.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> To be clear, this is what you're proposing - except that you're assuming<span class="Apple-converted-space"> </span><br>> that LLVM will know that you've patched a division because you're expecting<span class="Apple-converted-space"> </span><br>> the call target to have semantic meaning. Or, rather, you're expecting that<span class="Apple-converted-space"> </span><br>> you can make the contents of the patchpoint be a division by having the call<span class="Apple-converted-space"> </span><br>> target be a division intrinsic. In the current implementation and as it is<span class="Apple-converted-space"> </span><br>> currently specified, the call target has no meaning and so you get the yuck<span class="Apple-converted-space"> </span><br>> that I'm showing.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Worse: spill all state to the stack; call @trapping.div(a, b) // the spills<span class="Apple-converted-space"> </span><br>> will hurt you far more than a branch, so this should be avoided<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> I'm assuming this is an explicit spill rather than simply recording a stack<span class="Apple-converted-space"> </span><br>> map *at the div*. If so, agreed.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> I suppose we could imagine a fourth option that involves a patchpoint to<span class="Apple-converted-space"> </span><br>> pick up the state and a trapping divide instrinsic. But a trapping divide<span class="Apple-converted-space"> </span><br>> intrinsic alone is not enough. Consider this:<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> result = call @trapping.div(a, b); call @stackmap(all of my state)<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> As soon as these are separate instructions, you have no guarantees that the<span class="Apple-converted-space"> </span><br>> state that the stackmap reports is sound for the point at which the div<span class="Apple-converted-space"> </span><br>> would trap.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> This is the closest to what I'd propose, except that the two calls would be<span class="Apple-converted-space"> </span><br>> merged into a single patchpoint. Isn't the entire point of a patchpoint to<span class="Apple-converted-space"> </span><br>> record the stack map for a call?<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> No. It would be bad if that's what the documentation says. That's not at<span class="Apple-converted-space"> </span><br>> all how WebKit uses it or probably any IC client would use it.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Patchpoints are designed to be inline assembly on steroids. They're there<span class="Apple-converted-space"> </span><br>> to allow the client JIT to tell LLVM to piss off.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> (Well, ignoring the actual patching part..) Why not write this as:<span class="Apple-converted-space"> </span><br>> patchpoint(..., trapping.div, a, b);<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Is there something I'm missing here?<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Just to note: I fully agree that the two call proposal is unsound and should<span class="Apple-converted-space"> </span><br>> be avoided.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> So, the division itself shouldn't be a trapping instruction and instead you<span class="Apple-converted-space"> </span><br>> want to detect the bad case with a branch.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> To be clear:<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> - Whether you use deoptimization for division or anything else - like WebKit<span class="Apple-converted-space"> </span><br>> has done since before any of the Graal papers were written - is mostly<span class="Apple-converted-space"> </span><br>> unrelated to how you represent the division, unless you wanted to add a new<span class="Apple-converted-space"> </span><br>> intrinsic that is like a trapping-division-with-stackmap:<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> result = call @trapping.div.with.stackmap(a, b, ... all of my state ...)<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Now, maybe you do want such an intrinsic, in which case you should propose<span class="Apple-converted-space"> </span><br>> it!<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Given what we already have with patchpoints, I don't think a merged<span class="Apple-converted-space"> </span><br>> intrinsic is necessary. (See above). I believe we have all the parts to<span class="Apple-converted-space"> </span><br>> build this solution, and that - in theory - they should compose neatly.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> p.s. The bits I was referring to was not deopt per se. It was particularly<span class="Apple-converted-space"> </span><br>> which set of deopt state you used for each deopt point. That's a bit of<span class="Apple-converted-space"> </span><br>> tangent for the rest of the discussion now though.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> The reason why I haven't proposed it is that I think that long-term, the<span class="Apple-converted-space"> </span><br>> currently proposed intrinsics are a better path to getting the trapping<span class="Apple-converted-space"> </span><br>> optimizations. See my previous mail, where I show how we could tell LLVM<span class="Apple-converted-space"> </span><br>> what the failing path is (which may have deoptimization code that uses a<span class="Apple-converted-space"> </span><br>> stackmap or whatever), what the trapping predicate is (it comes from the<span class="Apple-converted-space"> </span><br>> safe.div intrinsic), and the fact that trapping is wise (branch weights).<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> - If you want to do the deoptimization with a trap, then your only choice<span class="Apple-converted-space"> </span><br>> currently is to use a patchpoint for the main path of the division. This<span class="Apple-converted-space"> </span><br>> will be slower than using a branch to an OSR exit basic block, because<span class="Apple-converted-space"> </span><br>> you're making the division itself opaque to the optimizer (bad) just to get<span class="Apple-converted-space"> </span><br>> rid of a branch (which was probably cheap to begin with).<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Hence, what you want to do - one way or another, regardless of whether this<span class="Apple-converted-space"> </span><br>> proposed intrinsic is added - is to branch on the corner case condition, and<span class="Apple-converted-space"> </span><br>> have the slow case of the branch go to a basic block that deoptimizes.<span class="Apple-converted-space"> </span><br>> Unless of course you have profiling that says that the case does happen<span class="Apple-converted-space"> </span><br>> often, in which case you can have that basic block handle the corner case<span class="Apple-converted-space"> </span><br>> inline without leaving optimized code (FWIW, we do have such paths in WebKit<span class="Apple-converted-space"> </span><br>> and they are useful).<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> So the question for me is whether the branching involves explicit control<span class="Apple-converted-space"> </span><br>> flow or is hidden inside an intrinsic. I prefer for it to be within an<span class="Apple-converted-space"> </span><br>> intrinsic because it:<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> - allows the optimizer to do more interesting things in the common cases,<span class="Apple-converted-space"> </span><br>> like hoisting the entire division.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> - will give us a clearer path for implementing trapping optimizations in the<span class="Apple-converted-space"> </span><br>> future.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> - is an immediate win on ARM.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> I'd be curious to hear what specific idea you have about how to implement<span class="Apple-converted-space"> </span><br>> trap-based deoptimization with your trapping division intrinsic for x86 -<span class="Apple-converted-space"> </span><br>> maybe it's different from the two "bad" idioms I showed above.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> I hope my explanation above helps. If not, ask, and I'll try to explain<span class="Apple-converted-space"> </span><br>> more clearly.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> I think I understand it. I think that the only issue is that:<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> - Patchpoints currently don't do what you want.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> - If you made patchpoints do what you want then you'd break WebKit - not to<span class="Apple-converted-space"> </span><br>> mention anyone who wants to use them for inline caches.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> So it seems like you want a new intrinsic. You should officially propose<span class="Apple-converted-space"> </span><br>> this new intrinsic, particularly since the core semantic differences are not<span class="Apple-converted-space"> </span><br>> so great from what we have now. OTOH, if you truly believe that patchpoints<span class="Apple-converted-space"> </span><br>> should just be changed to your semantics in a way that does break WebKit,<span class="Apple-converted-space"> </span><br>> then that's probably also something you should get off your chest. ;-)<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> One point just for clarity; I don't believe this effects the conclusions of<span class="Apple-converted-space"> </span><br>> our discussion so far. I'm also fairly sure that you (Filip) are aware of<span class="Apple-converted-space"> </span><br>> this, but want to spell it out for other readers.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> You seem to be assuming that compiled code needs to explicitly branch to a<span class="Apple-converted-space"> </span><br>> point where deopt state is known to exit a compiled frame.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> This is a slightly unclear characterization of my assumptions. Our JIT does<span class="Apple-converted-space"> </span><br>> deoptimization without explicit branches for many, many things. You should<span class="Apple-converted-space"> </span><br>> look at it some time, it's pretty fancy. ;-)<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Worth noting is that you can also exit a compiled frame on a trap (without<span class="Apple-converted-space"> </span><br>> an explicitly condition/branch!) if the deopt state is known at the point<span class="Apple-converted-space"> </span><br>> you take the trap. This "exit frame on trap" behavior shows up with null<span class="Apple-converted-space"> </span><br>> pointer exceptions as well. I'll note that both compilers in OpenJDK<span class="Apple-converted-space"> </span><br>> support some combination of "exit-on-trap" conditions for division and null<span class="Apple-converted-space"> </span><br>> dereferences. The two differ on exactly what strategies they use in each<span class="Apple-converted-space"> </span><br>> case though. :)<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Yeah, and I've also implemented VMs that do this - and I endorse the basic<span class="Apple-converted-space"> </span><br>> idea. I know what you want, and my only point is that the existing<span class="Apple-converted-space"> </span><br>> patchpoints only give you this if you're willing to make a huge compromise:<span class="Apple-converted-space"> </span><br>> namely, that you're willing to make the division (or heap load for the null<span class="Apple-converted-space"> </span><br>> case) completely opaque to the compiler to the point that GVN, LICM, SCCP,<span class="Apple-converted-space"> </span><br>> and all algebraic reasoning have to give up on optimizing it. The point of<span class="Apple-converted-space"> </span><br>> using LLVM is that it can optimize code. It can optimize branches and<span class="Apple-converted-space"> </span><br>> divisions pretty well. So, eliminating an explicit branch by replacing it<span class="Apple-converted-space"> </span><br>> with a construct that appears opaque to the optimizer is not a smart<span class="Apple-converted-space"> </span><br>> trade-off.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> You could add a new intrinsic that, like patchpoints, records the layout of<span class="Apple-converted-space"> </span><br>> state in a side-table, but that is used as a kind of wrapper for operations<span class="Apple-converted-space"> </span><br>> that LLVM understands. This may or may not be hairy - you seem to have sort<span class="Apple-converted-space"> </span><br>> of acknowledged that it's got some complexity and I've also pointed out some<span class="Apple-converted-space"> </span><br>> possible issues. If this is something that you want, you should propose it<span class="Apple-converted-space"> </span><br>> so that others know what you're talking about. One danger of how we're<span class="Apple-converted-space"> </span><br>> discussing this right now is that you're overloading patchpoints to mean the<span class="Apple-converted-space"> </span><br>> thing you want them to mean rather than what they actually mean, which makes<span class="Apple-converted-space"> </span><br>> it seem like we don't need Michael's intrinsics on the grounds that<span class="Apple-converted-space"> </span><br>> patchpoints already offer a solution. They don't already offer a solution<span class="Apple-converted-space"> </span><br>> precisely because patchpoints don't do what your intrinsics would do.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> I'm not really arguing that either scheme is "better" in all cases. I'm<span class="Apple-converted-space"> </span><br>> simply arguing that we should support both and allow optimization and tuning<span class="Apple-converted-space"> </span><br>> between them. As far as I can tell, you seem to be assuming that an<span class="Apple-converted-space"> </span><br>> explicit branch to known exit point is always superior.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Ok, back to the topic at hand...<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> With regards to the current proposal, I'm going to take a step back. You<span class="Apple-converted-space"> </span><br>> guys seem to have already looked in this in a fair amount of depth. I'm not<span class="Apple-converted-space"> </span><br>> necessarily convinced you've come to the best solution, but at some point,<span class="Apple-converted-space"> </span><br>> we need to make forward progress. What you have is clearly better than<span class="Apple-converted-space"> </span><br>> nothing.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Please go ahead and submit your current approach. We can come back and<span class="Apple-converted-space"> </span><br>> revise later if we really need to.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> I do request the following changes:<span class="Apple-converted-space"> </span><br>> - Mark it clearly as experimental.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> - Either don't specify the value computed in the edge cases, or allow those<span class="Apple-converted-space"> </span><br>> values to be specified as constant arguments to the call. This allows<span class="Apple-converted-space"> </span><br>> efficient lowering to x86's div instruction if you want to make use of the<span class="Apple-converted-space"> </span><br>> trapping semantics.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Once again: how would you use this to get trapping semantics without<span class="Apple-converted-space"> </span><br>> throwing all of LLVM's optimizations out the window, in the absence of the<span class="Apple-converted-space"> </span><br>> kind of patchpoint-like intrinsic that you want? I ask just to make sure<span class="Apple-converted-space"> </span><br>> that we're on the same page.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Finally, as for performance data, which part of this do you want performance<span class="Apple-converted-space"> </span><br>> data for? I concede that I don't have performance data for using Michael's<span class="Apple-converted-space"> </span><br>> new intrinsic. Part of what the intrinsic accomplishes is it gives a less<span class="Apple-converted-space"> </span><br>> ugly way of doing something that is already possible with target intrinsics<span class="Apple-converted-space"> </span><br>> on ARM. I think it would be great if you could get those semantics - along<span class="Apple-converted-space"> </span><br>> with a known-good implementation - on other architectures as well.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> I would be very interested in seeing data comparing two schemes:<span class="Apple-converted-space"> </span><br>> - Explicit control flow emited by the frontend<span class="Apple-converted-space"> </span><br>> - The safe.div intrinsic emitted by the frontend, desugared in CodeGenPrep<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> My strong suspicion is that each would preform well in some cases and not in<span class="Apple-converted-space"> </span><br>> others. At least on x86. Since the edge-checks are essentially free on<span class="Apple-converted-space"> </span><br>> ARM, the second scheme would probably be strictly superior there.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> I am NOT asking that we block submission on this data however.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> But this discussion has also involved suggestions that we should use<span class="Apple-converted-space"> </span><br>> trapping to implement deoptimization, and the main point of my message is to<span class="Apple-converted-space"> </span><br>> strongly argue against anything like this given the current state of<span class="Apple-converted-space"> </span><br>> trapping semantics and how patchpoints work. My point is that using traps<span class="Apple-converted-space"> </span><br>> for division corner cases would either be unsound (see the stackmap after<span class="Apple-converted-space"> </span><br>> the trap, above), or would require you to do things that are obviously<span class="Apple-converted-space"> </span><br>> inefficient. If you truly believe that the branch to detect division slow<span class="Apple-converted-space"> </span><br>> paths is more expensive than spilling all bytecode state to the stack or<span class="Apple-converted-space"> </span><br>> using a patchpoint for the division, then I could probably hack something up<span class="Apple-converted-space"> </span><br>> in WebKit to show you the performance implications. (Or you could do it<span class="Apple-converted-space"> </span><br>> yourself, the code is open source...)<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> In a couple of months, I'll probably have the performance data to discuss<span class="Apple-converted-space"> </span><br>> this for real. When that happens, let's pick this up and continue the<span class="Apple-converted-space"> </span><br>> debate. Alternatively, if you want to chat this over more with a beer in<span class="Apple-converted-space"> </span><br>> hand at the social next week, let me know. In the meantime, let's not stall<span class="Apple-converted-space"> </span><br>> the current proposal any more.<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>> Philip<span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br>><span class="Apple-converted-space"> </span><br></div></div></span></blockquote></div></body></html>