[LLVMdev] Proposal: add intrinsics for safe division

Filip Pizlo fpizlo at apple.com
Fri May 2 11:57:09 PDT 2014


On May 2, 2014 at 11:53:25 AM, Eric Christopher (echristo at gmail.com) wrote:

On Wed, Apr 30, 2014 at 10:34 PM, Philip Reames 
<listmail at philipreames.com> wrote: 
> Andy - If you're not already following this closely, please start. We've 
> gotten into fairly fundamental questions of what a patchpoint does. 
> 
> Filip, 
> 
> I think you've hit the nail on the head. What I'm thinking of as being 
> patchpoints are not what you think they are. Part of that is that I've got 
> a local change which adds a very similar construction (called "statepoints" 
> for the moment), but I was trying to keep that separate. That also includes 
> a lot of GC semantics which are not under discussion currently. My 
> apologies if that experience bled over into this conversation and made 
> things more confusing. 
> 
> I will note that the documentation for patchpoint says explicitly the 
> following: 
> "The ‘llvm.experimental.patchpoint.*‘ intrinsics creates a function call to 
> the specified <target> and records the location of specified values in the 
> stack map." 
> 
> My reading has always been that a patchpoint *that isn't patched* is simply 
> a call with a stackmap associated with it. To my reading, this can (and 
> did, and does) indicate my proposed usage would be legal. 
> 

I like the idea that the target can be assumed to be called. It makes 
optimization of the call possible, etc. I think it's definitely worth 
exploring before we lock down the patchpoint intrinsic. 
I will actively oppose this.

-Filip

-eric 

> I will agree that I've confused the topic badly on the optimization front. 
> My "statepoint" isn't patchable, so a lot more optimizations are legal. 
> Sorry about that. To restate what I think you've been saying all along, the 
> optimizer can't make assumptions about what function is called by a 
> patchpoint because that might change based on later patching. Is this the 
> key point you've been trying to make? 
> 
> I'm not objecting to separating "my patchpoint" from "your patchpoint". 
> Let's just hammer out the semantics of each first. :) 
> 
> Again, longer response to follow in a day or so. :) 
> 
> Philip 
> 
> 
> On 04/30/2014 10:09 PM, Filip Pizlo wrote: 
> 
> 
> 
> On April 30, 2014 at 9:06:20 PM, Philip Reames (listmail at philipreames.com) wrote: 
> 
> On 04/29/2014 12:39 PM, Filip Pizlo wrote: 
> 
> On April 29, 2014 at 11:27:06 AM, Philip Reames (listmail at philipreames.com) wrote: 
> 
> On 04/29/2014 10:44 AM, Filip Pizlo wrote: 
> 
> TL;DR: Your desire to use trapping on x86 only further convinces me that 
> Michael's proposed intrinsics are the best way to go. 
> 
> I'm still not convinced, but am not going to actively oppose it either. I'm 
> leery of designing a solution with major assumptions we don't have data to 
> backup. 
> 
> I worry your assumptions about deoptimization are potentially unsound. But 
> I don't have data to actually show this (yet). 
> 
> I *think* I may have been unclear about my assumptions; in particular, my 
> claims with respect to deoptimization are probably more subtle than they 
> appeared. WebKit can use LLVM and it has divisions and we do all possible 
> deoptimization/profiling/etc tricks for it, so this is grounded in 
> experience. Forgive me if the rest of this e-mail contains a lecture on 
> things that are obvious - I'll try to err on the side of clarity and 
> completeness since this discussion is sufficiently dense that we run the 
> risk of talking at cross-purposes unless some baseline assumptions are 
> established. 
> 
> I think we're using the same terminology, but with slightly different sets 
> of assumptions. I'll point this out below where relevant. 
> 
> Also, thanks for taking the time to expand. It helps clarify the discussion 
> quite a bit. 
> 
> I think we may be converging to an understanding of what you want versus 
> what I want, and I think that there are some points - possibly unrelated to 
> division - that are worth clarifying. I think that the main difference is 
> that when I say "patchpoint", I am referring to a concrete intrinsic with 
> specific semantics that cannot change without breaking WebKit, while you are 
> using the term to refer to a broad concept, or rather, a class of 
> as-yet-unimplemented intrinsics that share some of the same features with 
> patchpoints but otherwise have incompatible semantics. 
> 
> Also, when I say that you wouldn't want to use the existing patchpoint to do 
> your trapping deopt, what I mean is that the performance of doing this would 
> suck for reasons that are not related to deoptimization or trapping. I'm 
> not claiming that deoptimization performs poorly (trust me, I know better) 
> or that trapping to deoptimize is bad (I've done this many, many times and I 
> know better). I'm saying that with the current patchpoint intrinsics in 
> LLVM, as they are currently specified and implemented, doing it would be a 
> bad idea because you'd have to compromise a bunch of other optimizations to 
> achieve it. 
> 
> You have essentially described new intrinsics that would make this less of a 
> bad idea and I am interested in your intrinsics, so I'll try to both respond 
> with why patchpoints don't currently give you what you want (and why simply 
> changing patchpoint semantics would be evil) and I'll also try to comment on 
> what I think of the intrinsic that you're effectively proposing. Long story 
> short, I think you should formally propose your intrinsic even if it's not 
> completely fleshed out. I think that it's an interesting capability and in 
> its most basic form, it is a simple variation of the current 
> patchpoint/stackmap intrinsics. 
> 
> 
> 
> 
> 
> On April 29, 2014 at 10:09:49 AM, Philip Reames (listmail at philipreames.com) wrote: 
> 
> As the discussion has progressed and I've spent more time thinking about the 
> topic, I find myself less and less enthused about the current proposal. I'm 
> in full support of having idiomatic ways to express safe division, but I'm 
> starting to doubt that using an intrinsic is the right way at the moment. 
> 
> One case I find myself thinking about is how one would combine profiling 
> information and implicit div-by-zero/overflow checks with this proposal. I 
> don't really see a clean way. Ideally, for a "safe div" which never has the 
> exceptional paths taken, you'd like to completely do away with the control 
> flow entirely. (And rely on hardware traps w/exceptions instead.) I don't 
> really see a way to represent that type of construct given the current 
> proposal. 
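[For readers following the thread: the semantics under discussion can be modeled in ordinary code. This is a minimal C++ sketch of what a safe-division intrinsic computes; the struct layout and corner-case results here are illustrative assumptions, not the literal proposal.]

```cpp
#include <climits>

// Hypothetical model of a safe signed division: returns the quotient plus
// a flag set on the corner cases (divide by zero, or INT_MIN / -1, which
// overflows in two's complement).
struct SafeDivResult {
    int value;   // quotient, or 0 when a corner case fires
    bool failed; // true when b == 0 or the division overflows
};

SafeDivResult safe_sdiv(int a, int b) {
    if (b == 0 || (a == INT_MIN && b == -1))
        return {0, true};  // corner case: no trap, the caller dispatches
    return {a / b, false}; // fast path: an ordinary, optimizable division
}
```

A frontend would branch on `failed` to reach its slow path; on targets without free edge checks, this is essentially the shape the intrinsic desugars to.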
> 
> This is a deeper problem and to solve it you'd need a solution to trapping 
> in general. Let's consider the case of Java. A Java program may want to 
> catch the arithmetic exception due to divide by zero. How would you do this 
> with a trap in LLVM IR? Spill all state that is live at the catch? Use a 
> patchpoint for the entire division instruction? 
> 
> We'd likely use something similar to a patchpoint. You'd need the "abstract 
> vm state" (which is not the compiled frame necessarily) available at the div 
> instruction. You could then re-enter the interpreter at the specified index 
> (part of the vm state). We have almost all of these mechanisms in place. 
> Ideally, you'd trigger a recompile and otherwise ensure re-entry into 
> compiled code at the soonest possible moment. 
> 
> This requires a lot of runtime support, but we already have most of it 
> implemented for another compiler. From our perspective, the runtime 
> requirements are not a major blocker. 
> 
> Right, you'll use a patchpoint. That's way more expensive than using a safe 
> division intrinsic with branches, because it's opaque to the optimizer. 
> 
> This statement is true at the moment, but it shouldn't be. I think this is 
> our fundamental difference in approach. 
> 
> You should be able to write something like: 
> %res = invoke i32 patchpoint(... x86_trapping_divide, a, b) 
>   to label %normal_dest unwind label %invoke_dest 
> 
> normal_dest: 
>   ;; use %res 
> invoke_dest: 
>   landingpad 
>   ;; dispatch edge cases 
>   ;; this could be unreachable code if you deopt this frame in the trap 
>   ;; handler and jump directly to an interpreter or other bit of code 
> 
> I see. It sounds like you want a generalization of the "div.with.stackmap" 
> that I thought you wanted - you want to be able to wrap anything in a 
> stackmap. 
> 
> The current patchpoint intrinsic does not do this, and you run the risk of 
> breaking existing semantics if you changed this. You'd probably break 
> WebKit, which treats the call target of the patchpoint as nothing more than 
> a quirk - we always pass null. Also, the current patchpoint treats the 
> callee as an i8* if I remember right and it would be super weird if all LLVM 
> phases had to interpret this i8* by unwrapping a possible bitcast to get to 
> a declared function that may be an intrinsic. Yuck! Basically, the call 
> target of existing patchpoints is meant to be a kind of convenience feature 
> rather than the core of the mechanism. 
> 
> I agree in principle that the intrinsic that you want would be a useful 
> intrinsic. But let's not call it a patchpoint for the purposes of this 
> discussion, and let's not confuse the discussion by claiming (incorrectly) 
> that the existing patchpoint facility gives you what you want. It doesn't: 
> patchpoints are designed to make the call target opaque (hence the use of 
> i8*) and there shouldn't be a correlation between what the patchpoint does 
> at run-time and what the called function would have done. You could make 
> the call target be null (like WebKit does) and the patchpoint should still 
> mean "this code can do anything" because the expectation is that the client 
> JIT will patch over it anyway. 
> 
> Also, "patchpoint" would probably not be the right term for the intrinsic 
> that you want. I think that what you want is "call.with.stackmap". Or 
> maybe "stackmap.wrapper". Or just "stackmap" - I'd be OK, in principle, 
> with changing the name of the current "stackmap" intrinsic to something that 
> reflects the fact that it's less of a stackmap than what you want. 
> 
> To summarize. A patchpoint's main purpose is that you can patch it with 
> arbitrary code. The current "stackmap" means that you can patch it with 
> arbitrary code and that patching may be destructive to a shadow of machine 
> code bytes, so it's really just like patchpoints - we could change its name 
> to "patchpoint.shadow" for example. 
> 
> If you were to propose such a stackmap intrinsic, then I think there could 
> be some ways of doing it that wouldn't be too terrible. Basically you want 
> something that is like a patchpoint in that it reports a stackmap via a side 
> channel, but unlike patchpoints, it doesn't allow arbitrary patching - 
> instead the optimizer should be allowed to assume that the code within the 
> patchpoint will always do the same thing that the call target would have 
> done. There are downsides to truly doing this. For example, to make 
> division efficient with such an intrinsic, you'd have to do something that 
> is somewhat worse than just recognizing intrinsics in the optimizer - you'd 
> have to first recognize a call to your "stackmap wrapper" intrinsic and then 
> observe that its call target argument is in turn another intrinsic. To me 
> personally, that's kind of yucky, but I won't deny that it could be useful. 
> 
> As to the use of invoke: I don't believe that the use of invoke versus my 
> suggested "branch on a trap predicate" idea are different in any truly 
> meaningful way. I buy that either would work. 
> 
> 
> 
> A patchpoint should not require any excess spilling. If values are live in 
> registers, that should be reflected in the stack map. (I do not know if 
> this is the case for patchpoint at the moment or not.) 
> 
> Patchpoints do not require spilling. 
> 
> My point was that with existing patchpoints, you *either* use a patchpoint 
> for the entire division which makes the division opaque to the optimizer - 
> because a patchpoint means "this code can do anything" - *or* you could 
> spill stuff to the stack prior to your trapping division intrinsic, since 
> spilling is something that you could do as a workaround if you didn't have a 
> patchpoint. 
> 
> The reason why I brought up spilling at all is that I suspect that spilling 
> all state to the stack might be cheaper - for some systems - than turning 
> the division into a patchpoint. Turning the division into a patchpoint is 
> horrendously brutal - the patchpoint looks like it clobbers the heap (which 
> a division doesn't do), has to execute (a division is an obvious DCE 
> candidate), cannot be hoisted (hoisting divisions is awesome), etc. Perhaps 
> most importantly, though, a patchpoint doesn't tell LLVM that you're *doing 
> a division* - so all constant folding, strength reduction, and algebraic 
> reasoning flies out the window. On the other hand, spilling all state to 
> the stack is an arguably sound and performant solution to a lot of VM 
> problems. I've seen JVM implementations that ensure that there is always a 
> copy of state on the stack at some critical points, just because it makes 
> loads of stuff simpler (debugging, profiling, GC, and of course deopt). I 
> personally prefer to stay away from such a strategy because it's not free. 
> 
> On the other hand, branches can be cheap. A branch on a divide is cheaper 
> than not being able to optimize the divide. 
> 
> 
> 
> The Value called by a patchpoint should participate in optimization 
> normally. 
> 
> I agree that you could have a different intrinsic that behaves like this. 
> 
> We really want the patchpoint part of the call to be supplemental. It 
> should still be a call. It should be constant propagated, transformed, 
> etc. This is not the case currently. I've got a couple of off-the-wall 
> ideas for improving the current status, but I'll agree this is a hardish 
> problem. 
> 
> It should be legal to use a patchpoint in an invoke. It's an ABI issue of 
> how the invoke path gets invoked. (i.e. side tables for the runtime to 
> lookup, etc.) This is not possible today, and probably requires a fair 
> amount of work. Some of it, I've already done and will be sharing shortly. 
> Other parts, I haven't even thought about. 
> 
> Right, it's significantly more complex than either the existing patchpoints 
> or Michael's proposed safe.div. 
> 
> 
> 
> If you didn't want to use the trapping semantics, you'd insert dedicated 
> control flow _before_ the divide. This would allow normal optimization of 
> the control flow. 
> 
> Notes: 
> 1) This might require a new PATCHPOINT pseudo op in the backend. Haven't 
> thought much about that yet. 
> 2) I *think* your current intrinsic could be translated into something like 
> this. (Leaving aside the question of where the deopt state comes from.) In 
> fact, the more I look at this, the less difference I actually see between 
> the approaches. 
> 
> 
> 
> In a lot of languages, a divide produces some result even in the exceptional 
> case and this result requires effectively deoptimizing since the result won't 
> be the one you would have predicted (double instead of int, or BigInt 
> instead of small int), which sort of means that if the CPU exception occurs 
> you have to be able to reconstruct all state. A patchpoint could do this, 
> and so could spilling all state to the stack before the divide - but both 
> are very heavy hammers that are sure to be more expensive than just doing a 
> branch. 
> 
> This isn't necessarily as expensive as you might believe. I'd recommend 
> reading the Graal project papers on this topic. 
> 
> Whether deopt or branching is more profitable *in this case*, I can't easily 
> say. I'm not yet to the point of being able to run that experiment. We can 
> argue about what "should" be better all we want, but real performance data 
> is the only way to truly know. 
> 
> My point may have been confusing. I know that deoptimization is cheap and 
> WebKit uses it everywhere, including division corner cases, if profiling 
> tells us that it's profitable to do so (which it does, in the common case). 
> WebKit is a heavy user of deoptimization in general, so you don't need to 
> convince me that it's worth it. 
> 
> Acknowledged. 
> 
> Note that I want *both* deopt *and* branching, because in this case, a 
> branch is the fastest overall way of detecting when to deopt. In the 
> future, I will want to implement the deopt in terms of branching, and when 
> we do this, I believe that the most sound and performant approach would be 
> using Michael's intrinsics. This is subtle and I'll try to explain why it's 
> the case. 
> 
> The point is that you wouldn't want to do deoptimization by spilling state 
> on the main path or by using a patchpoint for the main path of the division. 
> 
> This is the main point I disagree with. I don't believe that having a 
> patchpoint on the main path should be any more expensive then the original 
> call. (see above) 
> 
> The reason why the patchpoint is expensive is that if you use a patchpoint 
> to implement a division then the optimizer won't be allowed to assume that 
> it's a division, because the whole point of "patchpoint" is to tell the 
> optimizer to piss off. 
> 
> 
> 
> Worth noting explicitly: I'm assuming that all of your deopt state would 
> already be available for other purposes in nearby code. It's on the stack 
> or in registers. I'm assuming that by adding the deopt point, you are not 
> radically changing the set of computations which need to be done. If that's 
> not the case, you should avoid deopt and instead just inline the slow paths 
> with explicit checks. 
> 
> Yes, of course it is. That's not the issue. 
> 
> 
> 
> I'll note that given your assumptions about the cost of a patchpoint, the 
> rest of your position makes a lot more sense. :) As I spelled out above, I 
> believe this cost is not fundamental. 
> 
> You don't want the common path of executing the division to involve a 
> patchpoint instruction, although using a patchpoint or stackmap to implement 
> deoptimization on the failing path is great: 
> 
> Good: if (division would fail) { call @patchpoint(all of my state) } else { 
> result = a / b } 
> 
> Given your cost assumptions, I'd agree. 
> 
> Not my cost assumptions. The reason why this is better is that the division 
> is expressed in LLVM IR so that LLVM can do useful things to it - like 
> eliminate it, for example. 
> 
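[The "Good" shape above can be sketched in C++. The deopt handler below stands in for the @patchpoint call and is deliberately opaque, while the division itself stays in plain code where GVN, LICM, and constant folding can see it. All names are illustrative, not LLVM API.]

```cpp
#include <climits>

// Toy stand-in for "call @patchpoint(all of my state)": a real JIT would
// record a stackmap here and transfer control to the interpreter. It is
// opaque to the compiler, but it sits only on the cold path.
extern "C" int deopt_and_reenter_interpreter(int a, int b) {
    return (b == 0) ? 0 : INT_MIN; // placeholder result for this sketch
}

// The "Good" shape: branch on the corner-case predicate; the fast-path
// division remains visible to the optimizer (hoistable, foldable, DCE-able).
int compiled_divide(int a, int b) {
    if (b == 0 || (a == INT_MIN && b == -1))
        return deopt_and_reenter_interpreter(a, b); // cold: deopt/slow path
    return a / b; // hot: the optimizer still sees an ordinary division
}
```

The point of the argument is exactly this split: the branch is cheap, and everything on the hot side stays a first-class division in the IR.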
> 
> Bad: call @patchpoint(all of my state) // patch with a divide instruction - 
> bad because the optimizer has no clue what you're doing and assumes the very 
> worst 
> 
> Yuck. Agreed. 
> 
> To be clear, this is what you're proposing - except that you're assuming 
> that LLVM will know that you've patched a division because you're expecting 
> the call target to have semantic meaning. Or, rather, you're expecting that 
> you can make the contents of the patchpoint be a division by having the call 
> target be a division intrinsic. In the current implementation and as it is 
> currently specified, the call target has no meaning and so you get the yuck 
> that I'm showing. 
> 
> 
> Worse: spill all state to the stack; call @trapping.div(a, b) // the spills 
> will hurt you far more than a branch, so this should be avoided 
> 
> I'm assuming this is an explicit spill rather than simply recording a stack 
> map *at the div*. If so, agreed. 
> 
> I suppose we could imagine a fourth option that involves a patchpoint to 
> pick up the state and a trapping divide intrinsic. But a trapping divide 
> intrinsic alone is not enough. Consider this: 
> 
> result = call @trapping.div(a, b); call @stackmap(all of my state) 
> 
> As soon as these are separate instructions, you have no guarantees that the 
> state that the stackmap reports is sound for the point at which the div 
> would trap. 
> 
> This is the closest to what I'd propose, except that the two calls would be 
> merged into a single patchpoint. Isn't the entire point of a patchpoint to 
> record the stack map for a call? 
> 
> No. It would be bad if that's what the documentation said. That's not at 
> all how WebKit uses it, nor how any IC client probably would. 
> 
> Patchpoints are designed to be inline assembly on steroids. They're there 
> to allow the client JIT to tell LLVM to piss off. 
> 
> (Well, ignoring the actual patching part..) Why not write this as: 
> patchpoint(..., trapping.div, a, b); 
> 
> Is there something I'm missing here? 
> 
> Just to note: I fully agree that the two call proposal is unsound and should 
> be avoided. 
> 
> So, the division itself shouldn't be a trapping instruction and instead you 
> want to detect the bad case with a branch. 
> 
> To be clear: 
> 
> - Whether you use deoptimization for division or anything else - like WebKit 
> has done since before any of the Graal papers were written - is mostly 
> unrelated to how you represent the division, unless you wanted to add a new 
> intrinsic that is like a trapping-division-with-stackmap: 
> 
> result = call @trapping.div.with.stackmap(a, b, ... all of my state ...) 
> 
> Now, maybe you do want such an intrinsic, in which case you should propose 
> it! 
> 
> Given what we already have with patchpoints, I don't think a merged 
> intrinsic is necessary. (See above). I believe we have all the parts to 
> build this solution, and that - in theory - they should compose neatly. 
> 
> p.s. The bits I was referring to were not deopt per se, but rather which 
> set of deopt state you used for each deopt point. That's a bit of a 
> tangent for the rest of the discussion now though. 
> 
> The reason why I haven't proposed it is that I think that long-term, the 
> currently proposed intrinsics are a better path to getting the trapping 
> optimizations. See my previous mail, where I show how we could tell LLVM 
> what the failing path is (which may have deoptimization code that uses a 
> stackmap or whatever), what the trapping predicate is (it comes from the 
> safe.div intrinsic), and the fact that trapping is wise (branch weights). 
> 
> - If you want to do the deoptimization with a trap, then your only choice 
> currently is to use a patchpoint for the main path of the division. This 
> will be slower than using a branch to an OSR exit basic block, because 
> you're making the division itself opaque to the optimizer (bad) just to get 
> rid of a branch (which was probably cheap to begin with). 
> 
> Hence, what you want to do - one way or another, regardless of whether this 
> proposed intrinsic is added - is to branch on the corner case condition, and 
> have the slow case of the branch go to a basic block that deoptimizes. 
> Unless of course you have profiling that says that the case does happen 
> often, in which case you can have that basic block handle the corner case 
> inline without leaving optimized code (FWIW, we do have such paths in WebKit 
> and they are useful). 
> 
> So the question for me is whether the branching involves explicit control 
> flow or is hidden inside an intrinsic. I prefer for it to be within an 
> intrinsic because it: 
> 
> - allows the optimizer to do more interesting things in the common cases, 
> like hoisting the entire division. 
> 
> - will give us a clearer path for implementing trapping optimizations in the 
> future. 
> 
> - is an immediate win on ARM. 
> 
> I'd be curious to hear what specific idea you have about how to implement 
> trap-based deoptimization with your trapping division intrinsic for x86 - 
> maybe it's different from the two "bad" idioms I showed above. 
> 
> I hope my explanation above helps. If not, ask, and I'll try to explain 
> more clearly. 
> 
> I think I understand it. I think that the only issue is that: 
> 
> - Patchpoints currently don't do what you want. 
> 
> - If you made patchpoints do what you want then you'd break WebKit - not to 
> mention anyone who wants to use them for inline caches. 
> 
> So it seems like you want a new intrinsic. You should officially propose 
> this new intrinsic, particularly since the core semantic differences are not 
> so great from what we have now. OTOH, if you truly believe that patchpoints 
> should just be changed to your semantics in a way that does break WebKit, 
> then that's probably also something you should get off your chest. ;-) 
> 
> 
> 
> One point just for clarity; I don't believe this affects the conclusions of 
> our discussion so far. I'm also fairly sure that you (Filip) are aware of 
> this, but want to spell it out for other readers. 
> 
> You seem to be assuming that compiled code needs to explicitly branch to a 
> point where deopt state is known in order to exit a compiled frame. 
> 
> This is a slightly unclear characterization of my assumptions. Our JIT does 
> deoptimization without explicit branches for many, many things. You should 
> look at it some time, it's pretty fancy. ;-) 
> 
> Worth noting is that you can also exit a compiled frame on a trap (without 
> an explicit condition/branch!) if the deopt state is known at the point 
> you take the trap. This "exit frame on trap" behavior shows up with null 
> pointer exceptions as well. I'll note that both compilers in OpenJDK 
> support some combination of "exit-on-trap" conditions for division and null 
> dereferences. The two differ on exactly what strategies they use in each 
> case though. :) 
> 
> Yeah, and I've also implemented VMs that do this - and I endorse the basic 
> idea. I know what you want, and my only point is that the existing 
> patchpoints only give you this if you're willing to make a huge compromise: 
> namely, that you're willing to make the division (or heap load for the null 
> case) completely opaque to the compiler to the point that GVN, LICM, SCCP, 
> and all algebraic reasoning have to give up on optimizing it. The point of 
> using LLVM is that it can optimize code. It can optimize branches and 
> divisions pretty well. So, eliminating an explicit branch by replacing it 
> with a construct that appears opaque to the optimizer is not a smart 
> trade-off. 
> 
> You could add a new intrinsic that, like patchpoints, records the layout of 
> state in a side-table, but that is used as a kind of wrapper for operations 
> that LLVM understands. This may or may not be hairy - you seem to have sort 
> of acknowledged that it's got some complexity and I've also pointed out some 
> possible issues. If this is something that you want, you should propose it 
> so that others know what you're talking about. One danger of how we're 
> discussing this right now is that you're overloading patchpoints to mean the 
> thing you want them to mean rather than what they actually mean, which makes 
> it seem like we don't need Michael's intrinsics on the grounds that 
> patchpoints already offer a solution. They don't already offer a solution 
> precisely because patchpoints don't do what your intrinsics would do. 
> 
> 
> 
> I'm not really arguing that either scheme is "better" in all cases. I'm 
> simply arguing that we should support both and allow optimization and tuning 
> between them. As far as I can tell, you seem to be assuming that an 
> explicit branch to known exit point is always superior. 
> 
> 
> Ok, back to the topic at hand... 
> 
> With regards to the current proposal, I'm going to take a step back. You 
> guys seem to have already looked in this in a fair amount of depth. I'm not 
> necessarily convinced you've come to the best solution, but at some point, 
> we need to make forward progress. What you have is clearly better than 
> nothing. 
> 
> Please go ahead and submit your current approach. We can come back and 
> revise later if we really need to. 
> 
> I do request the following changes: 
> - Mark it clearly as experimental. 
> 
> 
> - Either don't specify the value computed in the edge cases, or allow those 
> values to be specified as constant arguments to the call. This allows 
> efficient lowering to x86's div instruction if you want to make use of the 
> trapping semantics. 
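[One way to read this request in code: if the intrinsic let the frontend pin the edge-case results as constants, a lowering could pick values that coincide with what the hardware produces. A hypothetical C++ rendering; the function and parameter names are mine, not from the proposal.]

```cpp
#include <climits>

// Hypothetical safe.div variant where the values produced in the edge
// cases are constant arguments. If a target's divide instruction already
// yields exactly these values, the lowering can emit the bare instruction
// with no explicit checks at all.
int safe_sdiv_with_defaults(int a, int b, int div0_val, int overflow_val) {
    if (b == 0)
        return div0_val;
    if (a == INT_MIN && b == -1) // overflows two's-complement division
        return overflow_val;
    return a / b;
}
```

On x86, idiv raises #DE in both corner cases, so the "don't specify the values" option corresponds to letting the trap fire and recovering in a handler.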
> 
> Once again: how would you use this to get trapping semantics without 
> throwing all of LLVM's optimizations out the window, in the absence of the 
> kind of patchpoint-like intrinsic that you want? I ask just to make sure 
> that we're on the same page. 
> 
> 
> 
> Finally, as for performance data, which part of this do you want performance 
> data for? I concede that I don't have performance data for using Michael's 
> new intrinsic. Part of what the intrinsic accomplishes is it gives a less 
> ugly way of doing something that is already possible with target intrinsics 
> on ARM. I think it would be great if you could get those semantics - along 
> with a known-good implementation - on other architectures as well. 
> 
> I would be very interested in seeing data comparing two schemes: 
> - Explicit control flow emitted by the frontend 
> - The safe.div intrinsic emitted by the frontend, desugared in CodeGenPrep 
> 
> My strong suspicion is that each would perform well in some cases and not in 
> others. At least on x86. Since the edge-checks are essentially free on 
> ARM, the second scheme would probably be strictly superior there. 
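["Essentially free on ARM" refers to the divide instruction's defined edge-case results: to the best of my reading of the ARM architecture manual, sdiv returns 0 for division by zero and wraps on INT32_MIN / -1, so the safe-division checks vanish there. A small C++ emulation of that behavior:]

```cpp
#include <cstdint>

// Emulation of ARM sdiv edge-case behavior (assumed from the ARM ARM:
// division by zero yields 0; INT32_MIN / -1 wraps to INT32_MIN). Because
// the hardware defines both corner cases, a safe-division intrinsic can
// lower to a single sdiv on ARM with no explicit branches.
int32_t arm_sdiv(int32_t a, int32_t b) {
    if (b == 0)
        return 0; // the hardware returns 0 rather than trapping
    if (a == INT32_MIN && b == -1)
        return INT32_MIN; // two's-complement wraparound
    return a / b;
}
```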
> 
> I am NOT asking that we block submission on this data however. 
> 
> But this discussion has also involved suggestions that we should use 
> trapping to implement deoptimization, and the main point of my message is to 
> strongly argue against anything like this given the current state of 
> trapping semantics and how patchpoints work. My point is that using traps 
> for division corner cases would either be unsound (see the stackmap after 
> the trap, above), or would require you to do things that are obviously 
> inefficient. If you truly believe that the branch to detect division slow 
> paths is more expensive than spilling all bytecode state to the stack or 
> using a patchpoint for the division, then I could probably hack something up 
> in WebKit to show you the performance implications. (Or you could do it 
> yourself, the code is open source...) 
> 
> In a couple of months, I'll probably have the performance data to discuss 
> this for real. When that happens, let's pick this up and continue the 
> debate. Alternatively, if you want to chat this over more with a beer in 
> hand at the social next week, let me know. In the meantime, let's not stall 
> the current proposal any more. 
> 
> Philip 
> 
> 

