[LLVMdev] Proposal: add intrinsics for safe division

Philip Reames listmail at philipreames.com
Fri May 2 11:58:58 PDT 2014


On 05/02/2014 11:57 AM, Filip Pizlo wrote:
>
> On May 2, 2014 at 11:53:25 AM, Eric Christopher (echristo at gmail.com 
> <mailto:echristo at gmail.com>) wrote:
>
>> On Wed, Apr 30, 2014 at 10:34 PM, Philip Reames
>> <listmail at philipreames.com> wrote:
>> > Andy - If you're not already following this closely, please start. 
>> We've
>> > gotten into fairly fundamental questions of what a patchpoint does.
>> >
>> > Filip,
>> >
>> > I think you've hit the nail on the head. What I'm thinking of as being
>> > patchpoints are not what you think they are. Part of that is that 
>> I've got
>> > a local change which adds a very similar construction (called 
>> "statepoints"
>> > for the moment), but I was trying to keep that separate. That also 
>> includes
>> > a lot of GC semantics which are not under discussion currently. My
>> > apologies if that experience bled over into this conversation and made
>> > things more confusing.
>> >
>> > I will note that the documentation for patchpoint say explicitly the
>> > following:
>> > "The ‘llvm.experimental.patchpoint.*‘ intrinsics creates a function 
>> call to
>> > the specified <target> and records the location of specified values 
>> in the
>> > stack map."
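>> >
>> > For concreteness, an unpatched patchpoint then reads as a plain call 
>> > plus a side-table entry - something like this sketch, where the id, 
>> > shadow size, and live values are made up for illustration:
>> >
>> >   ;; calls %target(%arg1, %arg2), reserves 15 bytes for later patching,
>> >   ;; and records the locations of %live1/%live2 in the stack map
>> >   %result = call i64 (i64, i32, i8*, i32, ...)
>> >     @llvm.experimental.patchpoint.i64(i64 42, i32 15, i8* %target,
>> >                                       i32 2, i64 %arg1, i64 %arg2,
>> >                                       i64 %live1, i64 %live2)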
>> >
>> > My reading has always been that a patchpoint *that isn't patched* 
>> is simply
>> > a call with a stackmap associated with it. On that reading, my proposed
>> > usage would be (and always has been) legal.
>> >
>>
>> I like the idea that the target can be assumed to be called. It makes
>> optimization of the call possible, etc. I think it's definitely worth
>> exploring before we lock down the patchpoint intrinsic.
>
> I will actively oppose this.
>
I think it sounds like we need to split patchpoint into two cases. I'm 
going to send a rough proposal for this later today.
>
> -Filip
>
>
>>
>>
>> -eric
>>
>> > I will agree that I've confused the topic badly on the optimization 
>> front.
>> > My "statepoint" isn't patchable, so a lot more optimizations are legal.
>> > Sorry about that. To restate what I think you've been saying all 
>> along, the
>> > optimizer can't make assumptions about what function is called by a
>> > patchpoint because that might change based on later patching. Is 
>> this the
>> > key point you've been trying to make?
>> >
>> > I'm not objecting to separating "my patchpoint" from "your patchpoint".
>> > Let's just hammer out the semantics of each first. :)
>> >
>> > Again, longer response to follow in a day or so. :)
>> >
>> > Philip
>> >
>> >
>> > On 04/30/2014 10:09 PM, Filip Pizlo wrote:
>> >
>> >
>> >
>> > On April 30, 2014 at 9:06:20 PM, Philip Reames 
>> (listmail at philipreames.com)
>> > wrote:
>> >
>> > On 04/29/2014 12:39 PM, Filip Pizlo wrote:
>> >
>> > On April 29, 2014 at 11:27:06 AM, Philip Reames 
>> (listmail at philipreames.com)
>> > wrote:
>> >
>> > On 04/29/2014 10:44 AM, Filip Pizlo wrote:
>> >
>> > TL;DR: Your desire to use trapping on x86 only further convinces me 
>> that
>> > Michael's proposed intrinsics are the best way to go.
>> >
>> > I'm still not convinced, but am not going to actively oppose it 
>> either. I'm
>> > leery of designing a solution with major assumptions we don't have 
>> data to
>> > back up.
>> >
>> > I worry your assumptions about deoptimization are potentially 
>> unsound. But
>> > I don't have data to actually show this (yet).
>> >
>> > I *think* I may have been unclear about my assumptions; in 
>> particular, my
>> > claims with respect to deoptimization are probably more subtle than 
>> they
>> > appeared. WebKit can use LLVM and it has divisions and we do all 
>> possible
>> > deoptimization/profiling/etc tricks for it, so this is grounded in
>> > experience. Forgive me if the rest of this e-mail contains a lecture on
>> > things that are obvious - I'll try to err on the side of clarity and
>> > completeness since this discussion is sufficiently dense that we 
>> run the
>> > risk of talking cross-purposes unless some baseline assumptions are
>> > established.
>> >
>> > I think we're using the same terminology, but with slightly 
>> different sets
>> > of assumptions. I'll point this out below where relevant.
>> >
>> > Also, thanks for taking the time to expand. It helped clarify the 
>> discussion
>> > quite a bit.
>> >
>> > I think we may be converging to an understanding of what you want 
>> versus
>> > what I want, and I think that there are some points - possibly 
>> unrelated to
>> > division - that are worth clarifying. I think that the main 
>> difference is
>> > that when I say "patchpoint", I am referring to a concrete 
>> intrinsic with
>> > specific semantics that cannot change without breaking WebKit, 
>> while you are
>> > using the term to refer to a broad concept, or rather, a class of
>> > as-yet-unimplemented intrinsics that share some of the same 
>> features with
>> > patchpoints but otherwise have incompatible semantics.
>> >
>> > Also, when I say that you wouldn't want to use the existing 
>> patchpoint to do
>> > your trapping deopt, what I mean is that the performance of doing 
>> this would
>> > suck for reasons that are not related to deoptimization or 
>> trapping. I'm
>> > not claiming that deoptimization performs poorly (trust me, I know 
>> better)
>> > or that trapping to deoptimize is bad (I've done this many, many 
>> times and I
>> > know better). I'm saying that with the current patchpoint intrinsics in
>> > LLVM, as they are currently specified and implemented, doing it 
>> would be a
>> > bad idea because you'd have to compromise a bunch of other 
>> optimizations to
>> > achieve it.
>> >
>> > You have essentially described new intrinsics that would make this 
>> less of a
>> > bad idea and I am interested in your intrinsics, so I'll try to 
>> both respond
>> > with why patchpoints don't currently give you what you want (and 
>> why simply
>> > changing patchpoint semantics would be evil) and I'll also try to 
>> comment on
>> > what I think of the intrinsic that you're effectively proposing. 
>> Long story
>> > short, I think you should formally propose your intrinsic even if 
>> it's not
>> > completely fleshed out. I think that it's an interesting capability 
>> and in
>> > its most basic form, it is a simple variation of the current
>> > patchpoint/stackmap intrinsics.
>> >
>> >
>> >
>> >
>> >
>> > On April 29, 2014 at 10:09:49 AM, Philip Reames 
>> (listmail at philipreames.com)
>> > wrote:
>> >
>> > As the discussion has progressed and I've spent more time thinking 
>> about the
>> > topic, I find myself less and less enthused about the current 
>> proposal. I'm
>> > in full support of having idiomatic ways to express safe division, 
>> but I'm
>> > starting to doubt that using an intrinsic is the right way at the 
>> moment.
>> >
>> > One case I find myself thinking about is how one would combine 
>> profiling
>> > information and implicit div-by-zero/overflow checks with this 
>> proposal. I
>> > don't really see a clean way. Ideally, for a "safe div" which never 
>> has the
>> > exceptional paths taken, you'd like to completely do away with the 
>> control
>> > flow entirely. (And rely on hardware traps w/exceptions instead.) I 
>> don't
>> > really see a way to represent that type of construct given the current
>> > proposal.
>> >
>> > This is a deeper problem and to solve it you'd need a solution to 
>> trapping
>> > in general. Let's consider the case of Java. A Java program may want to
>> > catch the arithmetic exception due to divide by zero. How would you 
>> do this
>> > with a trap in LLVM IR? Spill all state that is live at the catch? 
>> Use a
>> > patchpoint for the entire division instruction?
>> >
>> > We'd likely use something similar to a patchpoint. You'd need the 
>> "abstract
>> > vm state" (which is not the compiled frame necessarily) available 
>> at the div
>> > instruction. You could then re-enter the interpreter at the 
>> specified index
>> > (part of the vm state). We have almost all of these mechanisms in place.
>> > Ideally, you'd trigger a recompile and otherwise ensure re-entry into
>> > compiled code at the soonest possible moment.
>> >
>> > This requires a lot of runtime support, but we already have most of it
>> > implemented for another compiler. From our perspective, the runtime
>> > requirements are not a major blocker.
>> >
>> > Right, you'll use a patchpoint. That's way more expensive than 
>> using a safe
>> > division intrinsic with branches, because it's opaque to the optimizer.
>> >
>> > This statement is true at the moment, but it shouldn't be. I think 
>> this is
>> > our fundamental difference in approach.
>> >
>> > You should be able to write something like:
>> >
>> >   %res = invoke i32 @patchpoint(..., @x86_trapping_divide, i32 %a, i32 %b)
>> >            to label %normal_dest unwind label %invoke_dest
>> >
>> > normal_dest:
>> >   ;; use %res
>> >
>> > invoke_dest:
>> >   landingpad ...
>> >   ;; dispatch edge cases
>> >   ;; this could be unreachable code if you deopt this frame in the
>> >   ;; trap handler and jump directly to an interpreter or other bit of code
>> >
>> > I see. It sounds like you want a generalization of the 
>> "div.with.stackmap"
>> > that I thought you wanted - you want to be able to wrap anything in a
>> > stackmap.
>> >
>> > The current patchpoint intrinsic does not do this, and you run the 
>> risk of
>> > breaking existing semantics if you changed this. You'd probably break
>> > WebKit, which treats the call target of the patchpoint as nothing 
>> more than
>> > a quirk - we always pass null. Also, the current patchpoint treats the
>> > callee as an i8* if I remember right and it would be super weird if 
>> all LLVM
>> > phases had to interpret this i8* by unwrapping a possible bitcast 
>> to get to
>> > a declared function that may be an intrinsic. Yuck! Basically, the call
>> > target of existing patchpoints is meant to be a kind of convenience 
>> feature
>> > rather than the core of the mechanism.
>> >
>> > I agree in principle that the intrinsic that you want would be a useful
>> > intrinsic. But let's not call it a patchpoint for the purposes of this
>> > discussion, and let's not confuse the discussion by claiming 
>> (incorrectly)
>> > that the existing patchpoint facility gives you what you want. It 
>> doesn't:
>> > patchpoints are designed to make the call target opaque (hence the 
>> use of
>> > i8*) and there shouldn't be a correlation between what the 
>> patchpoint does
>> > at run-time and what the called function would have done. You could 
>> make
>> > the call target be null (like WebKit does) and the patchpoint 
>> should still
>> > mean "this code can do anything" because the expectation is that 
>> the client
>> > JIT will patch over it anyway.
>> >
>> > Also, "patchpoint" would probably not be the right term for the 
>> intrinsic
>> > that you want. I think that what you want is "call.with.stackmap". Or
>> > maybe "stackmap.wrapper". Or just "stackmap" - I'd be OK, in principle,
>> > with changing the name of the current "stackmap" intrinsic to 
>> something that
>> > reflects the fact that it's less of a stackmap than what you want.
>> >
>> > To summarize. A patchpoint's main purpose is that you can patch it with
>> > arbitrary code. The current "stackmap" means that you can patch it with
>> > arbitrary code and that patching may be destructive to a shadow of 
>> machine
>> > code bytes, so it's really just like patchpoints - we could change 
>> its name
>> > to "patchpoint.shadow" for example.
>> >
>> > If you were to propose such a stackmap intrinsic, then I think 
>> there could
>> > be some ways of doing it that wouldn't be too terrible. Basically 
>> you want
>> > something that is like a patchpoint in that it reports a stackmap 
>> via a side
>> > channel, but unlike patchpoints, it doesn't allow arbitrary patching -
>> > instead the optimizer should be allowed to assume that the code 
>> within the
>> > patchpoint will always do the same thing that the call target would 
>> have
>> > done. There are downsides to truly doing this. For example, to make
>> > division efficient with such an intrinsic, you'd have to do 
>> something that
>> > is somewhat worse than just recognizing intrinsics in the optimizer 
>> - you'd
>> > have to first recognize a call to your "stackmap wrapper" intrinsic 
>> and then
>> > observe that its call target argument is in turn another intrinsic. 
>> To me
>> > personally, that's kind of yucky, but I won't deny that it could be 
>> useful.
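>> >
>> > To illustrate the double unwrapping with hypothetical names - this is
>> > a sketch of the scheme described above, not an existing intrinsic:
>> >
>> >   %r = call i32 @llvm.stackmap.wrapper.i32(
>> >            i8* bitcast (i32 (i32, i32)* @trapping.sdiv to i8*),
>> >            i32 %a, i32 %b, i32 %live1, i32 %live2)
>> >   ;; to reason about this as a division, a pass must first recognize
>> >   ;; the wrapper, then strip the bitcast from the call-target operand,
>> >   ;; and only then discover the division underneath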
>> >
>> > As to the use of invoke: I don't believe that the use of invoke 
>> versus my
>> > suggested "branch on a trap predicate" idea are different in any truly
>> > meaningful way. I buy that either would work.
>> >
>> >
>> >
>> > A patchpoint should not require any excess spilling. If values are 
>> live in
>> > registers, that should be reflected in the stack map. (I do not know if
>> > this is the case for patchpoint at the moment or not.)
>> >
>> > Patchpoints do not require spilling.
>> >
>> > My point was that with existing patchpoints, you *either* use a 
>> patchpoint
>> > for the entire division which makes the division opaque to the 
>> optimizer -
>> > because a patchpoint means "this code can do anything" - *or* you could
>> > spill stuff to the stack prior to your trapping division intrinsic, 
>> since
>> > spilling is something that you could do as a workaround if you 
>> didn't have a
>> > patchpoint.
>> >
>> > The reason why I brought up spilling at all is that I suspect that 
>> spilling
>> > all state to the stack might be cheaper - for some systems - than 
>> turning
>> > the division into a patchpoint. Turning the division into a 
>> patchpoint is
>> > horrendously brutal - the patchpoint looks like it clobbers the 
>> heap (which
>> > a division doesn't do), has to execute (a division is an obvious DCE
>> > candidate), cannot be hoisted (hoisting divisions is awesome), etc. 
>> Perhaps
>> > most importantly, though, a patchpoint doesn't tell LLVM that 
>> you're *doing
>> > a division* - so all constant folding, strength reduction, and 
>> algebraic
>> > reasoning flies out the window. On the other hand, spilling all 
>> state to
>> > the stack is an arguably sound and performant solution to a lot of VM
>> > problems. I've seen JVM implementations that ensure that there is 
>> always a
>> > copy of state on the stack at some critical points, just because it 
>> makes
>> > loads of stuff simpler (debugging, profiling, GC, and of course 
>> deopt). I
>> > personally prefer to stay away from such a strategy because it's 
>> not free.
>> >
>> > On the other hand, branches can be cheap. A branch on a divide is 
>> cheaper
>> > than not being able to optimize the divide.
>> >
>> >
>> >
>> > The Value called by a patchpoint should participate in optimization
>> > normally.
>> >
>> > I agree that you could have a different intrinsic that behaves like 
>> this.
>> >
>> > We really want the patchpoint part of the call to be supplemental. It
>> > should still be a call. It should be constant propagated, transformed,
>> > etc. This is not the case currently. I've got a couple of off-the-wall
>> > ideas for improving the current status, but I'll agree this is a 
>> hardish
>> > problem.
>> >
>> > It should be legal to use a patchpoint in an invoke. It's an ABI 
>> issue of
>> > how the invoke path gets invoked. (i.e. side tables for the runtime to
>> > lookup, etc..) This is not possible today, and probably requires a fair
>> > amount of work. Some of it, I've already done and will be sharing 
>> shortly.
>> > Other parts, I haven't even thought about.
>> >
>> > Right, it's significantly more complex than either the existing 
>> patchpoints
>> > or Michael's proposed safe.div.
>> >
>> >
>> >
>> > If you didn't want to use the trapping semantics, you'd insert 
>> dedicated
>> > control flow _before_ the divide. This would allow normal 
>> optimization of
>> > the control flow.
>> >
>> > Notes:
>> > 1) This might require a new PATCHPOINT pseudo op in the backend. 
>> Haven't
>> > thought much about that yet.
>> > 2) I *think* your current intrinsic could be translated into 
>> something like
>> > this. (Leaving aside the question of where the deopt state comes 
>> from.) In
>> > fact, the more I look at this, the less difference I actually see 
>> between
>> > the approaches.
>> >
>> >
>> >
>> > In a lot of languages, a divide produces some result even in the 
>> exceptional
>> > case and this result requires effectively deoptimizing since the 
>> result won't
>> > be the one you would have predicted (double instead of int, or BigInt
>> > instead of small int), which sort of means that if the CPU 
>> exception occurs
>> > you have to be able to reconstruct all state. A patchpoint could do 
>> this,
>> > and so could spilling all state to the stack before the divide - 
>> but both
>> > are very heavy hammers that are sure to be more expensive than just 
>> doing a
>> > branch.
>> >
>> > This isn't necessarily as expensive as you might believe. I'd recommend
>> > reading the Graal project papers on this topic.
>> >
>> > Whether deopt or branching is more profitable *in this case*, I 
>> can't easily
>> > say. I'm not yet to the point of being able to run that experiment. 
>> We can
>> > argue about what "should" be better all we want, but real 
>> performance data
>> > is the only way to truly know.
>> >
>> > My point may have been confusing. I know that deoptimization is 
>> cheap and
>> > WebKit uses it everywhere, including division corner cases, if 
>> profiling
>> > tells us that it's profitable to do so (which it does, in the 
>> common case).
>> > WebKit is a heavy user of deoptimization in general, so you don't 
>> need to
>> > convince me that it's worth it.
>> >
>> > Acknowledged.
>> >
>> > Note that I want *both* deopt *and* branching, because in this case, a
>> > branch is the fastest overall way of detecting when to deopt. In the
>> > future, I will want to implement the deopt in terms of branching, 
>> and when
>> > we do this, I believe that the most sound and performant approach 
>> would be
>> > using Michael's intrinsics. This is subtle and I'll try to explain 
>> why it's
>> > the case.
>> >
>> > The point is that you wouldn't want to do deoptimization by 
>> spilling state
>> > on the main path or by using a patchpoint for the main path of the 
>> division.
>> >
>> > This is the main point I disagree with. I don't believe that having a
>> > patchpoint on the main path should be any more expensive then the 
>> original
>> > call. (see above)
>> >
>> > The reason why the patchpoint is expensive is that if you use a 
>> patchpoint
>> > to implement a division then the optimizer won't be allowed to 
>> assume that
>> > it's a division, because the whole point of "patchpoint" is to tell the
>> > optimizer to piss off.
>> >
>> >
>> >
>> > Worth noting explicitly: I'm assuming that all of your deopt state 
>> would
>> > already be available for other purposes in nearby code. It's on the 
>> stack
>> > or in registers. I'm assuming that by adding the deopt point, you 
>> are not
>> > radically changing the set of computations which need to be done. 
>> If that's
>> > not the case, you should avoid deopt and instead just inline the 
>> slow paths
>> > with explicit checks.
>> >
>> > Yes, of course it is. That's not the issue.
>> >
>> >
>> >
>> > I'll note that given your assumptions about the cost of a 
>> patchpoint, the
>> > rest of your position makes a lot more sense. :) As I spelled out 
>> above, I
>> > believe this cost is not fundamental.
>> >
>> > You don't want the common path of executing the division to involve a
>> > patchpoint instruction, although using a patchpoint or stackmap to 
>> implement
>> > deoptimization on the failing path is great:
>> >
>> > Good: if (division would fail) { call @patchpoint(all of my state) 
>> } else {
>> > result = a / b }
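>> >
>> > In IR, the good version is something like this sketch (only the
>> > divide-by-zero check is shown, and the patchpoint operands are
>> > placeholders):
>> >
>> >   %iszero = icmp eq i32 %b, 0
>> >   br i1 %iszero, label %deopt, label %fast
>> > fast:
>> >   %result = sdiv i32 %a, %b    ;; visible to GVN, LICM, DCE, etc.
>> >   br label %continue
>> > deopt:
>> >   ;; OSR exit: record all of my state and never come back
>> >   call void (i64, i32, i8*, i32, ...)
>> >     @llvm.experimental.patchpoint.void(i64 7, i32 5, i8* null, i32 0,
>> >                                        i32 %state1, i32 %state2)
>> >   unreachable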
>> >
>> > Given your cost assumptions, I'd agree.
>> >
>> > Not my cost assumptions. The reason why this is better is that the 
>> division
>> > is expressed in LLVM IR so that LLVM can do useful things to it - like
>> > eliminate it, for example.
>> >
>> >
>> > Bad: call @patchpoint(all of my state) // patch with a divide 
>> instruction -
>> > bad because the optimizer has no clue what you're doing and assumes 
>> the very
>> > worst
>> >
>> > Yuck. Agreed.
>> >
>> > To be clear, this is what you're proposing - except that you're 
>> assuming
>> > that LLVM will know that you've patched a division because you're 
>> expecting
>> > the call target to have semantic meaning. Or, rather, you're 
>> expecting that
>> > you can make the contents of the patchpoint be a division by having 
>> the call
>> > target be a division intrinsic. In the current implementation and 
>> as it is
>> > currently specified, the call target has no meaning and so you get 
>> the yuck
>> > that I'm showing.
>> >
>> >
>> > Worse: spill all state to the stack; call @trapping.div(a, b) // 
>> the spills
>> > will hurt you far more than a branch, so this should be avoided
>> >
>> > I'm assuming this is an explicit spill rather than simply recording 
>> a stack
>> > map *at the div*. If so, agreed.
>> >
>> > I suppose we could imagine a fourth option that involves a 
>> patchpoint to
>> > pick up the state and a trapping divide instrinsic. But a trapping 
>> divide
>> > intrinsic alone is not enough. Consider this:
>> >
>> > result = call @trapping.div(a, b); call @stackmap(all of my state)
>> >
>> > As soon as these are separate instructions, you have no guarantees 
>> that the
>> > state that the stackmap reports is sound for the point at which the div
>> > would trap.
>> >
>> > This is the closest to what I'd propose, except that the two calls 
>> would be
>> > merged into a single patchpoint. Isn't the entire point of a 
>> patchpoint to
>> > record the stack map for a call?
>> >
>> > No. It would be bad if that's what the documentation says. That's 
>> not at
>> > all how WebKit uses it or probably any IC client would use it.
>> >
>> > Patchpoints are designed to be inline assembly on steroids. They're 
>> there
>> > to allow the client JIT to tell LLVM to piss off.
>> >
>> > (Well, ignoring the actual patching part...) Why not write this as:
>> > patchpoint(..., trapping.div, a, b);
>> >
>> > Is there something I'm missing here?
>> >
>> > Just to note: I fully agree that the two call proposal is unsound 
>> and should
>> > be avoided.
>> >
>> > So, the division itself shouldn't be a trapping instruction and 
>> instead you
>> > want to detect the bad case with a branch.
>> >
>> > To be clear:
>> >
>> > - Whether you use deoptimization for division or anything else - 
>> like WebKit
>> > has done since before any of the Graal papers were written - is mostly
>> > unrelated to how you represent the division, unless you wanted to 
>> add a new
>> > intrinsic that is like a trapping-division-with-stackmap:
>> >
>> > result = call @trapping.div.with.stackmap(a, b, ... all of my state 
>> ...)
>> >
>> > Now, maybe you do want such an intrinsic, in which case you should 
>> propose
>> > it!
>> >
>> > Given what we already have with patchpoints, I don't think a merged
>> > intrinsic is necessary. (See above). I believe we have all the parts to
>> > build this solution, and that - in theory - they should compose neatly.
>> >
>> > p.s. The bits I was referring to were not deopt per se, but rather
>> > which set of deopt state you used for each deopt point. That's a bit of
>> > a tangent for the rest of the discussion now though.
>> >
>> > The reason why I haven't proposed it is that I think that 
>> long-term, the
>> > currently proposed intrinsics are a better path to getting the trapping
>> > optimizations. See my previous mail, where I show how we could tell 
>> LLVM
>> > what the failing path is (which may have deoptimization code that 
>> uses a
>> > stackmap or whatever), what the trapping predicate is (it comes 
>> from the
>> > safe.div intrinsic), and the fact that trapping is wise (branch 
>> weights).
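>> >
>> > Concretely, that combination is something like this sketch (assuming
>> > the proposed safe.sdiv returns the result plus an i1 edge-case flag;
>> > the weights are illustrative):
>> >
>> >   %pair = call { i32, i1 } @llvm.safe.sdiv.i32(i32 %a, i32 %b)
>> >   %res  = extractvalue { i32, i1 } %pair, 0
>> >   %bad  = extractvalue { i32, i1 } %pair, 1  ;; the trapping predicate
>> >   ;; the weights mark the failing path as cold - the hint that
>> >   ;; lowering the check to a trap would be wise
>> >   br i1 %bad, label %failing_path, label %continue, !prof !0
>> >
>> >   !0 = !{!"branch_weights", i32 1, i32 1000}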
>> >
>> > - If you want to do the deoptimization with a trap, then your only 
>> choice
>> > currently is to use a patchpoint for the main path of the division. 
>> This
>> > will be slower than using a branch to an OSR exit basic block, because
>> > you're making the division itself opaque to the optimizer (bad) 
>> just to get
>> > rid of a branch (which was probably cheap to begin with).
>> >
>> > Hence, what you want to do - one way or another, regardless of 
>> whether this
>> > proposed intrinsic is added - is to branch on the corner case 
>> condition, and
>> > have the slow case of the branch go to a basic block that deoptimizes.
>> > Unless of course you have profiling that says that the case does happen
>> > often, in which case you can have that basic block handle the 
>> corner case
>> > inline without leaving optimized code (FWIW, we do have such paths 
>> in WebKit
>> > and they are useful).
>> >
>> > So the question for me is whether the branching involves explicit 
>> control
>> > flow or is hidden inside an intrinsic. I prefer for it to be within an
>> > intrinsic because it:
>> >
>> > - allows the optimizer to do more interesting things in the common 
>> cases,
>> > like hoisting the entire division.
>> >
>> > - will give us a clearer path for implementing trapping 
>> optimizations in the
>> > future.
>> >
>> > - is an immediate win on ARM.
>> >
>> > I'd be curious to hear what specific idea you have about how to 
>> implement
>> > trap-based deoptimization with your trapping division intrinsic for 
>> x86 -
>> > maybe it's different from the two "bad" idioms I showed above.
>> >
>> > I hope my explanation above helps. If not, ask, and I'll try to explain
>> > more clearly.
>> >
>> > I think I understand it. I think that the only issue is that:
>> >
>> > - Patchpoints currently don't do what you want.
>> >
>> > - If you made patchpoints do what you want then you'd break WebKit 
>> - not to
>> > mention anyone who wants to use them for inline caches.
>> >
>> > So it seems like you want a new intrinsic. You should officially 
>> propose
>> > this new intrinsic, particularly since the core semantic 
>> differences are not
>> > so great from what we have now. OTOH, if you truly believe that 
>> patchpoints
>> > should just be changed to your semantics in a way that does break 
>> WebKit,
>> > then that's probably also something you should get off your chest. ;-)
>> >
>> >
>> >
>> > One point just for clarity; I don't believe this affects the 
>> conclusions of
>> > our discussion so far. I'm also fairly sure that you (Filip) are 
>> aware of
>> > this, but want to spell it out for other readers.
>> >
>> > You seem to be assuming that compiled code needs to explicitly 
>> branch to a
>> > point where deopt state is known to exit a compiled frame.
>> >
>> > This is a slightly unclear characterization of my assumptions. Our 
>> JIT does
>> > deoptimization without explicit branches for many, many things. You 
>> should
>> > look at it some time, it's pretty fancy. ;-)
>> >
>> > Worth noting is that you can also exit a compiled frame on a trap 
>> (without
>> > an explicitly condition/branch!) if the deopt state is known at the 
>> point
>> > you take the trap. This "exit frame on trap" behavior shows up with 
>> null
>> > pointer exceptions as well. I'll note that both compilers in OpenJDK
>> > support some combination of "exit-on-trap" conditions for division 
>> and null
>> > dereferences. The two differ on exactly what strategies they use in 
>> each
>> > case though. :)
>> >
>> > Yeah, and I've also implemented VMs that do this - and I endorse 
>> the basic
>> > idea. I know what you want, and my only point is that the existing
>> > patchpoints only give you this if you're willing to make a huge 
>> compromise:
>> > namely, that you're willing to make the division (or heap load for 
>> the null
>> > case) completely opaque to the compiler to the point that GVN, 
>> LICM, SCCP,
>> > and all algebraic reasoning have to give up on optimizing it. The 
>> point of
>> > using LLVM is that it can optimize code. It can optimize branches and
>> > divisions pretty well. So, eliminating an explicit branch by 
>> replacing it
>> > with a construct that appears opaque to the optimizer is not a smart
>> > trade-off.
>> >
>> > You could add a new intrinsic that, like patchpoints, records the 
>> layout of
>> > state in a side-table, but that is used as a kind of wrapper for 
>> operations
>> > that LLVM understands. This may or may not be hairy - you seem to 
>> have sort
>> > of acknowledged that it's got some complexity and I've also pointed 
>> out some
>> > possible issues. If this is something that you want, you should 
>> propose it
>> > so that others know what you're talking about. One danger of how we're
>> > discussing this right now is that you're overloading patchpoints to 
>> mean the
>> > thing you want them to mean rather than what they actually mean, 
>> which makes
>> > it seem like we don't need Michael's intrinsics on the grounds that
>> > patchpoints already offer a solution. They don't already offer a 
>> solution
>> > precisely because patchpoints don't do what your intrinsics would do.
>> >
>> >
>> >
>> > I'm not really arguing that either scheme is "better" in all cases. I'm
>> > simply arguing that we should support both and allow optimization 
>> and tuning
>> > between them. As far as I can tell, you seem to be assuming that an
>> > explicit branch to known exit point is always superior.
>> >
>> >
>> > Ok, back to the topic at hand...
>> >
>> > With regards to the current proposal, I'm going to take a step 
>> back. You
>> > guys seem to have already looked in this in a fair amount of depth. 
>> I'm not
>> > necessarily convinced you've come to the best solution, but at some 
>> point,
>> > we need to make forward progress. What you have is clearly better than
>> > nothing.
>> >
>> > Please go ahead and submit your current approach. We can come back and
>> > revise later if we really need to.
>> >
>> > I do request the following changes:
>> > - Mark it clearly as experimental.
>> >
>> >
>> > - Either don't specify the value computed in the edge cases, or 
>> allow those
>> > values to be specified as constant arguments to the call. This allows
>> > efficient lowering to x86's div instruction if you want to make use 
>> of the
>> > trapping semantics.
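>> >
>> > A sketch of the constant-argument variant (the signature is
>> > hypothetical):
>> >
>> >   ;; yields %a / %b, or 0 if %b == 0, or INT_MIN on INT_MIN / -1;
>> >   ;; fixing the edge-case results up front is what would let the
>> >   ;; backend emit a bare idiv and have the trap handler produce them
>> >   %res = call i32 @llvm.safe.sdiv.i32(i32 %a, i32 %b,
>> >                                       i32 0, i32 -2147483648)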
>> >
>> > Once again: how would you use this to get trapping semantics without
>> > throwing all of LLVM's optimizations out the window, in the absence 
>> of the
>> > kind of patchpoint-like intrinsic that you want? I ask just to make 
>> sure
>> > that we're on the same page.
>> >
>> >
>> >
>> > Finally, as for performance data, which part of this do you want 
>> performance
>> > data for? I concede that I don't have performance data for using 
>> Michael's
>> > new intrinsic. Part of what the intrinsic accomplishes is it gives 
>> a less
>> > ugly way of doing something that is already possible with target 
>> intrinsics
>> > on ARM. I think it would be great if you could get those semantics 
>> - along
>> > with a known-good implementation - on other architectures as well.
>> >
>> > I would be very interested in seeing data comparing two schemes:
>> > - Explicit control flow emitted by the frontend
>> > - The safe.div intrinsic emitted by the frontend, desugared in 
>> CodeGenPrep (both sketched below)
>> >
>> > My strong suspicion is that each would perform well in some cases 
>> and not in
>> > others. At least on x86. Since the edge-checks are essentially free on
>> > ARM, the second scheme would probably be strictly superior there.
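>> >
>> > I.e., comparing something like (a sketch):
>> >
>> >   ;; Scheme 1: the frontend emits the edge checks itself
>> >   %iszero = icmp eq i32 %b, 0
>> >   %ismin  = icmp eq i32 %a, -2147483648
>> >   %isneg1 = icmp eq i32 %b, -1
>> >   %ovf    = and i1 %ismin, %isneg1
>> >   %edge   = or i1 %iszero, %ovf
>> >   br i1 %edge, label %slow, label %div
>> >
>> >   ;; Scheme 2: a single intrinsic the mid-level optimizer understands,
>> >   ;; desugared into the same control flow in CodeGenPrep
>> >   %pair = call { i32, i1 } @llvm.safe.sdiv.i32(i32 %a, i32 %b)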
>> >
>> > I am NOT asking that we block submission on this data however.
>> >
>> > But this discussion has also involved suggestions that we should use
>> > trapping to implement deoptimization, and the main point of my 
>> message is to
>> > strongly argue against anything like this given the current state of
>> > trapping semantics and how patchpoints work. My point is that using 
>> traps
>> > for division corner cases would either be unsound (see the stackmap 
>> after
>> > the trap, above), or would require you to do things that are obviously
>> > inefficient. If you truly believe that the branch to detect 
>> division slow
>> > paths is more expensive than spilling all bytecode state to the 
>> stack or
>> > using a patchpoint for the division, then I could probably hack 
>> something up
>> > in WebKit to show you the performance implications. (Or you could do it
>> > yourself, the code is open source...)
>> >
>> > In a couple of months, I'll probably have the performance data to 
>> discuss
>> > this for real. When that happens, let's pick this up and continue the
>> > debate. Alternatively, if you want to chat this over more with a 
>> beer in
>> > hand at the social next week, let me know. In the meantime, let's 
>> not stall
>> > the current proposal any more.
>> >
>> > Philip
>> >
>> >
