<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<div class="moz-cite-prefix">On 05/02/2014 11:57 AM, Filip Pizlo
wrote:<br>
</div>
<blockquote cite="mid:etPan.5363ea85.57fc4fbb.172db@dethklok.local"
type="cite">
<style>body{font-family:Helvetica,Arial;font-size:13px}</style>
<div id="bloop_customfont"
style="font-family:Helvetica,Arial;font-size:13px; color:
rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><br>
</div>
<p style="color:#000;">On May 2, 2014 at 11:53:25 AM, Eric
Christopher (<a moz-do-not-send="true"
href="mailto:echristo@gmail.com">echristo@gmail.com</a>)
wrote:</p>
<div>
<blockquote type="cite" class="clean_bq" style="color: rgb(0, 0,
0); font-family: Helvetica, Arial; font-size: 13px;
font-style: normal; font-variant: normal; font-weight: normal;
letter-spacing: normal; line-height: normal; orphans: auto;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: auto; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255,
255, 255);"><span>
<div>
<div>On Wed, Apr 30, 2014 at 10:34 PM, Philip Reames<span
class="Apple-converted-space"> </span><br>
<a class="moz-txt-link-rfc2396E" href="mailto:listmail@philipreames.com"><listmail@philipreames.com></a> wrote:<span
class="Apple-converted-space"> </span><br>
> Andy - If you're not already following this
closely, please start. We've<span
class="Apple-converted-space"> </span><br>
> gotten into fairly fundamental questions of what a
patchpoint does.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Filip,<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> I think you've hit the nail on the head. What I'm
thinking of as being<span class="Apple-converted-space"> </span><br>
> patchpoints are not what you think they are. Part
of that is that I've got<span
class="Apple-converted-space"> </span><br>
> a local change which adds a very similar
construction (called "statepoints"<span
class="Apple-converted-space"> </span><br>
> for the moment), but I was trying to keep that
separate. That also includes<span
class="Apple-converted-space"> </span><br>
> a lot of GC semantics which are not under
discussion currently. My<span
class="Apple-converted-space"> </span><br>
> apologies if that experience bled over into this
conversation and made<span class="Apple-converted-space"> </span><br>
> things more confusing.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> I will note that the documentation for patchpoint
say explicitly the<span class="Apple-converted-space"> </span><br>
> following:<span class="Apple-converted-space"> </span><br>
> "The ‘llvm.experimental.patchpoint.*‘ intrinsics
creates a function call to<span
class="Apple-converted-space"> </span><br>
> the specified <target> and records the
location of specified values in the<span
class="Apple-converted-space"> </span><br>
> stack map."<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> My reading has always been that a patchpoint *that
isn't patched* is simply<span
class="Apple-converted-space"> </span><br>
> a call with a stackmap associated with it. To my
reading, this can (and<span
class="Apple-converted-space"> </span><br>
> did, and does) indicate my proposed usage would be
legal.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
<br>
I like the idea that the target can be assumed to be
called. It makes<span class="Apple-converted-space"> </span><br>
optimization of the call possible, etc. I think it's
definitely worth<span class="Apple-converted-space"> </span><br>
exploring before we lock down the patchpoint intrinsic.<span
class="Apple-converted-space"> </span></div>
</div>
</span></blockquote>
</div>
<p>I will actively oppose this.</p>
</blockquote>
I think it sounds like we need to split patchpoint into two cases.
I'm going to send a rough proposal for this later today.<br>
<blockquote cite="mid:etPan.5363ea85.57fc4fbb.172db@dethklok.local"
type="cite">
<p>-Filip</p>
<p><br>
</p>
<div>
<blockquote type="cite" class="clean_bq" style="color: rgb(0, 0,
0); font-family: Helvetica, Arial; font-size: 13px;
font-style: normal; font-variant: normal; font-weight: normal;
letter-spacing: normal; line-height: normal; orphans: auto;
text-align: start; text-indent: 0px; text-transform: none;
white-space: normal; widows: auto; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color: rgb(255,
255, 255);"><span>
<div>
<div><br>
<br>
-eric<span class="Apple-converted-space"> </span><br>
<br>
> I will agree that I've confused the topic badly on
the optimization front.<span
class="Apple-converted-space"> </span><br>
> My "statepoint" isn't patchable, so a lot more
optimizations are legal.<span
class="Apple-converted-space"> </span><br>
> Sorry about that. To restate what I think you've
been saying all along, the<span
class="Apple-converted-space"> </span><br>
> optimizer can't make assumptions about what
function is called by a<span
class="Apple-converted-space"> </span><br>
> patchpoint because that might change based on later
patching. Is this the<span class="Apple-converted-space"> </span><br>
> key point you've been trying to make?<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> I'm not objecting to separating "my patchpoint"
from "your patchpoint".<span
class="Apple-converted-space"> </span><br>
> Let's just hammer out the semantics of each first.
:)<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Again, longer response to follow in a day or so. :)<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Philip<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> On 04/30/2014 10:09 PM, Filip Pizlo wrote:<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> On April 30, 2014 at 9:06:20 PM, Philip Reames
(<a class="moz-txt-link-abbreviated" href="mailto:listmail@philipreames.com">listmail@philipreames.com</a>)<span
class="Apple-converted-space"> </span><br>
> wrote:<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> On 04/29/2014 12:39 PM, Filip Pizlo wrote:<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> On April 29, 2014 at 11:27:06 AM, Philip Reames
(<a class="moz-txt-link-abbreviated" href="mailto:listmail@philipreames.com">listmail@philipreames.com</a>)<span
class="Apple-converted-space"> </span><br>
> wrote:<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> On 04/29/2014 10:44 AM, Filip Pizlo wrote:<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> LD;DR: Your desire to use trapping on x86 only
further convinces me that<span
class="Apple-converted-space"> </span><br>
> Michael's proposed intrinsics are the best way to
go.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> I'm still not convinced, but am not going to
actively oppose it either. I'm<span
class="Apple-converted-space"> </span><br>
> leery of designing a solution with major
assumptions we don't have data to<span
class="Apple-converted-space"> </span><br>
> backup.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> I worry your assumptions about deoptimization are
potentially unsound. But<span
class="Apple-converted-space"> </span><br>
> I don't have data to actually show this (yet).<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> I *think* I may have been unclear about my
assumptions; in particular, my<span
class="Apple-converted-space"> </span><br>
> claims with respect to deoptimization are probably
more subtle than they<span class="Apple-converted-space"> </span><br>
> appeared. WebKit can use LLVM and it has divisions
and we do all possible<span
class="Apple-converted-space"> </span><br>
> deoptimization/profiling/etc tricks for it, so this
is grounded in<span class="Apple-converted-space"> </span><br>
> experience. Forgive me if the rest of this e-mail
contains a lecture on<span class="Apple-converted-space"> </span><br>
> things that are obvious - I'll try to err on the
side of clarity and<span class="Apple-converted-space"> </span><br>
> completeness since this discussion is sufficiently
dense that we run the<span class="Apple-converted-space"> </span><br>
> risk of talking cross-purposes unless some baseline
assumptions are<span class="Apple-converted-space"> </span><br>
> established.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> I think we're using the same terminology, but with
slightly different sets<span
class="Apple-converted-space"> </span><br>
> of assumptions. I'll point this out below where
relevant.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Also, thanks for taking the time to expand. It help
clarify the discussion<span
class="Apple-converted-space"> </span><br>
> quite a bit.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> I think we may be converging to an understanding of
what you want versus<span class="Apple-converted-space"> </span><br>
> what I want, and I think that there are some points
- possibly unrelated to<span
class="Apple-converted-space"> </span><br>
> division - that are worth clarifying. I think that
the main difference is<span
class="Apple-converted-space"> </span><br>
> that when I say "patchpoint", I am referring to a
concrete intrinsic with<span
class="Apple-converted-space"> </span><br>
> specific semantics that cannot change without
breaking WebKit, while you are<span
class="Apple-converted-space"> </span><br>
> using the term to refer to a broad concept, or
rather, a class of<span class="Apple-converted-space"> </span><br>
> as-yet-unimplemented intrinsics that share some of
the same features with<span
class="Apple-converted-space"> </span><br>
> patchpoints but otherwise have incompatible
semantics.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Also, when I say that you wouldn't want to use the
existing patchpoint to do<span
class="Apple-converted-space"> </span><br>
> your trapping deopt, what I mean is that the
performance of doing this would<span
class="Apple-converted-space"> </span><br>
> suck for reasons that are not related to
deoptimization or trapping. I'm<span
class="Apple-converted-space"> </span><br>
> not claiming that deoptimization performs poorly
(trust me, I know better)<span
class="Apple-converted-space"> </span><br>
> or that trapping to deoptimize is bad (I've done
this many, many times and I<span
class="Apple-converted-space"> </span><br>
> know better). I'm saying that with the current
patchpoint intrinsics in<span
class="Apple-converted-space"> </span><br>
> LLVM, as they are currently specified and
implemented, doing it would be a<span
class="Apple-converted-space"> </span><br>
> bad idea because you'd have to compromise a bunch
of other optimizations to<span
class="Apple-converted-space"> </span><br>
> achieve it.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> You have essentially described new intrinsics that
would make this less of a<span
class="Apple-converted-space"> </span><br>
> bad idea and I am interested in your intrinsics, so
I'll try to both respond<span
class="Apple-converted-space"> </span><br>
> with why patchpoints don't currently give you what
you want (and why simply<span
class="Apple-converted-space"> </span><br>
> changing patchpoint semantics would be evil) and
I'll also try to comment on<span
class="Apple-converted-space"> </span><br>
> what I think of the intrinsic that you're
effectively proposing. Long story<span
class="Apple-converted-space"> </span><br>
> short, I think you should formally propose your
intrinsic even if it's not<span
class="Apple-converted-space"> </span><br>
> completely fleshed out. I think that it's an
interesting capability and in<span
class="Apple-converted-space"> </span><br>
> its most basic form, it is a simple variation of
the current<span class="Apple-converted-space"> </span><br>
> patchpoint/stackmap intrinsics.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> On April 29, 2014 at 10:09:49 AM, Philip Reames
(<a class="moz-txt-link-abbreviated" href="mailto:listmail@philipreames.com">listmail@philipreames.com</a>)<span
class="Apple-converted-space"> </span><br>
> wrote:<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> As the discussion has progressed and I've spent
more time thinking about the<span
class="Apple-converted-space"> </span><br>
> topic, I find myself less and less enthused about
the current proposal. I'm<span
class="Apple-converted-space"> </span><br>
> in full support of having idiomatic ways to express
safe division, but I'm<span
class="Apple-converted-space"> </span><br>
> starting to doubt that using an intrinsic is the
right way at the moment.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> One case I find myself thinking about is how one
would combine profiling<span
class="Apple-converted-space"> </span><br>
> information and implicit div-by-zero/overflow
checks with this proposal. I<span
class="Apple-converted-space"> </span><br>
> don't really see a clean way. Ideally, for a "safe
div" which never has the<span
class="Apple-converted-space"> </span><br>
> exceptional paths taken, you'd like to completely
do away with the control<span
class="Apple-converted-space"> </span><br>
> flow entirely. (And rely on hardware traps
w/exceptions instead.) I don't<span
class="Apple-converted-space"> </span><br>
> really see a way to represent that type of
construct given the current<span
class="Apple-converted-space"> </span><br>
> proposal.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> This is a deeper problem and to solve it you'd need
a solution to trapping<span
class="Apple-converted-space"> </span><br>
> in general. Let's consider the case of Java. A Java
program may want to<span class="Apple-converted-space"> </span><br>
> catch the arithmetic exception due to divide by
zero. How would you do this<span
class="Apple-converted-space"> </span><br>
> with a trap in LLVM IR? Spill all state that is
live at the catch? Use a<span
class="Apple-converted-space"> </span><br>
> patchpoint for the entire division instruction?<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> We'd likely use something similar to a patchpoint.
You'd need the "abstract<span
class="Apple-converted-space"> </span><br>
> vm state" (which is not the compiled frame
necessarily) available at the div<span
class="Apple-converted-space"> </span><br>
> instruction. You could then re-enter the
interpreter at the specified index<span
class="Apple-converted-space"> </span><br>
> (part of the vm state). We have all most of these
mechanisms in place.<span class="Apple-converted-space"> </span><br>
> Ideally, you'd trigger a recompile and otherwise
ensure re-entry into<span class="Apple-converted-space"> </span><br>
> compiled code at the soonest possible moment.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> This requires a lot of runtime support, but we
already have most of it<span
class="Apple-converted-space"> </span><br>
> implemented for another compiler. From our
perspective, the runtime<span
class="Apple-converted-space"> </span><br>
> requirements are not a major blocker.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Right, you'll use a patchpoint. That's way more
expensive than using a safe<span
class="Apple-converted-space"> </span><br>
> division intrinsic with branches, because it's
opaque to the optimizer.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> This statement is true at the moment, but it
shouldn't be. I think this is<span
class="Apple-converted-space"> </span><br>
> our fundamental difference in approach.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> You should be able to write something like:<span
class="Apple-converted-space"> </span><br>
> i32 %res = invoke patchpoint (...
x86_trapping_divide, a, b) normal_dest<span
class="Apple-converted-space"> </span><br>
> invoke_dest<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> normal_dest:<span class="Apple-converted-space"> </span><br>
> ;; use %res<span class="Apple-converted-space"> </span><br>
> invoke_dest:<span class="Apple-converted-space"> </span><br>
> landingpad<span class="Apple-converted-space"> </span><br>
> ;; dispatch edge cases<span
class="Apple-converted-space"> </span><br>
> ;; this could be unreachable code if you deopt this
frame in the trap<span class="Apple-converted-space"> </span><br>
> handler and jump directly to an interpreter or
other bit of code<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> I see. It sounds like you want a generalization of
the "div.with.stackmap"<span
class="Apple-converted-space"> </span><br>
> that I thought you wanted - you want to be able to
wrap anything in a<span class="Apple-converted-space"> </span><br>
> stackmap.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> The current patchpoint intrinsic does not do this,
and you run the risk of<span
class="Apple-converted-space"> </span><br>
> breaking existing semantics if you changed this.
You'd probably break<span class="Apple-converted-space"> </span><br>
> WebKit, which treats the call target of the
patchpoint as nothing more than<span
class="Apple-converted-space"> </span><br>
> a quirk - we always pass null. Also, the current
patchpoint treats the<span class="Apple-converted-space"> </span><br>
> callee as an i8* if I remember right and it would
be super weird if all LLVM<span
class="Apple-converted-space"> </span><br>
> phases had to interpret this i8* by unwrapping a
possible bitcast to get to<span
class="Apple-converted-space"> </span><br>
> a declared function that may be an intrinsic. Yuck!
Basically, the call<span class="Apple-converted-space"> </span><br>
> target of existing patchpoints is meant to be a
kind of convenience feature<span
class="Apple-converted-space"> </span><br>
> rather than the core of the mechanism.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> I agree in principle that the intrinsic that you
want would be a useful<span
class="Apple-converted-space"> </span><br>
> intrinsic. But let's not call it a patchpoint for
the purposes of this<span class="Apple-converted-space"> </span><br>
> discussion, and let's not confuse the discussion by
claiming (incorrectly)<span
class="Apple-converted-space"> </span><br>
> that the existing patchpoint facility gives you
what you want. It doesn't:<span
class="Apple-converted-space"> </span><br>
> patchpoints are designed to make the call target
opaque (hence the use of<span
class="Apple-converted-space"> </span><br>
> i8*) and there shouldn't be a correlation between
what the patchpoint does<span
class="Apple-converted-space"> </span><br>
> at run-time and what the called function would have
done. You could make<span class="Apple-converted-space"> </span><br>
> the call target be null (like WebKit does) and the
patchpoint should still<span
class="Apple-converted-space"> </span><br>
> mean "this code can do anything" because the
expectation is that the client<span
class="Apple-converted-space"> </span><br>
> JIT will patch over it anyway.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Also, "patchpoint" would probably not be the right
term for the intrinsic<span
class="Apple-converted-space"> </span><br>
> that you want. I think that what you want is
"call.with.stackmap". Or<span
class="Apple-converted-space"> </span><br>
> maybe "stackmap.wrapper". Or just "stackmap" - I'd
be OK, in principle,<span class="Apple-converted-space"> </span><br>
> with changing the name of the current "stackmap"
intrinsic to something that<span
class="Apple-converted-space"> </span><br>
> reflects the fact that it's less of a stackmap than
what you want.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> To summarize. A patchpoint's main purpose is that
you can patch it with<span class="Apple-converted-space"> </span><br>
> arbitrary code. The current "stackmap" means that
you can patch it with<span class="Apple-converted-space"> </span><br>
> arbitrary code and that patching may be destructive
to a shadow of machine<span
class="Apple-converted-space"> </span><br>
> code bytes, so it's really just like patchpoints -
we could change its name<span
class="Apple-converted-space"> </span><br>
> to "patchpoint.shadow" for example.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> If you were to propose such a stackmap intrinsic,
then I think there could<span
class="Apple-converted-space"> </span><br>
> be some ways of doing it that wouldn't be too
terrible. Basically you want<span
class="Apple-converted-space"> </span><br>
> something that is like a patchpoint in that it
reports a stackmap via a side<span
class="Apple-converted-space"> </span><br>
> channel, but unlike patchpoints, it doesn't allow
arbitrary patching -<span class="Apple-converted-space"> </span><br>
> instead the optimizer should be allowed to assume
that the code within the<span
class="Apple-converted-space"> </span><br>
> patchpoint will always do the same thing that the
call target would have<span
class="Apple-converted-space"> </span><br>
> done. There are downsides to truly doing this. For
example, to make<span class="Apple-converted-space"> </span><br>
> division efficient with such an intrinsic, you'd
have to do something that<span
class="Apple-converted-space"> </span><br>
> is somewhat worse than just recognizing intrinsics
in the optimizer - you'd<span
class="Apple-converted-space"> </span><br>
> have to first recognize a call to your "stackmap
wrapper" intrinsic and then<span
class="Apple-converted-space"> </span><br>
> observe that its call target argument is in turn
another intrinsic. To me<span
class="Apple-converted-space"> </span><br>
> personally, that's kind of yucky, but I won't deny
that it could be useful.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> As to the use of invoke: I don't believe that the
use of invoke versus my<span
class="Apple-converted-space"> </span><br>
> suggested "branch on a trap predicate" idea are
different in any truly<span
class="Apple-converted-space"> </span><br>
> meaningful way. I buy that either would work.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> A patchpoint should not require any excess
spilling. If values are live in<span
class="Apple-converted-space"> </span><br>
> registers, that should be reflected in the stack
map. (I do not know if<span
class="Apple-converted-space"> </span><br>
> this is the case for patchpoint at the moment or
not.)<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Patchpoints do not require spilling.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> My point was that with existing patchpoints, you
*either* use a patchpoint<span
class="Apple-converted-space"> </span><br>
> for the entire division which makes the division
opaque to the optimizer -<span
class="Apple-converted-space"> </span><br>
> because a patchpoint means "this code can do
anything" - *or* you could<span
class="Apple-converted-space"> </span><br>
> spill stuff to the stack prior to your trapping
division intrinsic, since<span
class="Apple-converted-space"> </span><br>
> spilling is something that you could do as a
workaround if you didn't have a<span
class="Apple-converted-space"> </span><br>
> patchpoint.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> The reason why I brought up spilling at all is that
I suspect that spilling<span
class="Apple-converted-space"> </span><br>
> all state to the stack might be cheaper - for some
systems - than turning<span
class="Apple-converted-space"> </span><br>
> the division into a patchpoint. Turning the
division into a patchpoint is<span
class="Apple-converted-space"> </span><br>
> horrendously brutal - the patchpoint looks like it
clobbers the heap (which<span
class="Apple-converted-space"> </span><br>
> a division doesn't do), has to execute (a division
is an obvious DCE<span class="Apple-converted-space"> </span><br>
> candidate), cannot be hoisted (hoisting divisions
is awesome), etc. Perhaps<span
class="Apple-converted-space"> </span><br>
> most importantly, though, a patchpoint doesn't tell
LLVM that you're *doing<span
class="Apple-converted-space"> </span><br>
> a division* - so all constant folding, strenght
reduction, and algebraic<span
class="Apple-converted-space"> </span><br>
> reasoning flies out the window. On the other hand,
spilling all state to<span class="Apple-converted-space"> </span><br>
> the stack is an arguably sound and performant
solution to a lot of VM<span
class="Apple-converted-space"> </span><br>
> problems. I've seen JVM implementations that ensure
that there is always a<span
class="Apple-converted-space"> </span><br>
> copy of state on the stack at some critical points,
just because it makes<span class="Apple-converted-space"> </span><br>
> loads of stuff simpler (debugging, profiling, GC,
and of course deopt). I<span
class="Apple-converted-space"> </span><br>
> personally prefer to stay away from such a strategy
because it's not free.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> On the other hand, branches can be cheap. A branch
on a divide is cheaper<span
class="Apple-converted-space"> </span><br>
> than not being able to optimize the divide.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> The Value called by a patchpoint should participate
in optimization<span class="Apple-converted-space"> </span><br>
> normally.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> I agree that you could have a different intrinsic
that behaves like this.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> We really want the patchpoint part of the call to
be supplemental. It<span class="Apple-converted-space"> </span><br>
> should still be a call. It should be constant
propagated, transformed,<span
class="Apple-converted-space"> </span><br>
> etc.. This is not the case currently. I've got a
couple of off the wall<span
class="Apple-converted-space"> </span><br>
> ideas for improving the current status, but I'll
agree this is a hardish<span
class="Apple-converted-space"> </span><br>
> problem.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> It should be legal to use a patchpoint in an
invoke. It's an ABI issue of<span
class="Apple-converted-space"> </span><br>
> how the invoke path gets invoked. (i.e. side tables
for the runtime to<span class="Apple-converted-space"> </span><br>
> lookup, etc..) This is not possible today, and
probably requires a fair<span
class="Apple-converted-space"> </span><br>
> amount of work. Some of it, I've already done and
will be sharing shortly.<span
class="Apple-converted-space"> </span><br>
> Other parts, I haven't even thought about.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Right, it's significantly more complex than either
the existing patchpoints<span
class="Apple-converted-space"> </span><br>
> or Michael's proposed safe.div.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> If you didn't want to use the trapping semantics,
you'd insert dedicated<span
class="Apple-converted-space"> </span><br>
> control flow _before_ the divide. This would allow
normal optimization of<span
class="Apple-converted-space"> </span><br>
> the control flow.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Notes:<span class="Apple-converted-space"> </span><br>
> 1) This might require a new PATCHPOINT pseudo op in
the backend. Haven't<span class="Apple-converted-space"> </span><br>
> thought much about that yet.<span
class="Apple-converted-space"> </span><br>
> 2) I *think* your current intrinsic could be
translated into something like<span
class="Apple-converted-space"> </span><br>
> this. (Leaving aside the question of where the
deopt state comes from.) In<span
class="Apple-converted-space"> </span><br>
> fact, the more I look at this, the less difference
I actually see between<span
class="Apple-converted-space"> </span><br>
> the approaches.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> In a lot of languages, a divide produces some
result even in the exceptional<span
class="Apple-converted-space"> </span><br>
> case and this result requires effectively
deoptimizing since the resut won't<span
class="Apple-converted-space"> </span><br>
> be the one you would have predicted (double instead
of int, or BigInt<span class="Apple-converted-space"> </span><br>
> instead of small int), which sort of means that if
the CPU exception occurs<span
class="Apple-converted-space"> </span><br>
> you have to be able to reconstruct all state. A
patchpoint could do this,<span
class="Apple-converted-space"> </span><br>
> and so could spilling all state to the stack before
the divide - but both<span class="Apple-converted-space"> </span><br>
> are very heavy hammers that are sure to be more
expensive than just doing a<span
class="Apple-converted-space"> </span><br>
> branch.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> This isn't necessarily as expensive as you might
believe. I'd recommend<span
class="Apple-converted-space"> </span><br>
> reading the Graal project papers on this topic.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Whether deopt or branching is more profitable *in
this case*, I can't easily<span
class="Apple-converted-space"> </span><br>
> say. I'm not yet to the point of being able to run
that experiment. We can<span
class="Apple-converted-space"> </span><br>
> argue about what "should" be better all we want,
but real performance data<span
class="Apple-converted-space"> </span><br>
> is the only way to truly know.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> My point may have been confusing. I know that
deoptimization is cheap and<span
class="Apple-converted-space"> </span><br>
> WebKit uses it everywhere, including division
corner cases, if profiling<span
class="Apple-converted-space"> </span><br>
> tells us that it's profitable to do so (which it
does, in the common case).<span
class="Apple-converted-space"> </span><br>
> WebKit is a heavy user of deoptimization in
general, so you don't need to<span
class="Apple-converted-space"> </span><br>
> convince me that it's worth it.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Acknowledged.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Note that I want *both* deopt *and* branching,
because in this case, a<span
class="Apple-converted-space"> </span><br>
> branch is the fastest overall way of detecting when
to deopt. In the<span class="Apple-converted-space"> </span><br>
> future, I will want to implement the deopt in terms
of branching, and when<span
class="Apple-converted-space"> </span><br>
> we do this, I believe that the most sound and
performat approach would be<span
class="Apple-converted-space"> </span><br>
> using Michael's intrinsics. This is subtle and I'll
try to explain why it's<span
class="Apple-converted-space"> </span><br>
> the case.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> The point is that you wouldn't want to do
deoptimization by spilling state<span
class="Apple-converted-space"> </span><br>
> on the main path or by using a patchpoint for the
main path of the division.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> This is the main point I disagree with. I don't
believe that having a<span class="Apple-converted-space"> </span><br>
> patchpoint on the main path should be any more
expensive then the original<span
class="Apple-converted-space"> </span><br>
> call. (see above)<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> The reason why the patchpoint is expensive is that
if you use a patchpoint<span
class="Apple-converted-space"> </span><br>
> to implement a division then the optimizer won't be
allowed to assume that<span
class="Apple-converted-space"> </span><br>
> it's a division, because the whole point of
"patchpoint" is to tell the<span
class="Apple-converted-space"> </span><br>
> optimizer to piss off.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Worth noting explicitly: I'm assuming that all of
your deopt state would<span
class="Apple-converted-space"> </span><br>
> already be available for other purposes in nearby
code. It's on the stack<span
class="Apple-converted-space"> </span><br>
> or in registers. I'm assuming that by adding the
deopt point, you are not<span
class="Apple-converted-space"> </span><br>
> radically changing the set of computations which
need to be done. If that's<span
class="Apple-converted-space"> </span><br>
> not the case, you should avoid deopt and instead
just inline the slow paths<span
class="Apple-converted-space"> </span><br>
> with explicit checks.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Yes, of course it is. That's not the issue.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> I'll note that given your assumptions about the
cost of a patchpoint, the<span
class="Apple-converted-space"> </span><br>
> rest of your position makes a lot more sense. :) As
I spelled out above, I<span
class="Apple-converted-space"> </span><br>
> believe this cost is not fundamental.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> You don't want the common path of executing the
division to involve a<span class="Apple-converted-space"> </span><br>
> patchpoint instruction, although using a patchpoint
or stackmap to implement<span
class="Apple-converted-space"> </span><br>
> deoptimization on the failing path is great:<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Good: if (division would fail) { call
@patchpoint(all of my state) } else {<span
class="Apple-converted-space"> </span><br>
> result = a / b }<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Given your cost assumptions, I'd agree.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Not my cost assumptions. The reason why this is
better is that the division<span
class="Apple-converted-space"> </span><br>
> is expressed in LLVM IR so that LLVM can do useful
things to it - like<span class="Apple-converted-space"> </span><br>
> eliminate it, for example.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Bad: call @patchpoint(all of my state) // patch
with a divide instruction -<span
class="Apple-converted-space"> </span><br>
> bad because the optimizer has no clue what you're
doing and assumes the very<span
class="Apple-converted-space"> </span><br>
> worst<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Yuck. Agreed.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> To be clear, this is what you're proposing - except
that you're assuming<span class="Apple-converted-space"> </span><br>
> that LLVM will know that you've patched a division
because you're expecting<span
class="Apple-converted-space"> </span><br>
> the call target to have semantic meaning. Or,
rather, you're expecting that<span
class="Apple-converted-space"> </span><br>
> you can make the contents of the patchpoint be a
division by having the call<span
class="Apple-converted-space"> </span><br>
> target be a division intrinsic. In the current
implementation and as it is<span
class="Apple-converted-space"> </span><br>
> currently specified, the call target has no meaning
and so you get the yuck<span
class="Apple-converted-space"> </span><br>
> that I'm showing.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Worse: spill all state to the stack; call
@trapping.div(a, b) // the spills<span
class="Apple-converted-space"> </span><br>
> will hurt you far more than a branch, so this
should be avoided<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> I'm assuming this is an explicit spill rather than
simply recording a stack<span
class="Apple-converted-space"> </span><br>
> map *at the div*. If so, agreed.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> I suppose we could imagine a fourth option that
involves a patchpoint to<span
class="Apple-converted-space"> </span><br>
> pick up the state and a trapping divide instrinsic.
But a trapping divide<span class="Apple-converted-space"> </span><br>
> intrinsic alone is not enough. Consider this:<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> result = call @trapping.div(a, b); call
@stackmap(all of my state)<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> As soon as these are separate instructions, you
have no guarantees that the<span
class="Apple-converted-space"> </span><br>
> state that the stackmap reports is sound for the
point at which the div<span
class="Apple-converted-space"> </span><br>
> would trap.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> This is the closest to what I'd propose, except
that the two calls would be<span
class="Apple-converted-space"> </span><br>
> merged into a single patchpoint. Isn't the entire
point of a patchpoint to<span
class="Apple-converted-space"> </span><br>
> record the stack map for a call?<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> No. It would be bad if that's what the
documentation says. That's not at<span
class="Apple-converted-space"> </span><br>
> all how WebKit uses it or probably any IC client
would use it.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Patchpoints are designed to be inline assembly on
steroids. They're there<span
class="Apple-converted-space"> </span><br>
> to allow the client JIT to tell LLVM to piss off.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> (Well, ignoring the actual patching part..) Why not
write this as:<span class="Apple-converted-space"> </span><br>
> patchpoint(..., trapping.div, a, b);<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Is there something I'm missing here?<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Just to note: I fully agree that the two call
proposal is unsound and should<span
class="Apple-converted-space"> </span><br>
> be avoided.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> So, the division itself shouldn't be a trapping
instruction and instead you<span
class="Apple-converted-space"> </span><br>
> want to detect the bad case with a branch.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> To be clear:<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> - Whether you use deoptimization for division or
anything else - like WebKit<span
class="Apple-converted-space"> </span><br>
> has done since before any of the Graal papers were
written - is mostly<span class="Apple-converted-space"> </span><br>
> unrelated to how you represent the division, unless
you wanted to add a new<span
class="Apple-converted-space"> </span><br>
> intrinsic that is like a
trapping-division-with-stackmap:<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> result = call @trapping.div.with.stackmap(a, b, ...
all of my state ...)<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Now, maybe you do want such an intrinsic, in which
case you should propose<span
class="Apple-converted-space"> </span><br>
> it!<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Given what we already have with patchpoints, I
don't think a merged<span class="Apple-converted-space"> </span><br>
> intrinsic is necessary. (See above). I believe we
have all the parts to<span class="Apple-converted-space"> </span><br>
> build this solution, and that - in theory - they
should compose neatly.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> p.s. The bits I was referring to was not deopt per
se. It was particularly<span
class="Apple-converted-space"> </span><br>
> which set of deopt state you used for each deopt
point. That's a bit of<span
class="Apple-converted-space"> </span><br>
> tangent for the rest of the discussion now though.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> The reason why I haven't proposed it is that I
think that long-term, the<span
class="Apple-converted-space"> </span><br>
> currently proposed intrinsics are a better path to
getting the trapping<span class="Apple-converted-space"> </span><br>
> optimizations. See my previous mail, where I show
how we could tell LLVM<span
class="Apple-converted-space"> </span><br>
> what the failing path is (which may have
deoptimization code that uses a<span
class="Apple-converted-space"> </span><br>
> stackmap or whatever), what the trapping predicate
is (it comes from the<span class="Apple-converted-space"> </span><br>
> safe.div intrinsic), and the fact that trapping is
wise (branch weights).<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> - If you want to do the deoptimization with a trap,
then your only choice<span class="Apple-converted-space"> </span><br>
> currently is to use a patchpoint for the main path
of the division. This<span class="Apple-converted-space"> </span><br>
> will be slower than using a branch to an OSR exit
basic block, because<span class="Apple-converted-space"> </span><br>
> you're making the division itself opaque to the
optimizer (bad) just to get<span
class="Apple-converted-space"> </span><br>
> rid of a branch (which was probably cheap to begin
with).<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Hence, what you want to do - one way or another,
regardless of whether this<span
class="Apple-converted-space"> </span><br>
> proposed intrinsic is added - is to branch on the
corner case condition, and<span
class="Apple-converted-space"> </span><br>
> have the slow case of the branch go to a basic
block that deoptimizes.<span
class="Apple-converted-space"> </span><br>
> Unless of course you have profiling that says that
the case does happen<span class="Apple-converted-space"> </span><br>
> often, in which case you can have that basic block
handle the corner case<span
class="Apple-converted-space"> </span><br>
> inline without leaving optimized code (FWIW, we do
have such paths in WebKit<span
class="Apple-converted-space"> </span><br>
> and they are useful).<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> So the question for me is whether the branching
involves explicit control<span
class="Apple-converted-space"> </span><br>
> flow or is hidden inside an intrinsic. I prefer for
it to be within an<span class="Apple-converted-space"> </span><br>
> intrinsic because it:<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> - allows the optimizer to do more interesting
things in the common cases,<span
class="Apple-converted-space"> </span><br>
> like hoisting the entire division.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> - will give us a clearer path for implementing
trapping optimizations in the<span
class="Apple-converted-space"> </span><br>
> future.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> - is an immediate win on ARM.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> I'd be curious to hear what specific idea you have
about how to implement<span
class="Apple-converted-space"> </span><br>
> trap-based deoptimization with your trapping
division intrinsic for x86 -<span
class="Apple-converted-space"> </span><br>
> maybe it's different from the two "bad" idioms I
showed above.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> I hope my explanation above helps. If not, ask, and
I'll try to explain<span class="Apple-converted-space"> </span><br>
> more clearly.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> I think I understand it. I think that the only
issue is that:<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> - Patchpoints currently don't do what you want.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> - If you made patchpoints do what you want then
you'd break WebKit - not to<span
class="Apple-converted-space"> </span><br>
> mention anyone who wants to use them for inline
caches.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> So it seems like you want a new intrinsic. You
should officially propose<span
class="Apple-converted-space"> </span><br>
> this new intrinsic, particularly since the core
semantic differences are not<span
class="Apple-converted-space"> </span><br>
> so great from what we have now. OTOH, if you truly
believe that patchpoints<span
class="Apple-converted-space"> </span><br>
> should just be changed to your semantics in a way
that does break WebKit,<span
class="Apple-converted-space"> </span><br>
> then that's probably also something you should get
off your chest. ;-)<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> One point just for clarity; I don't believe this
effects the conclusions of<span
class="Apple-converted-space"> </span><br>
> our discussion so far. I'm also fairly sure that
you (Filip) are aware of<span
class="Apple-converted-space"> </span><br>
> this, but want to spell it out for other readers.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> You seem to be assuming that compiled code needs to
explicitly branch to a<span
class="Apple-converted-space"> </span><br>
> point where deopt state is known to exit a compiled
frame.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> This is a slightly unclear characterization of my
assumptions. Our JIT does<span
class="Apple-converted-space"> </span><br>
> deoptimization without explicit branches for many,
many things. You should<span
class="Apple-converted-space"> </span><br>
> look at it some time, it's pretty fancy. ;-)<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Worth noting is that you can also exit a compiled
frame on a trap (without<span
class="Apple-converted-space"> </span><br>
> an explicitly condition/branch!) if the deopt state
is known at the point<span class="Apple-converted-space"> </span><br>
> you take the trap. This "exit frame on trap"
behavior shows up with null<span
class="Apple-converted-space"> </span><br>
> pointer exceptions as well. I'll note that both
compilers in OpenJDK<span class="Apple-converted-space"> </span><br>
> support some combination of "exit-on-trap"
conditions for division and null<span
class="Apple-converted-space"> </span><br>
> dereferences. The two differ on exactly what
strategies they use in each<span
class="Apple-converted-space"> </span><br>
> case though. :)<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Yeah, and I've also implemented VMs that do this -
and I endorse the basic<span
class="Apple-converted-space"> </span><br>
> idea. I know what you want, and my only point is
that the existing<span class="Apple-converted-space"> </span><br>
> patchpoints only give you this if you're willing to
make a huge compromise:<span
class="Apple-converted-space"> </span><br>
> namely, that you're willing to make the division
(or heap load for the null<span
class="Apple-converted-space"> </span><br>
> case) completely opaque to the compiler to the
point that GVN, LICM, SCCP,<span
class="Apple-converted-space"> </span><br>
> and all algebraic reasoning have to give up on
optimizing it. The point of<span
class="Apple-converted-space"> </span><br>
> using LLVM is that it can optimize code. It can
optimize branches and<span class="Apple-converted-space"> </span><br>
> divisions pretty well. So, eliminating an explicit
branch by replacing it<span
class="Apple-converted-space"> </span><br>
> with a construct that appears opaque to the
optimizer is not a smart<span
class="Apple-converted-space"> </span><br>
> trade-off.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> You could add a new intrinsic that, like
patchpoints, records the layout of<span
class="Apple-converted-space"> </span><br>
> state in a side-table, but that is used as a kind
of wrapper for operations<span
class="Apple-converted-space"> </span><br>
> that LLVM understands. This may or may not be hairy
- you seem to have sort<span
class="Apple-converted-space"> </span><br>
> of acknowledged that it's got some complexity and
I've also pointed out some<span
class="Apple-converted-space"> </span><br>
> possible issues. If this is something that you
want, you should propose it<span
class="Apple-converted-space"> </span><br>
> so that others know what you're talking about. One
danger of how we're<span class="Apple-converted-space"> </span><br>
> discussing this right now is that you're
overloading patchpoints to mean the<span
class="Apple-converted-space"> </span><br>
> thing you want them to mean rather than what they
actually mean, which makes<span
class="Apple-converted-space"> </span><br>
> it seem like we don't need Michael's intrinsics on
the grounds that<span class="Apple-converted-space"> </span><br>
> patchpoints already offer a solution. They don't
already offer a solution<span
class="Apple-converted-space"> </span><br>
> precisely because patchpoints don't do what your
intrinsics would do.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> I'm not really arguing that either scheme is
"better" in all cases. I'm<span
class="Apple-converted-space"> </span><br>
> simply arguing that we should support both and
allow optimization and tuning<span
class="Apple-converted-space"> </span><br>
> between them. As far as I can tell, you seem to be
assuming that an<span class="Apple-converted-space"> </span><br>
> explicit branch to known exit point is always
superior.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Ok, back to the topic at hand...<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> With regards to the current proposal, I'm going to
take a step back. You<span class="Apple-converted-space"> </span><br>
> guys seem to have already looked in this in a fair
amount of depth. I'm not<span
class="Apple-converted-space"> </span><br>
> necessarily convinced you've come to the best
solution, but at some point,<span
class="Apple-converted-space"> </span><br>
> we need to make forward progress. What you have is
clearly better than<span class="Apple-converted-space"> </span><br>
> nothing.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Please go ahead and submit your current approach.
We can come back and<span class="Apple-converted-space"> </span><br>
> revise later if we really need to.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> I do request the following changes:<span
class="Apple-converted-space"> </span><br>
> - Mark it clearly as experimental.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> - Either don't specify the value computed in the
edge cases, or allow those<span
class="Apple-converted-space"> </span><br>
> values to be specified as constant arguments to the
call. This allows<span class="Apple-converted-space"> </span><br>
> efficient lowering to x86's div instruction if you
want to make use of the<span
class="Apple-converted-space"> </span><br>
> trapping semantics.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Once again: how would you use this to get trapping
semantics without<span class="Apple-converted-space"> </span><br>
> throwing all of LLVM's optimizations out the
window, in the absence of the<span
class="Apple-converted-space"> </span><br>
> kind of patchpoint-like intrinsic that you want? I
ask just to make sure<span class="Apple-converted-space"> </span><br>
> that we're on the same page.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Finally, as for performance data, which part of
this do you want performance<span
class="Apple-converted-space"> </span><br>
> data for? I concede that I don't have performance
data for using Michael's<span
class="Apple-converted-space"> </span><br>
> new intrinsic. Part of what the intrinsic
accomplishes is it gives a less<span
class="Apple-converted-space"> </span><br>
> ugly way of doing something that is already
possible with target intrinsics<span
class="Apple-converted-space"> </span><br>
> on ARM. I think it would be great if you could get
those semantics - along<span
class="Apple-converted-space"> </span><br>
> with a known-good implementation - on other
architectures as well.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> I would be very interested in seeing data comparing
two schemes:<span class="Apple-converted-space"> </span><br>
> - Explicit control flow emited by the frontend<span
class="Apple-converted-space"> </span><br>
> - The safe.div intrinsic emitted by the frontend,
desugared in CodeGenPrep<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> My strong suspicion is that each would preform well
in some cases and not in<span
class="Apple-converted-space"> </span><br>
> others. At least on x86. Since the edge-checks are
essentially free on<span class="Apple-converted-space"> </span><br>
> ARM, the second scheme would probably be strictly
superior there.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> I am NOT asking that we block submission on this
data however.<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> But this discussion has also involved suggestions
that we should use<span class="Apple-converted-space"> </span><br>
> trapping to implement deoptimization, and the main
point of my message is to<span
class="Apple-converted-space"> </span><br>
> strongly argue against anything like this given the
current state of<span class="Apple-converted-space"> </span><br>
> trapping semantics and how patchpoints work. My
point is that using traps<span
class="Apple-converted-space"> </span><br>
> for division corner cases would either be unsound
(see the stackmap after<span
class="Apple-converted-space"> </span><br>
> the trap, above), or would require you to do things
that are obviously<span class="Apple-converted-space"> </span><br>
> inefficient. If you truly believe that the branch
to detect division slow<span
class="Apple-converted-space"> </span><br>
> paths is more expensive than spilling all bytecode
state to the stack or<span class="Apple-converted-space"> </span><br>
> using a patchpoint for the division, then I could
probably hack something up<span
class="Apple-converted-space"> </span><br>
> in WebKit to show you the performance implications.
(Or you could do it<span class="Apple-converted-space"> </span><br>
> yourself, the code is open source...)<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> In a couple of months, I'll probably have the
performance data to discuss<span
class="Apple-converted-space"> </span><br>
> this for real. When that happens, let's pick this
up and continue the<span class="Apple-converted-space"> </span><br>
> debate. Alternatively, if you want to chat this
over more with a beer in<span
class="Apple-converted-space"> </span><br>
> hand at the social next week, let me know. In the
meantime, let's not stall<span
class="Apple-converted-space"> </span><br>
> the current proposal any more.<span
class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
> Philip<span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
><span class="Apple-converted-space"> </span><br>
</div>
</div>
</span></blockquote>
</div>
</blockquote>
<br>
</body>
</html>