<div dir="ltr">Heh. I like his more, but I see where you're coming from as well - the easy way (for me at least) to look at the guard is as an implicit branch to a side exit that restores (or communicates) state back to the interpreter. The lowering will likely need to surface some amount of that control flow. Though I'm not quite sure how you want the intrinsic to perform in the face of optimization. What's legal to move across etc. There were some comments earlier in the thread, but getting it explicitly laid out in the next writeup would be good.<div><br></div><div>More thoughts:</div><div><br></div><div>It might make sense for the stackmap intrinsic to go away after this - I don't think it's clear though. I'm open to arguments either way.<div><br></div><div>That said, I didn't see a representation for the deopt bundles? I'm assuming they're normal operands to the IR instruction yes?<div><br></div><div>-eric</div><div><br></div></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Thu, Feb 18, 2016 at 12:30 PM Philip Reames <<a href="mailto:listmail@philipreames.com">listmail@philipreames.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
Sanjoy gave the long answer, let me give the short one. :)<br>
<br>
"deopt" argument bundles are used in the middle end; they are
lowered into a statepoint, and generate the existing stackmap
format. i.e. one builds on the other.</div><div text="#000000" bgcolor="#FFFFFF"><br>
<br>
<div>On 02/18/2016 11:43 AM, Eric
Christopher wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi Sanjoy,
<div><br>
</div>
<div>A quick question here. With the bailing to the interpreter
support that you're envisioning ("deopt operand bundle"), it
appears to overlap quite a bit with the existing stack maps.
What's the story/interaction/etc here? I agree that a simpler
control flow is great when bailing to the interpreter - doing
it with phi nodes is a recipe for pain and long compile times.</div>
<div><br>
</div>
<div>Thanks!</div>
<div><br>
</div>
<div>-eric<br>
<br>
<div class="gmail_quote">
<div dir="ltr">On Tue, Feb 16, 2016 at 6:06 PM Sanjoy Das
via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">This is
a proposal to add guard intrinsics to LLVM.<br>
<br>
Couple of glossary items: when I say "interpreter" I mean
"the most<br>
conservative tier in the compilation stack" which can be
an actual<br>
interpreter, a "splat compiler" or even a regular JIT that
doesn't<br>
make optimistic assumptions. By "bailing out to the
interpreter" I<br>
mean "side exit" as defined by Philip in<br>
<a href="http://www.philipreames.com/Blog/2015/05/20/deoptimization-terminology/" rel="noreferrer" target="_blank">http://www.philipreames.com/Blog/2015/05/20/deoptimization-terminology/</a><br>
<br>
<br>
# high level summary<br>
<br>
Guard intrinsics are intended to represent "if (!cond)
leave();"-like<br>
control flow in a structured manner. This kind of control
flow is<br>
abundant in IR coming from safe languages due to things
like range<br>
checks and null checks (and overflow checks, in some
cases). From a<br>
high level, there are two distinct motivations for
introducing guard<br>
intrinsics:<br>
<br>
- To keep control flow as simple and "straight line"-like
as is<br>
reasonable<br>
<br>
- To allow certain kinds of "widening" transforms that
cannot be<br>
soundly done in an explicit "check-and-branch" control
flow<br>
representation<br>
<br>
## straw man specification<br>
<br>
Signature:<br>
<br>
```<br>
declare void @llvm.guard_on(i1 %predicate) ;; requires [
"deopt"(...) ]<br>
```<br>
<br>
Semantics:<br>
<br>
`@llvm.guard_on` bails out to the interpreter (i.e.<br>
"deoptimizes" the<br>
currently executing function) if `%predicate` is `false`,<br>
meaning that<br>
after `@llvm.guard_on(i1 %t)` has executed `%t` can be
assumed to be<br>
true. In this way, it is close to `@llvm.assume` or an
`assert`, with<br>
one very important difference -- `@llvm.guard_on(i1
<false>)` is well<br>
defined (and not UB). `@llvm.guard_on` on a false
predicate bails to<br>
the interpreter and that is always safe (but slow), and so<br>
`@llvm.guard_on(i1 false)` is basically a `noreturn` call
that<br>
unconditionally transitions the current compilation to the<br>
interpreter.<br>
<br>
Bailing out to the interpreter involves re-creating the
state of the<br>
interpreter frames as-if the compilee had been executing
in the<br>
interpreter all along. This state is represented and
maintained using<br>
a `"deopt"` operand bundle attached to the call to
`@llvm.guard_on`.<br>
The verifier will reject calls to `@llvm.guard_on` without
a `"deopt"`<br>
operand bundle. `@llvm.guard_on` cannot be `invoke`ed
(that would be<br>
meaningless anyway, since the method it would've "thrown"
into is<br>
about to go away).<br>
<br>
<br>
Example:<br>
<br>
```<br>
...<br>
%rangeCheck0 = icmp ult i32 %arg0, 10<br>
call void @llvm.guard_on(i1 %rangeCheck0) [ "deopt"(/*
deopt state 0 */) ]<br>
%rangeCheck1 = icmp ult i32 %arg0, 12<br>
call void @llvm.guard_on(i1 %rangeCheck1) [ "deopt"(/*
deopt state 1 */) ]<br>
...<br>
```<br>
<br>
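For concreteness, the contract above can be sketched as a small executable
model (plain Python, not an LLVM API; `Deoptimize`, `guard_on`, and
`compiled_body` are all illustrative names):

```python
# Sketch only: models @llvm.guard_on's contract in plain Python.
# All names here are illustrative, not part of any real API.

class Deoptimize(Exception):
    """Models the transfer of control back to the interpreter."""
    def __init__(self, deopt_state):
        super().__init__("deoptimizing")
        self.deopt_state = deopt_state

def guard_on(predicate, deopt_state):
    # guard_on(false) is well defined: it bails to the interpreter, not UB.
    if not predicate:
        raise Deoptimize(deopt_state)
    # On return, the predicate may be assumed true by later code.

def compiled_body(arg0):
    guard_on(arg0 < 10, {"arg0": arg0, "state": 0})
    guard_on(arg0 < 12, {"arg0": arg0, "state": 1})
    return arg0 * 2  # fast-path code guarded by both range checks

result = compiled_body(4)  # both guards pass; no deopt
```

The key point the model captures is that a failing guard is an ordinary,
well-defined control transfer carrying its `"deopt"` state, not UB.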
<br>
# details: motivation & alternatives<br>
<br>
As specified in the high level summary, there are two key
motivations.<br>
<br>
The first, more obvious one is that we want the CFG to be
less<br>
complex, even if that involves "hiding" control flow in
guard<br>
intrinsics. I expect this to benefit both compile time
and<br>
within-a-block optimization.<br>
<br>
The second, somewhat less obvious motivation to use guard
intrinsics<br>
instead of explicit control flow is to allow check
widening.<br>
<br>
## check widening<br>
<br>
Consider:<br>
<br>
```<br>
...<br>
%rangeCheck0 = icmp ult i32 6, %len ;; for a[6]<br>
call void @llvm.guard_on(i1 %rangeCheck0) [ "deopt"(/*
deopt state 0 */) ]<br>
call void @printf("hello world")<br>
%rangeCheck1 = icmp ult i32 7, %len ;; for a[7]<br>
call void @llvm.guard_on(i1 %rangeCheck1) [ "deopt"(/*
deopt state 1 */) ]<br>
access a[6] and a[7]<br>
...<br>
```<br>
<br>
we'd like to optimize it to<br>
<br>
```<br>
...<br>
%rangeCheckWide = icmp ult i32 7, %len<br>
call void @llvm.guard_on(i1 %rangeCheckWide) [
"deopt"(/* deopt state 0 */) ]<br>
call void @printf("hello world")<br>
;; %rangeCheck1 = icmp ult i32 7, %len ;; not needed
anymore<br>
;; call void @llvm.guard_on(i1 %rangeCheck1) [
"deopt"(/* deopt state 1 */) ]<br>
access a[6] and a[7]<br>
...<br>
```<br>
<br>
This way we do a range check only on `7` -- if `7` is
within bounds,<br>
then we know `6` is too. This transform is sound only
because we know<br>
that the guard on `7 ult %len` will not simply throw an
exception if<br>
the said predicate fails, but will bail out to the
interpreter with<br>
the abstract state `/* deopt state 0 */`. In fact, if
`%len` is `7`,<br>
the pre-transform program is supposed to print `"hello
world"` and<br>
*then* throw an exception, and bailing out to the
interpreter with `/*<br>
deopt state 0 */` will do exactly that.<br>
<br>
In other words, we're allowed to do speculative and
aggressive<br>
transforms that make a guard fail that wouldn't have failed in<br>
the initial<br>
program. This is okay because a failed guard only bails
to the<br>
interpreter, and the interpreter always Does The Right
Thing(TM). In<br>
fact, it is correct (though unwise) to replace every
guard's predicate<br>
with `false`.<br>
<br>
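The argument above can be made runnable with a hedged sketch
(`interpret_from` stands in for resuming the interpreter at the captured
abstract state; all names are illustrative):

```python
# Sketch only: why a failed (possibly widened) guard is always safe.

class Deoptimize(Exception):
    def __init__(self, state):
        self.state = state

def guard_on(predicate, state):
    if not predicate:
        raise Deoptimize(state)

def interpret_from(state, length, out):
    # The interpreter replays from "deopt state 0" and Does The Right
    # Thing(TM): it re-checks each bound in order, printing before
    # raising, exactly as the original program would.
    assert state == "deopt-state-0"
    if not (6 < length):
        raise IndexError(6)
    out.append("hello world")
    if not (7 < length):
        raise IndexError(7)

def widened(length, out):
    # Post-transform code: a single guard on the stronger condition,
    # carrying the *earlier* deopt state.
    try:
        guard_on(7 < length, "deopt-state-0")
    except Deoptimize as d:
        interpret_from(d.state, length, out)
        return
    out.append("hello world")
    # a[6] and a[7] are both known in bounds here

out = []
try:
    widened(7, out)    # widened guard fails before the print...
except IndexError:
    pass               # ...yet deopt replays the print, then raises for a[7]
```

With `%len` equal to 7 the widened guard fires before the print, but the
interpreter replay still prints `"hello world"` and then throws, matching
the pre-transform program.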
## the problem with check widening and explicit control
flow<br>
<br>
Widening is difficult to get right in an explicit
"check-and-branch"<br>
representation. For instance, the above example in a<br>
"check-and-branch" representation would be (in pseudo C,
and excluding<br>
the printf):<br>
<br>
```<br>
...<br>
if (!(6 < %len)) { call @side_exit() [ "deopt"(P) ];
unreachable; }<br>
if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ];
unreachable; }<br>
...<br>
```<br>
<br>
The following transform is invalid:<br>
<br>
```<br>
...<br>
if (!(7 < %len)) { call @side_exit() [ "deopt"(P) ];
unreachable; }<br>
if (!(true)) { call @side_exit() [ "deopt"(Q) ];
unreachable; }<br>
...<br>
```<br>
<br>
since we do not know if the first side exit had been
optimized under<br>
the assumption `!(6 < %len)` (via JumpThreading etc.).
E.g. the<br>
"original" IR could have been<br>
<br>
```<br>
...<br>
if (!(6 < %len)) { call @side_exit() [ "deopt"(!(6
< %len)) ]; unreachable; }<br>
if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ];
unreachable; }<br>
...<br>
```<br>
<br>
which got optimized to<br>
<br>
```<br>
...<br>
if (!(6 < %len)) { call @side_exit() [ "deopt"(true)
]; unreachable; }<br>
if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ];
unreachable; }<br>
...<br>
```<br>
<br>
before the widening transform. The widening transform will
now<br>
effectively pass in an incorrect value for `!(6 <
%len)`.<br>
<br>
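A small sketch makes the stale-deopt-value hazard concrete (illustrative
names and encoding only; the tuples stand in for side-exit calls):

```python
# Sketch only: widening explicit check-and-branch control flow is unsound
# because a prior pass may have folded the recorded deopt value under the
# very condition being widened.

def side_exit(deopt_values):
    return ("deopt", deopt_values)

def after_jump_threading(length):
    # A prior pass folded the recorded value of `!(6 < len)` to True,
    # which is correct *on this branch* of the original program.
    if not (6 < length):
        return side_exit({"!(6<len)": True})
    if not (7 < length):
        return side_exit({"Q": None})
    return "fast path"

def after_unsound_widening(length):
    # Widening the first check to `7 < len` keeps the folded constant,
    # which is now wrong whenever 6 < len <= 7.
    if not (7 < length):
        return side_exit({"!(6<len)": True})
    return "fast path"

# For len == 7, the unwidened code takes the *second* exit; the widened
# code takes the first exit and reports !(6<len) as True, even though
# 6 < 7 actually holds.
stale = after_unsound_widening(7)
correct_value = not (6 < 7)
```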
This isn't to say it is impossible to do check widening in<br>
an explicit<br>
control flow representation, just that it is more natural to<br>
do it with<br>
guards.<br>
<br>
<br>
# details: semantics<br>
<br>
## as-if control flow<br>
<br>
The observable behavior of `@llvm.guard_on` is specified
as:<br>
<br>
```<br>
void @llvm.guard_on(i1 %pred) {<br>
entry:<br>
%unknown_cond = < unknown source ><br>
%cond = and i1 %unknown_cond, %pred<br>
br i1 %cond, label %right, label %left<br>
<br>
left:<br>
call void @bail_to_interpreter() [ "deopt"() ]
noreturn<br>
unreachable<br>
<br>
right:<br>
ret void<br>
}<br>
```<br>
<br>
So, precisely speaking, `@llvm.guard_on` is guaranteed to
bail to the<br>
interpreter if `%pred` is false, but it **may** bail to
the<br>
interpreter if `%pred` is true. It is this bit that lets
us soundly<br>
widen `%pred`, since all we're doing is "refining" `<
unknown source >`.<br>
<br>
`@bail_to_interpreter` does not return to the current
compilation, but<br>
it returns to the `"deopt"` continuation that it has been<br>
given (once<br>
inlined, the empty "deopt"() continuation will be fixed up
to have the right<br>
continuation).<br>
<br>
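An executable rendering of the as-if specification (illustrative names; the
`unknown_cond` parameter plays the role of `< unknown source >`):

```python
# Sketch only: a guard *must* deopt when the predicate is false, and *may*
# deopt when it is true, depending on the unknown source.

class Deoptimize(Exception):
    pass

def guard_on_model(pred, unknown_cond):
    cond = unknown_cond and pred  # %cond = and i1 %unknown_cond, %pred
    if not cond:
        raise Deoptimize()        # the @bail_to_interpreter path
    # fall through: returns, and pred may be assumed true afterwards

def deopts(pred, unknown_cond):
    try:
        guard_on_model(pred, unknown_cond)
        return False
    except Deoptimize:
        return True

# A false predicate deopts for every choice of the unknown source...
guaranteed = all(deopts(False, u) for u in (False, True))
# ...while a true predicate may or may not deopt, depending on it.
may = [deopts(True, u) for u in (False, True)]
```

Widening only "refines" `unknown_cond`: every behavior of a widened guard
is already a permitted behavior of this model.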
<br>
## applicable optimizations<br>
<br>
Apart from widening, most of the optimizations we're
interested in are<br>
what's allowed by an equivalent `@llvm.assume`. Any
conditional<br>
branches dominated by a guard on the same condition can be
folded,<br>
multiple guards on the same condition can be CSE'ed, loop
invariant<br>
guards can be hoisted out of loops etc.<br>
<br>
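As a rough illustration of these `@llvm.assume`-style simplifications, here
is a toy pass over straight-line pseudo-instructions (illustrative encoding
as `(opcode, condition)` pairs, not real LLVM pass code):

```python
# Sketch only: CSE of repeated guards and folding of branches dominated
# by a guard on the same condition.

def simplify_guards(block):
    known_true = set()  # conditions established by a dominating guard
    out = []
    for op, cond in block:
        if op == "guard" and cond in known_true:
            continue  # CSE: a prior guard already established this condition
        if op == "guard":
            known_true.add(cond)
        elif op == "br" and cond in known_true:
            op = "br.true"  # fold: branch is dominated by a guard on cond
        out.append((op, cond))
    return out

block = [
    ("guard", "x < 10"),
    ("guard", "x < 10"),  # redundant guard, CSE'd away
    ("br",    "x < 10"),  # dominated by the guard above, folds to taken
]
simplified = simplify_guards(block)
```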
Ultimately, we'd like to recover the same quality of
optimization as<br>
we currently get from the "check-and-branch"
representation. With the<br>
"check-and-branch" representation, the optimizer is able
to sink<br>
stores and computation into the slow path. This is
something it cannot<br>
do in the guard_on representation, and we'll have to lower
the<br>
guard_on representation to the "check-and-branch"
representation at a<br>
suitable point in the pipeline to get this kind of
optimization.<br>
<br>
## lowering<br>
<br>
At some point in the compilation pipeline, we will have to
lower<br>
`@llvm.guard_on` into explicit control flow, by "inlining"
"an<br>
implementation" of `@llvm.guard_on` (or by some other
means). I don't<br>
have a good sense on when in the pipeline this should be
done -- the<br>
answer will depend on what we find as we make LLVM more
aggressive<br>
around optimizing guards.<br>
<br>
## environments without support for bailing to the
interpreter<br>
<br>
Our VM has deeply integrated support for deoptimizations,
but not all<br>
language runtimes do. If there is a need for<br>
non-deoptimizing guards, we<br>
can consider introducing a variant of `@llvm.guard_on`:<br>
<br>
```<br>
declare void @llvm.exception_on(i1 %predicate, i32
%exceptionKind)<br>
```<br>
<br>
with this one having the semantics that it always throws
an exception<br>
if `%predicate` fails. Only the non-widening
optimizations for<br>
`@llvm.guard_on` will apply to `@llvm.exception_on`.<br>
<br>
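A minimal sketch of that contract (illustrative Python, with `RuntimeError`
standing in for the runtime's `%exceptionKind` dispatch):

```python
# Sketch only: a non-deoptimizing guard variant. It always throws when the
# predicate fails; there is no "may fail" latitude, so widening is not legal.
def exception_on(predicate, exception_kind):
    if not predicate:
        raise RuntimeError("exception kind %d" % exception_kind)

exception_on(True, 1)  # predicate holds: no effect
```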
## memory effects (unresolved)<br>
<br>
[I haven't come up with a good model for the memory
effects of<br>
`@llvm.guard_on`, suggestions are very welcome.]<br>
<br>
I'd really like to model `@llvm.guard_on` as a readonly
function,<br>
since it does not write to memory if it returns; and e.g.
forwarding<br>
loads across a call to `@llvm.guard_on` should be legal.<br>
<br>
However, I'm in a quandary around representing the "may
never return"<br>
aspect of `@llvm.guard_on`: I have to make it illegal to,
say, hoist a<br>
load from `%ptr` across a guard on `%ptr != null`. There<br>
are a couple<br>
of ways I can think of dealing with this, none of them are
both easy<br>
and neat:<br>
<br>
- State that since `@llvm.guard_on` could have had an
infinite loop<br>
in it, it may never return. Unfortunately, the LLVM IR
has some<br>
rough edges on readonly infinite loops (since C++
disallows that),<br>
so LLVM will require some preparatory work before we
can do this<br>
soundly.<br>
<br>
- State that `@llvm.guard_on` can unwind, and thus has
non-local<br>
control flow. This can actually work (and is pretty
close to<br>
the actual semantics), but is somewhat of a hack since<br>
`@llvm.guard_on` doesn't _really_ throw an exception.<br>
<br>
- Special case `@llvm.guard_on` (yuck!).<br>
<br>
What do you think?<br>
<br>
-- Sanjoy<br>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
</blockquote>
</div>
</div>
</div>
</blockquote>
<br>
</div></blockquote></div>