[llvm-dev] RFC: Add guard intrinsics to LLVM

Eric Christopher via llvm-dev llvm-dev at lists.llvm.org
Thu Feb 18 11:43:10 PST 2016


Hi Sanjoy,

A quick question here. The bail-to-the-interpreter support you're
envisioning (the "deopt" operand bundle) appears to overlap quite a
bit with the existing stack maps. What's the story/interaction/etc.
here? I agree that simpler control flow is great when bailing to the
interpreter - doing it with phi nodes is a recipe for pain and long
compile times.

Thanks!

-eric

On Tue, Feb 16, 2016 at 6:06 PM Sanjoy Das via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> This is a proposal to add guard intrinsics to LLVM.
>
> Couple of glossary items: when I say "interpreter" I mean "the most
> conservative tier in the compilation stack" which can be an actual
> interpreter, a "splat compiler" or even a regular JIT that doesn't
> make optimistic assumptions.  By "bailing out to the interpreter" I
> mean "side exit" as defined by Philip in
> http://www.philipreames.com/Blog/2015/05/20/deoptimization-terminology/
>
>
> # high level summary
>
> Guard intrinsics are intended to represent "if (!cond) leave();"-like
> control flow in a structured manner.  This kind of control flow is
> abundant in IR coming from safe languages due to things like range
> checks and null checks (and overflow checks, in some cases).  From a
> high level, there are two distinct motivations for introducing guard
> intrinsics:
>
>  - To keep control flow as simple and "straight line"-like as is
>    reasonable
>
>  - To allow certain kinds of "widening" transforms that cannot be
>    soundly done in an explicit "check-and-branch" control flow
>    representation
>
> ## straw man specification
>
> Signature:
>
> ```
> declare void @llvm.guard_on(i1 %predicate) ;; requires [ "deopt"(...) ]
> ```
>
> Semantics:
>
> `@llvm.guard_on` bails out to the interpreter (i.e. "deoptimizes" the
> currently executing function) if `%predicate` is `false`, meaning that
> after `@llvm.guard_on(i1 %t)` has executed `%t` can be assumed to be
> true.  In this way, it is close to `@llvm.assume` or an `assert`, with
> one very important difference -- `@llvm.guard_on(i1 <false>)` is well
> defined (and not UB).  `@llvm.guard_on` on a false predicate bails to
> the interpreter and that is always safe (but slow), and so
> `@llvm.guard_on(i1 false)` is basically a `noreturn` call that
> unconditionally transitions the current compilation to the
> interpreter.
>
> Bailing out to the interpreter involves re-creating the state of the
> interpreter frames as if the compilee had been executing in the
> interpreter all along.  This state is represented and maintained using
> a `"deopt"` operand bundle attached to the call to `@llvm.guard_on`.
> The verifier will reject calls to `@llvm.guard_on` without a `"deopt"`
> operand bundle.  `@llvm.guard_on` cannot be `invoke`ed (that would be
> meaningless anyway, since the method it would've "thrown" into is
> about to go away).
>
>
> Example:
>
> ```
>   ...
>   %rangeCheck0 = icmp ult i32 %arg0, 10
>   call void @llvm.guard_on(i1 %rangeCheck0) [ "deopt"(/* deopt state 0 */) ]
>   %rangeCheck1 = icmp ult i32 %arg0, 12
>   call void @llvm.guard_on(i1 %rangeCheck1) [ "deopt"(/* deopt state 1 */) ]
>   ...
> ```
>
>
> # details: motivation & alternatives
>
> As specified in the high level summary, there are two key motivations.
>
> The first, more obvious one is that we want the CFG to be less
> complex, even if that involves "hiding" control flow in guard
> intrinsics.  I expect this to benefit both compile time and
> within-a-block optimization.
>
> The second, somewhat less obvious motivation to use guard intrinsics
> instead of explicit control flow is to allow check widening.
>
> ## check widening
>
> Consider:
>
> ```
>   ...
>   %rangeCheck0 = icmp ult i32 6, %len   ;; for a[6]
>   call void @llvm.guard_on(i1 %rangeCheck0) [ "deopt"(/* deopt state 0 */) ]
>   call void @printf("hello world")
>   %rangeCheck1 = icmp ult i32 7, %len   ;; for a[7]
>   call void @llvm.guard_on(i1 %rangeCheck1) [ "deopt"(/* deopt state 1 */) ]
>   access a[6] and a[7]
>   ...
> ```
>
> we'd like to optimize it to
>
> ```
>   ...
>   %rangeCheckWide = icmp ult i32 7, %len
>   call void @llvm.guard_on(i1 %rangeCheckWide) [ "deopt"(/* deopt state 0 */) ]
>   call void @printf("hello world")
>   ;; %rangeCheck1 = icmp ult i32 7, %len  ;; not needed anymore
>   ;; call void @llvm.guard_on(i1 %rangeCheck1) [ "deopt"(/* deopt state 1 */) ]
>   access a[6] and a[7]
>   ...
> ```
>
> This way we do a range check only on `7` -- if `7` is within bounds,
> then we know `6` is too.  This transform is sound only because we know
> that the guard on `7 ult %len` will not simply throw an exception if
> said predicate fails, but will bail out to the interpreter with
> the abstract state `/* deopt state 0 */`.  In fact, if `%len` is `7`,
> the pre-transform program is supposed to print `"hello world"` and
> *then* throw an exception, and bailing out to the interpreter with `/*
> deopt state 0 */` will do exactly that.
>
> In other words, we're allowed to do speculative and aggressive
> transforms that make a guard fail where it would not have failed in
> the original program.  This is okay because a failed guard only bails to the
> interpreter, and the interpreter always Does The Right Thing(TM).  In
> fact, it is correct (though unwise) to replace every guard's predicate
> with `false`.
>
> ## the problem with check widening and explicit control flow
>
> Widening is difficult to get right in an explicit "check-and-branch"
> representation.  For instance, the above example in a
> "check-and-branch" representation would be (in pseudo C, and excluding
> the printf):
>
> ```
>   ...
>   if (!(6 < %len)) { call @side_exit() [ "deopt"(P) ]; unreachable; }
>   if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ]; unreachable; }
>   ...
> ```
>
> The following transform is invalid:
>
> ```
>   ...
>   if (!(7 < %len)) { call @side_exit() [ "deopt"(P) ]; unreachable; }
>   if (!(true)) { call @side_exit() [ "deopt"(Q) ]; unreachable; }
>   ...
> ```
>
> since we do not know if the first side exit had been optimized under
> the assumption `!(6 < %len)` (via JumpThreading etc.).  E.g. the
> "original" IR could have been
>
> ```
>   ...
>   if (!(6 < %len)) { call @side_exit() [ "deopt"(!(6 < %len)) ]; unreachable; }
>   if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ]; unreachable; }
>   ...
> ```
>
> which got optimized to
>
> ```
>   ...
>   if (!(6 < %len)) { call @side_exit() [ "deopt"(true) ]; unreachable; }
>   if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ]; unreachable; }
>   ...
> ```
>
> before the widening transform. The widening transform will now
> effectively pass in an incorrect value for `!(6 < %len)`.
>
> This isn't to say it is impossible to do check widening in an
> explicit control flow representation, just that it is more natural to
> do it with guards.
>
>
> # details: semantics
>
> ## as-if control flow
>
> The observable behavior of `@llvm.guard_on` is specified as:
>
> ```
>   void @llvm.guard_on(i1 %pred) {
>   entry:
>     %unknown_cond = < unknown source >
>     %cond = and i1 %unknown_cond, %pred
>     br i1 %cond, label %left, label %right
>
>   left:
>     call void @bail_to_interpreter() [ "deopt"() ] noreturn
>     unreachable
>
>   right:
>     ret void
>   }
> ```
>
> So, precisely speaking, `@llvm.guard_on` is guaranteed to bail to the
> interpreter if `%pred` is false, but it **may** bail to the
> interpreter if `%pred` is true.  It is this bit that lets us soundly
> widen `%pred`, since all we're doing is "refining" `< unknown source >`.
>
> `@bail_to_interpreter` does not return to the current compilation, but
> it returns to the `"deopt"` continuation that it has been given (once
> inlined, the empty `"deopt"()` continuation will be fixed up to have
> the right continuation).
>
>
> ## applicable optimizations
>
> Apart from widening, most of the optimizations we're interested in are
> what's allowed by an equivalent `@llvm.assume`.  Any conditional
> branches dominated by a guard on the same condition can be folded,
> multiple guards on the same condition can be CSE'ed, loop invariant
> guards can be hoisted out of loops etc.
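>
> For illustration, a hypothetical branch-folding case (my sketch, in
> the same pseudo IR as the examples above) would be:
>
> ```
>   call void @llvm.guard_on(i1 %cond) [ "deopt"(/* deopt state 0 */) ]
>   ;; %cond is known true after the guard, so this branch can be
>   ;; folded to an unconditional `br label %taken`:
>   br i1 %cond, label %taken, label %not_taken
> ```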
>
> Ultimately, we'd like to recover the same quality of optimization as
> we currently get from the "check-and-branch" representation.  With the
> "check-and-branch" representation, the optimizer is able to sink
> stores and computation into the slow path.  This is something it
> cannot do in the `@llvm.guard_on` representation, and we'll have to
> lower guards to the "check-and-branch" representation at a suitable
> point in the pipeline to get this kind of optimization.
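>
> To illustrate the kind of sinking meant here (my sketch, not part of
> the proposal): in the "check-and-branch" form
>
> ```
>   store i32 %v, i32* %p   ;; overwritten on the fast path below
>   br i1 %cond, label %fast, label %slow
>
> slow:
>   call void @side_exit() [ "deopt"(/* deopt state 0 */) ]
>   unreachable
>
> fast:
>   store i32 %v2, i32* %p
>   ...
> ```
>
> the first store is observable only on the slow path, so it can be
> sunk into `%slow`; with `@llvm.guard_on` the store has to stay where
> it is, since the call may not return.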
>
> ## lowering
>
> At some point in the compilation pipeline, we will have to lower
> `@llvm.guard_on` into explicit control flow, by "inlining" "an
> implementation" of `@llvm.guard_on` (or by some other means).  I don't
> have a good sense on when in the pipeline this should be done -- the
> answer will depend on what we find as we make LLVM more aggressive
> around optimizing guards.
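>
> As a strawman, lowering could expand each guard into the explicit
> form from the "as-if control flow" section, with the unknown
> condition refined to true:
>
> ```
>   ;; call void @llvm.guard_on(i1 %cond) [ "deopt"(/* deopt state 0 */) ]
>   ;; lowers to:
>   br i1 %cond, label %guarded, label %deopt
>
> deopt:
>   call void @bail_to_interpreter() [ "deopt"(/* deopt state 0 */) ]
>   unreachable
>
> guarded:
>   ...
> ```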
>
> ## environments without support for bailing to the interpreter
>
> Our VM has deeply integrated support for deoptimization, but not all
> language runtimes do.  If there is a need for non-deoptimizing guards,
> we can consider introducing a variant of `@llvm.guard_on`:
>
> ```
> declare void @llvm.exception_on(i1 %predicate, i32 %exceptionKind)
> ```
>
> with this one having the semantics that it always throws an exception
> if `%predicate` fails.  Only the non-widening optimizations for
> `@llvm.guard_on` will apply to `@llvm.exception_on`.
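>
> A hypothetical use, with `1` standing in for some runtime-defined
> exception kind:
>
> ```
>   %inBounds = icmp ult i32 %idx, %len
>   ;; always throws the "kind 1" exception (say, out-of-bounds) if
>   ;; %inBounds is false; never bails to the interpreter
>   call void @llvm.exception_on(i1 %inBounds, i32 1)
> ```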
>
> ## memory effects (unresolved)
>
> [I haven't come up with a good model for the memory effects of
>  `@llvm.guard_on`, suggestions are very welcome.]
>
> I'd really like to model `@llvm.guard_on` as a readonly function,
> since it does not write to memory if it returns; and e.g. forwarding
> loads across a call to `@llvm.guard_on` should be legal.
>
> However, I'm in a quandary around representing the "may never return"
> aspect of `@llvm.guard_on`: I have to make it illegal to, say, hoist a
> load from `%ptr` across a guard on `%ptr != null` (the forbidden
> transform is sketched after the list below).  There are a couple of
> ways I can think of dealing with this, none of them both easy and
> neat:
>
>  - State that since `@llvm.guard_on` could have had an infinite loop
>    in it, it may never return.  Unfortunately, LLVM IR has some rough
>    edges around readonly infinite loops (since C++ disallows such
>    loops), so LLVM will require some preparatory work before we can
>    do this soundly.
>
>  - State that `@llvm.guard_on` can unwind, and thus has non-local
>    control flow.  This can actually work (and is pretty close to
>    the actual semantics), but is somewhat of a hack since
>    `@llvm.guard_on` doesn't _really_ throw an exception.
>
>  - Special case `@llvm.guard_on` (yuck!).
>
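> Concretely, the transform that must remain illegal is (my sketch):
>
> ```
>   %notNull = icmp ne i32* %ptr, null
>   call void @llvm.guard_on(i1 %notNull) [ "deopt"(/* deopt state 0 */) ]
>   %v = load i32, i32* %ptr   ;; cannot be hoisted above the guard: if
>                              ;; %ptr is null the guard never returns,
>                              ;; so the load must not execute
> ```
>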
> What do you think?
>
> -- Sanjoy