[llvm-dev] RFC: Add guard intrinsics to LLVM

Philip Reames via llvm-dev llvm-dev at lists.llvm.org
Wed Feb 17 15:40:34 PST 2016



On 02/16/2016 06:06 PM, Sanjoy Das wrote:
> This is a proposal to add guard intrinsics to LLVM.
A couple of inline comments building on Sanjoy's points follow.
> Couple of glossary items: when I say "interpreter" I mean "the most
> conservative tier in the compilation stack" which can be an actual
> interpreter, a "splat compiler" or even a regular JIT that doesn't
> make optimistic assumptions.  By "bailing out to the interpreter" I
> mean "side exit" as defined by Philip in
> http://www.philipreames.com/Blog/2015/05/20/deoptimization-terminology/
>
>
> # high level summary
>
> Guard intrinsics are intended to represent "if (!cond) leave();" like
> control flow in a structured manner.  This kind of control flow is
> abundant in IR coming from safe languages due to things like range
> checks and null checks (and overflow checks, in some cases).  From a
> high level, there are two distinct motivations for introducing guard
> intrinsics:
>
>   - To keep control flow as simple and "straight line"-like as is
>     reasonable
>
>   - To allow certain kinds of "widening" transforms that cannot be
>     soundly done in an explicit "check-and-branch" control flow
>     representation
>
> ## straw man specification
>
> Signature:
>
> ```
> declare void @llvm.guard_on(i1 %predicate) ;; requires [ "deopt"(...) ]
> ```
>
> Semantics:
>
> `@llvm.guard_on` bails out to the interpreter (i.e. "deoptimizes" the
> currently executing function) if `%predicate` is `false`, meaning that
> after `@llvm.guard_on(i1 %t)` has executed `%t` can be assumed to be
> true.  In this way, it is close to `@llvm.assume` or an `assert`, with
> one very important difference -- `@llvm.guard_on(i1 <false>)` is well
> defined (and not UB).  `@llvm.guard_on` on a false predicate bails to
> the interpreter and that is always safe (but slow), and so
> `@llvm.guard_on(i1 false)` is basically a `noreturn` call that
> unconditionally transitions the current compilation to the
> interpreter.
It's worth noting that @llvm.guard_on(i1 true) is also useful and well 
defined.  When lowering, such a guard simply disappears, but it can be 
useful to keep around in the IR during optimization.  It gives widening 
transforms a well defined point to apply at, with well known 
deoptimization state already available.

I'd suggest a small change to Sanjoy's declaration.  I think we should 
allow additional arguments to the guard, not just the condition.  What 
exactly those arguments mean would be up to the runtime, but various 
runtimes might want to provide additional arguments to the OSR mechanism.
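As a sketch of what that could look like (the signature here is purely 
hypothetical, just to illustrate the shape of the change):

```
;; Hypothetical variadic variant: everything after %predicate is passed
;; through, uninterpreted by LLVM, to the runtime's OSR mechanism.
declare void @llvm.guard_on(i1 %predicate, ...) ;; requires [ "deopt"(...) ]

;; Usage sketch -- %osr_reason and %bci are runtime-defined extras:
;;   call void (i1, ...) @llvm.guard_on(i1 %cond, i32 %osr_reason, i32 %bci)
;;       [ "deopt"(...) ]
```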

>
> Bailing out to the interpreter involves re-creating the state of the
> interpreter frames as-if the compilee had been executing in the
> interpreter all along.  This state is represented and maintained using
> a `"deopt"` operand bundle attached to the call to `@llvm.guard_on`.
> The verifier will reject calls to `@llvm.guard_on` without a `"deopt"`
> operand bundle.
This introduces a very subtle point.  The side exit only affects the 
*function* which contains the guard.  A caller of that function in the 
same module may be returned to by either the function itself, or the 
interpreter after running the continuation implied by the guard.  This 
introduces a complication for IPA/IPO; any guard (really, any side exit, 
of which guards are one form) has to be treated as a possible return 
point from the callee with an unknowable return value and memory state.

In our prototype implementation, we've implemented this by doing a 
linear scan of the function in each IPO pass and bailing if we see a 
side exit.  One approach we could consider upstream is to introduce a 
function attribute which indicates "can exit via side exit or guard".  
Any function containing a guard would be required to have the 
attribute.  InlineFunction could propagate it to the caller when 
inlining a callee with it.  FunctionAttrs could remove it if no guards 
(or callees with guards) were found.  This would make the implementation 
of the bail in our IPO passes much cheaper, but it wouldn't remove the 
need for it.
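
A sketch of the attribute idea (the string attribute name 
"may-side-exit" is invented here purely for illustration):

```
;; Any function containing a guard must carry the (hypothetical)
;; "may-side-exit" attribute; IPO passes check it instead of scanning.
define void @callee(i32 %x) "may-side-exit" {
entry:
  %c = icmp ult i32 %x, 10
  call void @llvm.guard_on(i1 %c) [ "deopt"() ]
  ret void
}

;; Inlining @callee forces @caller to carry the attribute too;
;; FunctionAttrs could later strip it if all guards disappear.
define void @caller() "may-side-exit" {
entry:
  call void @callee(i32 5)
  ret void
}

declare void @llvm.guard_on(i1)
```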

> `@llvm.guard_on` cannot be `invoke`ed (that would be
> meaningless anyway, since the method it would've "thrown" into is
> about to go away).
I disagree with this bit.  It needlessly complicates the inliner.  
Allowing an invoke of a guard, which SimplifyCFG then converts to a 
call just as it would for a nothrow function, seems much cleaner.
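
For illustration, the cleanup would mirror what SimplifyCFG already 
does when it proves an invoked callee cannot unwind (sketch only):

```
;; Before: an invoke of the guard, with an unwind edge that can never
;; meaningfully run:
;;   invoke void @llvm.guard_on(i1 %c) [ "deopt"() ]
;;       to label %normal unwind label %lpad
;;
;; After SimplifyCFG, treating the guard like a nothrow callee:
;;   call void @llvm.guard_on(i1 %c) [ "deopt"() ]
;;   br label %normal
```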
>
>
> Example:
>
> ```
>    ...
>    %rangeCheck0 = icmp ult i32 %arg0, 10
>    call void @llvm.guard_on(i1 %rangeCheck0) [ "deopt"(/* deopt state 0 */) ]
>    %rangeCheck1 = icmp ult i32 %arg0, 12
>    call void @llvm.guard_on(i1 %rangeCheck1) [ "deopt"(/* deopt state 1 */) ]
>    ...
> ```
>
>
> # details: motivation & alternatives
>
> As specified in the high level summary, there are two key motivations.
>
> The first, more obvious one is that we want the CFG to be less
> complex, even if that involves "hiding" control flow in guard
> intrinsics.  I expect this to benefit both compile time and
> within-a-block optimization.
I want to really emphasize this point.  I expect that using guard 
intrinsics would be a major compile time win for higher level 
languages.  (I can't say for sure; we currently use the explicit 
control flow model.)

Given many of our transforms are block-at-a-time, using guards might 
also lead to better optimization.  I'm particularly interested in seeing 
how this influences instruction selection.
>
> The second, somewhat less obvious motivation to use guard intrinsics
> instead of explicit control flow is to allow check widening.
>
> ## check widening
>
> Consider:
>
> ```
>    ...
>    %rangeCheck0 = icmp ult i32 6, %len   ;; for a[6]
>    call void @llvm.guard_on(i1 %rangeCheck0) [ "deopt"(/* deopt state 0 */) ]
>    call void @printf("hello world")
>    %rangeCheck1 = icmp ult i32 7, %len   ;; for a[7]
>    call void @llvm.guard_on(i1 %rangeCheck1) [ "deopt"(/* deopt state 1 */) ]
>    access a[6] and a[7]
>    ...
> ```
>
> we'd like to optimize it to
>
> ```
>    ...
>    %rangeCheckWide = icmp ult i32 7, %len
>    call void @llvm.guard_on(i1 %rangeCheckWide) [ "deopt"(/* deopt state 0 */) ]
>    call void @printf("hello world")
>    ;; %rangeCheck1 = icmp ult i32 7, %len  ;; not needed anymore
>    ;; call void @llvm.guard_on(i1 %rangeCheck1) [ "deopt"(/* deopt state 1 */) ]
>    access a[6] and a[7]
>    ...
> ```
>
> This way we do a range check only on `7` -- if `7` is within bounds,
> then we know `6` is too.  This transform is sound only because we know
> that the guard on `7 ult %len` will not simply throw an exception if
> the said predicate fails, but will bail out to the interpreter with
> the abstract state `/* deopt state 0 */`.  In fact, if `%len` is `7`,
> the pre-transform program is supposed to print `"hello world"` and
> *then* throw an exception, and bailing out to the interpreter with `/*
> deopt state 0 */` will do exactly that.
>
> In other words, we're allowed to do speculative and aggressive
> transforms that make a guard fail that wouldn't have in the initial
> program.  This is okay because a failed guard only bails to the
> interpreter, and the interpreter always Does The Right Thing(TM).  In
> fact, it is correct (though unwise) to replace every guard's predicate
> with `false`.
Another optimization this allows is a more code size friendly version of 
IRCE.  Today, we have to generate pre and post loops when removing 
bounds checks from an inner loop unless we can *prove* that no 
iterations would run in those loops.  With a guard construct, we can 
find a guard before the loop and "widen" it to account for the pre and 
post iteration spaces.  This transformation is sometimes known as "loop 
predication".
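
A sketch of that widening (hypothetical IR; assume a loop over %i in 
[0, %n) indexing an array of length %len):

```
;; Before: a bounds check guard on every iteration.
;;   loop:
;;     %inbounds = icmp ult i32 %i, %len
;;     call void @llvm.guard_on(i1 %inbounds) [ "deopt"(...) ]
;;     ... a[%i] ...
;;
;; After "loop predication": one widened guard before the loop covers
;; the whole iteration space, with no pre/post loops generated.
;;   preheader:
;;     %wide = icmp ule i32 %n, %len   ;; %n <= %len implies %i < %len
;;     call void @llvm.guard_on(i1 %wide) [ "deopt"(...) ]
;;   loop:
;;     ... a[%i] ...                   ;; per-iteration check eliminated
```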
>
> ## the problem with check widening and explicit control flow
>
> Widening is difficult to get right in an explicit "check-and-branch"
> representation.  For instance, the above example in a
> "check-and-branch" representation would be (in pseudo C, and excluding
> the printf):
>
> ```
>    ...
>    if (!(6 < %len)) { call @side_exit() [ "deopt"(P) ]; unreachable; }
>    if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ]; unreachable; }
>    ...
> ```
>
> The following transform is invalid:
>
> ```
>    ...
>    if (!(7 < %len)) { call @side_exit() [ "deopt"(P) ]; unreachable; }
>    if (!(true)) { call @side_exit() [ "deopt"(Q) ]; unreachable; }
>    ...
> ```
>
> since we do not know if the first side exit had been optimized under
> the assumption `!(6 < %len)` (via JumpThreading etc.).  E.g. the
> "original" IR could have been
>
> ```
>    ...
>    if (!(6 < %len)) { call @side_exit() [ "deopt"(!(6 < %len)) ]; unreachable; }
>    if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ]; unreachable; }
>    ...
> ```
>
> which got optimized to
>
> ```
>    ...
>    if (!(6 < %len)) { call @side_exit() [ "deopt"(true) ]; unreachable; }
>    if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ]; unreachable; }
>    ...
> ```
>
> before the widening transform. The widening transform will now
> effectively pass in an incorrect value for `!(6 < %len)`.
>
> This isn't to say it is impossible to do check widening in a explicit
> control flow representation, just that is more natural to do it with
> guards.
>
>
> # details: semantics
>
> ## as-if control flow
>
> The observable behavior of `@llvm.guard_on` is specified as:
>
> ```
>    void @llvm.guard_on(i1 %pred) {
>    entry:
>      %unknown_cond = < unknown source >
>      %cond = and i1 %unknown_cond, %pred
>      br i1 %cond, label %left, label %right
>
>    left:
>      call void @bail_to_interpreter() [ "deopt"() ] noreturn
>      unreachable
>
>    right:
>      ret void
>    }
> ```
>
> So, precisely speaking, `@llvm.guard_on` is guaranteed to bail to the
> interpreter if `%pred` is false, but it **may** bail to the
> interpreter if `%pred` is true.  It is this bit that lets us soundly
> widen `%pred`, since all we're doing is "refining" `< unknown source >`.
Unless I'm misreading this, it looks like Sanjoy got the IR in the 
specification wrong.  The intrinsic is specified to side exit if the 
condition is false (so that it's true in the caller after the guard), 
not the other way around.  The text description appears correct.
>
> `@bail_to_interpreter` does not return to the current compilation, but
> it returns to the `"deopt"` continuation that it has been given (once
> inlined, the empty "deopt"() continuation will be fixed up to have the right
> continuation).
This "bail_to_interpreter" is the more general form of side exit I 
mentioned above.
>
>
> ## applicable optimizations
>
> Apart from widening, most of the optimizations we're interested in are
> what's allowed by an equivalent `@llvm.assume`.  Any conditional
> branches dominated by a guard on the same condition can be folded,
> multiple guards on the same condition can be CSE'ed, loop invariant
> guards can be hoisted out of loops etc.
>
> Ultimately, we'd like to recover the same quality of optimization as
> we currently get from the "check-and-branch" representation.  With the
> "check-and-branch" representation, the optimizer is able to sink
> stores and computation into the slow path. This is something it cannot
> do in the guard_on representation, and we'll have to lower the
> guard_on representation to the "check-and-branch" representation at a
> suitable point in the pipeline to get this kind of optimization.
>
> ## lowering
>
> At some point in the compilation pipeline, we will have to lower
> `@llvm.guard_on` into explicit control flow, by "inlining" "an
> implementation" of `@llvm.guard_on` (or by some other means).  I don't
> have a good sense on when in the pipeline this should be done -- the
> answer will depend on what we find as we make LLVM more aggressive
> around optimizing guards.
As a starting point, I'd likely do this just before code gen prep with 
some custom sinking logic to pull instructions only used on the failing 
path into the newly introduced block.  Long term, we'd probably want to 
do the same thing over MI, but mucking with control flow at that layer 
is a bit more complicated.

Rather than framing this as inlining, I'd frame it as expansion to a 
well known body (pretty much the one Sanjoy gives above).  The 
@bail_to_interpreter construct could be lowered directly to a function 
call to some well known symbol name (which JIT users can intercept and 
bind to whatever they want).  Something like __llvm_side_exit seems like 
a reasonable choice.
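
Putting those two pieces together, the expansion might look like this 
(sketch only; __llvm_side_exit is just the symbol name floated above):

```
;; Guard before lowering:
;;   call void @llvm.guard_on(i1 %cond) [ "deopt"(i32 %state) ]
;;
;; Expanded near code gen prep into explicit control flow:
;;   br i1 %cond, label %continue, label %deopt
;;
;; deopt:                        ;; instructions used only on the failing
;;   ...                         ;; path can be sunk into this block
;;   call void @__llvm_side_exit() [ "deopt"(i32 %state) ]
;;   unreachable
;;
;; continue:
;;   ...

declare void @__llvm_side_exit() ;; intercepted/bound by the JIT user
```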
>
> ## environments without support for bailing to the interpreter
>
> Our VM has deeply integrated support for deoptimizations, but not all
> language runtimes do.  If there is a need for non deoptimizing guards, we
> can consider introducing a variant of `@llvm.guard_on`:
>
> ```
> declare void @llvm.exception_on(i1 %predicate, i32 %exceptionKind)
> ```
>
> with this one having the semantics that it always throws an exception
> if `%predicate` fails.  Only the non-widening optimizations for
> `@llvm.guard_on` will apply to `@llvm.exception_on`.
Not sure we actually need this.  A valid implementation of the side exit 
handler (which would do the OSR for us) is to call a runtime routine 
which generates and throws the exception.  The only bit we might need is 
a distinction between widenable and non-widenable guards.

>
> ## memory effects (unresolved)
>
> [I haven't come up with a good model for the memory effects of
>   `@llvm.guard_on`, suggestions are very welcome.]
>
> I'd really like to model `@llvm.guard_on` as a readonly function,
> since it does not write to memory if it returns; and e.g. forwarding
> loads across a call to `@llvm.guard_on` should be legal.
>
> However, I'm in a quandary around representing the "may never return"
> aspect of `@llvm.guard_on`: I have to make it illegal to, say, hoist a
> load from `%ptr` across a guard on `%ptr != null`.
Modeling this as memory dependence just seems wrong.  We already have to 
model control dependence on functions which may throw.  I don't think 
there's anything new here.

The only unusual bit is that we're going to want to teach AliasAnalysis 
that the guard doesn't write to any memory location (to allow load 
forwarding across it) while still preserving the control dependence.
> There are couple
> of ways I can think of dealing with this, none of them are both easy
> and neat:
>
>   - State that since `@llvm.guard_on` could have had an infinite loop
>     in it, it may never return. Unfortunately, the LLVM IR has some
>     rough edges on readonly infinite loops (since C++ disallows that),
>     so LLVM will require some preparatory work before we can do this
>     soundly.
>
>   - State that `@llvm.guard_on` can unwind, and thus has non-local
>     control flow.  This can actually work (and is pretty close to
>     the actual semantics), but is somewhat of hack since
>     `@llvm.guard_on` doesn't _really_ throw an exception.
Er, careful.  Semantically, the guard *might* throw an exception. It 
could be that's what the interpreter does when evaluating the 
continuation implied by the guard and any of our callers have to account 
for the fact the function which contains the guard might throw.  The 
easiest way to ensure that is to model the guard call as possibly throwing.
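
One minimal way to encode that combination (sketch; the attribute 
placement here is illustrative, not a worked-out proposal):

```
;; readonly lets loads be forwarded across the guard, while the
;; deliberate absence of nounwind keeps the control dependence: callers
;; must assume the guard may unwind / not return normally.
declare void @llvm.guard_on(i1) readonly
```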
>
>    - Special case `@llvm.guard_on` (yuck!).
>
> What do you think?
>
> -- Sanjoy


