[llvm-dev] RFC: Add guard intrinsics to LLVM

Tue Feb 16 18:06:25 PST 2016

This is a proposal to add guard intrinsics to LLVM.

Couple of glossary items: when I say "interpreter" I mean "the most
conservative tier in the compilation stack" which can be an actual
interpreter, a "splat compiler" or even a regular JIT that doesn't
make optimistic assumptions.  By "bailing out to the interpreter" I
mean "side exit" as defined by Philip in
http://www.philipreames.com/Blog/2015/05/20/deoptimization-terminology/

# high level summary

Guard intrinsics are intended to represent "if (!cond) leave();" like
control flow in a structured manner.  This kind of control flow is
abundant in IR coming from safe languages due to things like range
checks and null checks (and overflow checks, in some cases).  From a
high level, there are two distinct motivations for introducing guard
intrinsics:

 - To keep control flow as simple and "straight line"-like as is
   reasonable

 - To allow certain kinds of "widening" transforms that cannot be
   soundly done in an explicit "check-and-branch" control flow
   representation

## straw man specification

Signature:

```
declare void @llvm.guard_on(i1 %predicate) ;; requires [ "deopt"(...) ]
```

Semantics:

`@llvm.guard_on` bails out to the interpreter (i.e. "deoptimizes" the
currently execution function) if `%predicate` is `false`, meaning that
after `@llvm.guard_on(i1 %t)` has executed `%t` can be assumed to be
true.  In this way, it is close to `@llvm.assume` or an `assert`, with
one very important difference -- `@llvm.guard_on(i1 <false>)` is well
defined (and not UB).  `@llvm.guard_on` on a false predicate bails to
the interpreter and that is always safe (but slow), and so
`@llvm.guard_on(i1 false)` is basically a `noreturn` call that
unconditionally transitions the current compilation to the
interpreter.

Bailing out to the interpreter involves re-creating the state of the
interpreter frames as-if the compilee had been executing in the
interpreter all along.  This state is represented and maintained using
a `"deopt"` operand bundle attached to the call to `@llvm.guard_on`.
The verifier will reject calls to `@llvm.guard_on` without a `"deopt"`
operand bundle.  `@llvm.guard_on` cannot be `invoke`ed (that would be
meaningless anyway, since the method it would've "thrown" into is
about to go away).

Example:

```
  ...
  %rangeCheck0 = icmp ult i32 %arg0, 10
  call void @llvm.guard_on(i1 %rangeCheck0) [ "deopt"(/* deopt state 0 */) ]
  %rangeCheck1 = icmp ult i32 %arg0, 12
  call void @llvm.guard_on(i1 %rangeCheck1) [ "deopt"(/* deopt state 1 */) ]
  ...
```

# details: motivation & alternatives

As specified in the high level summary, there are two key motivations.

The first, more obvious one is that we want the CFG to be less
complex, even if that involves "hiding" control flow in guard
intrinsics.  I expect this to benefit both compile time and
within-a-block optimization.

The second, somewhat less obvious motivation to use guard intrinsics
instead of explicit control flow is to allow check widening.

## check widening

Consider:

```
  ...
  %rangeCheck0 = icmp ult i32 6, %len   ;; for a[6]
  call void @llvm.guard_on(i1 %rangeCheck0) [ "deopt"(/* deopt state 0 */) ]
  call void @printf("hello world")
  %rangeCheck1 = icmp ult i32 7, %len   ;; for a[7]
  call void @llvm.guard_on(i1 %rangeCheck1) [ "deopt"(/* deopt state 1 */) ]
  access a[6] and a[7]
  ...
```

we'd like to optimize it to

```
  ...
  %rangeCheckWide = icmp ult i32 7, %len
  call void @llvm.guard_on(i1 %rangeCheckWide) [ "deopt"(/* deopt state 0 */) ]
  call void @printf("hello world")
  ;; %rangeCheck1 = icmp ult i32 7, %len  ;; not needed anymore
  ;; call void @llvm.guard_on(i1 %rangeCheck1) [ "deopt"(/* deopt state 1 */) ]
  access a[6] and a[7]
  ...
```

This way we do a range check only on `7` -- if `7` is within bounds,
then we know `6` is too.  This transform is sound only because we know
that the guard on `7 ult %len` will not simply throw an exception if
the said predicate fails, but will bail out to the interpreter with
the abstract state `/* deopt state 0 */`.  In fact, if `%len` is `7`,
the pre-transform program is supposed to print `"hello world"` and
*then* throw an exception, and bailing out to the interpreter with `/*
deopt state 0 */` will do exactly that.

In other words, we're allowed to do speculative and aggressive
transforms that make a guard fail that wouldn't have in the initial
program.  This is okay because a failed guard only bails to the
interpreter, and the interpreter always Does The Right Thing(TM).  In
fact, it is correct (though unwise) to replace every guard's predicate
with `false`.

## the problem with check widening and explicit control flow

Widening is difficult to get right in an explicit "check-and-branch"
representation.  For instance, the above example in a
"check-and-branch" representation would be (in pseudo C, and excluding
the printf):

```
  ...
  if (!(6 < %len)) { call @side_exit() [ "deopt"(P) ]; unreachable; }
  if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ]; unreachable; }
  ...
```

The following transform is invalid:

```
  ...
  if (!(7 < %len)) { call @side_exit() [ "deopt"(P) ]; unreachable; }
  if (!(true)) { call @side_exit() [ "deopt"(Q) ]; unreachable; }
  ...
```

since we do not know if the first side exit had been optimized under
the assumption `!(6 < %len)` (via JumpThreading etc.).  E.g. the
"original" IR could have been

```
  ...
  if (!(6 < %len)) { call @side_exit() [ "deopt"(!(6 < %len)) ]; unreachable; }
  if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ]; unreachable; }
  ...
```

which got optimized to

```
  ...
  if (!(6 < %len)) { call @side_exit() [ "deopt"(true) ]; unreachable; }
  if (!(7 < %len)) { call @side_exit() [ "deopt"(Q) ]; unreachable; }
  ...
```

before the widening transform. The widening transform will now
effectively pass in an incorrect value for `!(6 < %len)`.

This isn't to say it is impossible to do check widening in a explicit
control flow representation, just that is more natural to do it with
guards.

# details: semantics

## as-if control flow

The observable behavior of `@llvm.guard_on` is specified as:

```
  void @llvm.guard_on(i1 %pred) {
  entry:
    %unknown_cond = < unknown source >
    %cond = and i1 %unknown_cond, %pred
    br i1 %cond, label %left, label %right

  left:
    call void @bail_to_interpreter() [ "deopt"() ] noreturn
    unreachable

  right:
    ret void
  }
```

So, precisely speaking, `@llvm.guard_on` is guaranteed to bail to the
interpreter if `%pred` is false, but it **may** bail to the
interpreter if `%pred` is true.  It is this bit that lets us soundly
widen `%pred`, since all we're doing is "refining" `< unknown source >`.

`@bail_to_interpreter` does not return to the current compilation, but
it returns to the `"deopt"` continuation that is has been given (once
inlined, the empty "deopt"() continuation will be fixed up to have the right
continuation).

## applicable optimizations

Apart from widening, most of the optimizations we're interested in are
what's allowed by an equivalent `@llvm.assume`.  Any conditional
branches dominated by a guard on the same condition can be folded,
multiple guards on the same condition can be CSE'ed, loop invariant
guards can be hoisted out of loops etc.

Ultimately, we'd like to recover the same quality of optimization as
we currently get from the "check-and-branch" representation.  With the
"check-and-branch" representation, the optimizer is able to sink
stores and computation into the slow path. This is something it cannot
do in the guard_on representation, and we'll have to lower the
guard_on representation to the "check-and-branch" representation at a
suitable point in the pipeline to get this kind of optimization.

## lowering

At some point in the compilation pipeline, we will have to lower
`@llvm.guard_on` into explicit control flow, by "inlining" "an
implementation" of `@llvm.guard_on` (or by some other means).  I don't
have a good sense on when in the pipeline this should be done -- the
answer will depend on what we find as we make LLVM more aggressive
around optimizing guards.

## environments without support for bailing to the interpreter

Our VM has deeply integrated support for deoptimizations, but not all
language runtimes do.  If there is a need for non deoptimizing guards, we
can consider introducing a variant of `@llvm.guard_on`:

```
declare void @llvm.exception_on(i1 %predicate, i32 %exceptionKind)
```

with this one having the semantics that it always throws an exception
if `%predicate` fails.  Only the non-widening optimizations for
`@llvm.guard_on` will apply to `@llvm.exception_on`.

## memory effects (unresolved)

[I haven't come up with a good model for the memory effects of
 `@llvm.guard_on`, suggestions are very welcome.]

I'd really like to model `@llvm.guard_on` as a readonly function,
since it does not write to memory if it returns; and e.g. forwarding
loads across a call to `@llvm.guard_on` should be legal.

However, I'm in a quandary around representing the "may never return"
aspect of `@llvm.guard_on`: I have to make it illegal to, say, hoist a
load form `%ptr` across a guard on `%ptr != null`.  There are couple
of ways I can think of dealing with this, none of them are both easy
and neat:

 - State that since `@llvm.guard_on` could have had an infinite loop
   in it, it may never return. Unfortunately, the LLVM IR has some
   rough edges on readonly infinite loops (since C++ disallows that),
   so LLVM will require some preparatory work before we can do this
   soundly.

 - State that `@llvm.guard_on` can unwind, and thus has non-local
   control flow.  This can actually work (and is pretty close to
   the actual semantics), but is somewhat of hack since
   `@llvm.guard_on` doesn't _really_ throw an exception.

  - Special case `@llvm.guard_on` (yuck!).

What do you think?

-- Sanjoy