[LLVMdev] RFC: implicit null checks in llvm

Wed Apr 22 23:44:30 PDT 2015

> On Apr 22, 2015, at 10:05 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> 
> Hi all,
> 
> I would like to propose a mechanism that would allow LLVM to fold null
> pointer checks into "nearby" memory operations, subject to runtime
> support.  This is related to but not exactly the same as a proposal
> floated by Peter Collingbourne earlier [1].  The obvious use cases are
> managed languages like Java, C# and Go that require a null check on
> pointers before they're used in certain ways (loads, stores, virtual
> dispatches etc.).  I'm sure there are other less obvious and more
> interesting use cases.

This feature will keep being requested. I agree LLVM should support it, and am happy to see it being done right.

> I plan to break the design into two parts, roughly following the
> statepoint philosophy:
> 
> # invokable @llvm.(load|store)_with_trap intrinsics
> 
> We introduce two new intrinsic families
> 
>  T @llvm.load_with_trap(T*)        [modulo name mangling]
>  void @llvm.store_with_trap(T, T*) [modulo name mangling]
> 
> They cannot be `call`ed, they can only be `invoke`d.
> 
> Semantically, they try to load from or store to the pointer passed to
> them as normal load / store instructions do.  @llvm.load_with_trap
> returns the loaded value on the normal return path.  If the load or
> store traps then they dispatch to their unwind destination.  The
> landingpad for the unwind destination can only be a cleanup
> landingpad, and the result of the landingpad instruction itself is
> always undef.  The personality function in the landingpad instruction
> is ignored.
> 
> These intrinsics require support from the language runtime to work.
> During code generation, the invokes are lowered into normal load or
> store instructions, followed by a branch (explicit `jmp` or just
> fall-through) to the normal destination.  The PC for the unwind
> destination is recorded in a side-table along with the PC for the load
> or store.  When a load or store traps or segfaults at runtime, the
> runtime searches this table to see if the trap is from a PC recorded
> in the side-table.  If so, the runtime jumps to the unwind
> destination, otherwise it aborts.

The intrinsics need to be lowered to a pseudo instruction just like patchpoint (so that a stackmap can be emitted). In my mind the real issue here is how to teach this pseudo instruction to emit the proper load/store for the target.
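
For illustration, the runtime side of the lookup described in the quoted text could look roughly like the sketch below. `TrapEntry` and `lookupUnwindPC` are hypothetical names, not part of the proposal or of LLVM, and the table is assumed to be sorted by faulting PC.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical side-table entry: the PC of a load/store that may trap and
// the PC of the unwind destination recorded for it.
struct TrapEntry {
  std::uint64_t FaultPC;
  std::uint64_t UnwindPC;
};

// Binary-searches Table (assumed sorted by FaultPC) for the faulting PC.
// Returns true and sets UnwindPC if the trap came from a recorded
// load/store; returns false otherwise, in which case the runtime aborts.
inline bool lookupUnwindPC(const std::vector<TrapEntry> &Table,
                           std::uint64_t FaultPC, std::uint64_t &UnwindPC) {
  auto It = std::lower_bound(
      Table.begin(), Table.end(), FaultPC,
      [](const TrapEntry &E, std::uint64_t PC) { return E.FaultPC < PC; });
  if (It == Table.end() || It->FaultPC != FaultPC)
    return false;
  UnwindPC = It->UnwindPC;
  return true;
}
```

A signal handler would call something like this with the PC taken from the fault context and then resume execution at the returned unwind PC.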

> Note that the signal handler / runtime do not themselves raise
> exceptions at the ABI level (though they do so conceptually), but the
> landing pad block can raise one if it wants to.
> 
> The table mapping load/store PCs to unwind PCs can be reported to the
> language runtime via an __llvm_stackmaps like section.  I am strongly
> in favor of making this section as easy to parse as possible.

Let’s just be clear that it is not recommended for the frontend to produce these intrinsics. They are a compiler backend convenience. (I don’t want InstCombine or any other standard pass to start trafficking in these.)

> # optimization pass to create invokes to @llvm.(load|store)_with_trap
> 
> With the @llvm.(load|store)_with_trap intrinsics in place, we can
> write an LLVM pass that folds null checks into nearby memory
> operations on that same pointer.  As an example, we can turn
> 
>  // r0 is a local register
>  if (p != null) {
>    r0 += 5;
>    *(p + 16) = 42;
>    ...
>  } else {
>    throw_NullPointerException();
>    unreachable;
>  }
> 
> into
> 
>  // r0 is a local register
>  r0 += 5;
>  invoke @llvm.store_with_trap(42, p + 16) to label %ok, unwind label %not_ok
> 
> not_ok:
>  %unused = landingpad .....
>  throw_NullPointerException();
>  unreachable;
> 
> ok:
>  ...
> 
> A slight subtlety here is that the store to (p + 16) can trap (and we
> can branch to not_ok) even if p is not null.  However, in that case
> the store would have happened in the original program anyway, and the
> behavior of the original program is undefined.
> 
> A prerequisite for this optimization is that in the address space we
> operate on, loads from and stores to pointers in some small
> neighborhood starting from address `null` trap deterministically.  The
> above transform is correct only if we know that *(null + 16) will trap
> synchronously and deterministically.  Even on platforms where the 0th
> page is always unmapped, we cannot fold null checks into memory
> operations on an arbitrary offset away from the pointer to be null
> checked (e.g. array accesses are not suitable for implicit null
> checks).

This is a platform-dependent intrinsic. There’s nothing wrong with a platform-specific size for the unmapped page zero if we don’t already have one.
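
As a rough sketch of the legality check this implies: a null check can only be folded into an access whose whole byte range, measured from null, lies inside the region known to be unmapped. `canFoldNullCheck` and the `UnmappedBytes` parameter are hypothetical; the real value would have to come from the target or the runtime.

```cpp
#include <cstdint>

// Hypothetical legality check. Returns true if an access of AccessSize
// bytes at (null + Offset) is guaranteed to fault, given that the first
// UnmappedBytes bytes of the address space are never mapped.
inline bool canFoldNullCheck(std::uint64_t Offset, std::uint64_t AccessSize,
                             std::uint64_t UnmappedBytes) {
  // Phrased as a subtraction so that Offset + AccessSize cannot overflow.
  return AccessSize != 0 && AccessSize <= UnmappedBytes &&
         Offset <= UnmappedBytes - AccessSize;
}
```

With a 4096-byte unmapped region, the store to `p + 16` from the quoted example qualifies, while an array access at a large or variable offset does not.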

> Running this pass sufficiently late in the optimization pipeline will
> allow for all the usual memory related optimization passes to work as
> is -- they won't have to learn about the special semantics for the new
> (load|store)_with_trap intrinsics to be effective.

Good. This is a codegen feature. We can’t say that enough. If you really cared about the best codegen this would be done in machine IR after scheduling and target LoadStore opts.

> This pass will have to be a profile-guided optimization pass for
> fundamental reasons: implicit null checks are a pessimization even if
> a small fraction of the implicit null checks fail.  Typically language
> runtimes that use page-fault based null checks recompile methods with
> failed implicit null checks to use an explicit null check instead (e.g. [2]).

I don’t think making it profile-guided is important. Program behavior can change after compilation and you’re back to the same problem. I think recovering from repeated traps is important. That’s why you need to combine this feature with either code invalidation points or patching implemented via llvm.stackmap or patchpoint (or statepoint); they’re all the same thing.
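
To make the recovery idea concrete, here is a minimal sketch of a per-site trap counter that tells the runtime when a site has trapped often enough to be invalidated or patched to an explicit check. `TrapSiteMonitor` and the threshold policy are assumptions for illustration only. A real signal handler could not allocate the way `std::unordered_map` does, so this logic would have to run outside the handler or use preallocated storage.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical bookkeeping for implicit null check sites, keyed by the PC
// of the faulting load/store.
class TrapSiteMonitor {
public:
  explicit TrapSiteMonitor(unsigned Threshold) : Threshold(Threshold) {}

  // Records one trap at FaultPC. Returns true once the site has trapped at
  // least Threshold times, meaning the runtime should invalidate or patch
  // the code to use an explicit null check.
  bool recordTrap(std::uint64_t FaultPC) {
    return ++Counts[FaultPC] >= Threshold;
  }

private:
  unsigned Threshold;
  std::unordered_map<std::uint64_t, unsigned> Counts;
};
```

The invalidation or patching itself would go through the stackmap or patchpoint machinery mentioned above; the counter only decides when to trigger it.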

> What do you think?  Does this make sense?

Well, you need the features that patchpoint gives you (stackmap entries) and you’ll need to use patchpoints or stackmaps anyway for invalidation or patching. So why are you bothering with a totally new, independent intrinsic? Why not just extend the existing intrinsics? We could have a variant that

- emits a load instead of a call

- looks at the landing pad to generate a special stackmap entry in addition to the normal exception table (I don’t even see why you need this, except that the runtime doesn’t know how to parse an exception table.)

Andy

> [1]: https://groups.google.com/d/msg/llvm-dev/mMQzIt_8z1Y/cnE7WH1HNaoJ
> [2]: https://github.com/openjdk-mirror/jdk7u-hotspot/blob/master/src/share/vm/opto/lcm.cpp#L90
> 
> -- Sanjoy