[LLVMdev] RFC: implicit null checks in llvm

Wed Apr 22 22:05:22 PDT 2015

Hi all,

I would like to propose a mechanism that would allow LLVM to fold null
pointer checks into "nearby" memory operations, subject to runtime
support.  This is related to but not exactly the same as a proposal
floated by Peter Collingbourne earlier [1].  The obvious use cases are
managed languages like Java, C# and Go that require a null check on
pointers before they're used in certain ways (loads, stores, virtual
dispatches etc.).  I'm sure there are other less obvious and more
interesting use cases.

I plan to break the design into two parts, roughly following the
statepoint philosophy:

# invokable @llvm.(load|store)_with_trap intrinsics

We introduce two new intrinsic families:

  T @llvm.load_with_trap(T*)        [modulo name mangling]
  void @llvm.store_with_trap(T, T*) [modulo name mangling]

They cannot be `call`ed; they can only be `invoke`d.

Semantically, they try to load from or store to the pointer passed to
them as normal load / store instructions do.  @llvm.load_with_trap
returns the loaded value on the normal return path.  If the load or
store traps then they dispatch to their unwind destination.  The
landingpad for the unwind destination can only be a cleanup
landingpad, and the result of the landingpad instruction itself is
always undef.  The personality function in the landingpad instruction
is ignored.

These intrinsics require support from the language runtime to work.
During code generation, the invokes are lowered into normal load or
store instructions, followed by a branch (explicit `jmp` or just
fall-through) to the normal destination.  The PC for the unwind
destination is recorded in a side-table along with the PC for the load
or store.  When a load or store traps or segfaults at runtime, the
runtime searches this table to see if the trap is from a PC recorded
in the side-table.  If so, the runtime jumps to the unwind
destination; otherwise it aborts.
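A minimal sketch of the runtime-side lookup, in C. It assumes the side-table is an array of (fault PC, unwind PC) pairs sorted by fault PC. The names `trap_entry`, `lookup_unwind_pc` and `resolve_fault` are hypothetical. Installing the actual signal handler and rewriting the saved PC in the signal context are platform-specific and are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One side-table entry (hypothetical layout): the PC of a lowered
   load/store and the PC of its unwind destination. */
typedef struct {
    uintptr_t fault_pc;
    uintptr_t unwind_pc;
} trap_entry;

/* Binary search over entries sorted by fault_pc.  Returns the unwind PC
   recorded for pc, or 0 if pc is not in the table (this sketch assumes 0
   is never a valid code address). */
uintptr_t lookup_unwind_pc(const trap_entry *table, size_t n, uintptr_t pc) {
    assert(table != NULL || n == 0);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].fault_pc == pc)
            return table[mid].unwind_pc;
        if (table[mid].fault_pc < pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}

/* The decision the fault handler makes: resume at the unwind
   destination if the faulting PC belongs to an implicit null check,
   otherwise abort. */
uintptr_t resolve_fault(const trap_entry *table, size_t n, uintptr_t pc) {
    uintptr_t target = lookup_unwind_pc(table, n, pc);
    if (target == 0)
        abort();
    return target;
}
```

Keeping the table sorted allows a logarithmic-time lookup inside the signal handler without allocating memory. Allocation is generally unsafe in that context.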

Note that the signal handler / runtime do not themselves raise
exceptions at the ABI level (though they do so conceptually), but the
landing pad block can raise one if it wants to.

The table mapping load/store PCs to unwind PCs can be reported to the
language runtime via an __llvm_stackmaps-like section.  I am strongly
in favor of making this section as easy to parse as possible.
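To illustrate what "easy to parse" could mean, here is a sketch of a flat section layout together with a parser for it.

The layout is entirely hypothetical and is not the existing __llvm_stackmaps format:

- a little-endian version word;
- an entry count;
- that many pairs of 32-bit offsets, each pair being a faulting PC and its unwind PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, all fields little-endian:
     uint32 version   (must be 1)
     uint32 count
     count x { uint32 fault_pc_offset; uint32 unwind_pc_offset; }
   Offsets are relative to a base address the runtime already knows
   (for example the start of the text section). */
typedef struct {
    uint32_t fault_pc_offset;
    uint32_t unwind_pc_offset;
} trap_record;

static uint32_t read_u32_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Copies the section's records into out.  Returns the number of records,
   or -1 if the section is malformed or out has fewer than count slots. */
long parse_trap_section(const uint8_t *buf, size_t len,
                        trap_record *out, size_t max_out) {
    assert(buf != NULL || len == 0);
    if (len < 8 || read_u32_le(buf) != 1)
        return -1;
    uint32_t count = read_u32_le(buf + 4);
    /* Reject counts that would read past the end of the buffer. */
    if (count > max_out || (len - 8) / 8 < count)
        return -1;
    for (uint32_t i = 0; i < count; i++) {
        const uint8_t *rec = buf + 8 + (size_t)i * 8;
        out[i].fault_pc_offset = read_u32_le(rec);
        out[i].unwind_pc_offset = read_u32_le(rec + 4);
    }
    return (long)count;
}
```

Reading bytes explicitly, rather than casting the buffer to a struct, keeps the parser independent of host endianness and alignment.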

# optimization pass to create invokes to @llvm.(load|store)_with_trap

With the @llvm.(load|store)_with_trap intrinsics in place, we can
write an LLVM pass that folds null checks into nearby memory
operations on that same pointer.  As an example, we can turn

  // r0 is a local register
  if (p != null) {
    r0 += 5;
    *(p + 16) = 42;
    ...
  } else {
    throw_NullPointerException();
    unreachable;
  }

into

  // r0 is a local register
  r0 += 5;
  invoke @llvm.store_with_trap(42, p + 16) to label %ok, unwind label %not_ok

 not_ok:
  %unused = landingpad .....
  throw_NullPointerException();
  unreachable;

 ok:
  ...

A slight subtlety here is that the store to (p + 16) can trap (and we
can branch to not_ok) even if p is not null.  However, in that case
the store would have happened in the original program anyway, and the
behavior of the original program is undefined.

A prerequisite for this optimization is that in the address space we
operate on, loads from and stores to pointers in some small
neighborhood starting from address `null` trap deterministically.  The
above transform is correct only if we know that *(null + 16) will trap
synchronously and deterministically.  Even on platforms where the 0th
page is always unmapped, we cannot fold null checks into memory
operations at an arbitrary offset from the pointer being null
checked (e.g. array accesses are not suitable for implicit null
checks).
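The legality condition above can be written as a small predicate. This is a sketch under a stated assumption: the target guarantees that every address in `[0, guard_size)` traps. The function name and parameters are hypothetical. The predicate conservatively requires the whole access to fall inside that region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* guard_size: number of bytes starting at address 0 that are assumed to
   trap on access (e.g. 4096 if exactly the zero page is unmapped). */
bool can_fold_null_check(bool offset_is_constant, int64_t offset,
                         uint64_t access_size, uint64_t guard_size) {
    assert(guard_size > 0);
    /* An offset computed at run time (e.g. an array index) may land on
       mapped memory even when the base pointer is null. */
    if (!offset_is_constant)
        return false;
    if (offset < 0 || access_size == 0)
        return false;
    /* Conservatively require every byte touched, null + [offset,
       offset + access_size), to lie inside the trapping region. */
    return (uint64_t)offset <= guard_size &&
           access_size <= guard_size - (uint64_t)offset;
}
```

For example, the 4-byte store to `p + 16` in the earlier example passes with a 4096-byte guard. An access whose offset depends on a run-time index never does.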

Running this pass sufficiently late in the optimization pipeline will
allow for all the usual memory related optimization passes to work as
is -- they won't have to learn about the special semantics for the new
(load|store)_with_trap intrinsics to be effective.

This pass will have to be a profile-guided optimization pass for a
fundamental reason: taking a trap (a round trip through the kernel and
the runtime's signal handler) is far more expensive than an explicit
compare and branch, so implicit null checks are a pessimization even if
only a small fraction of them fail.  Typically, language runtimes that
use page-fault based null checks recompile methods with failed implicit
null checks to use an explicit null check instead (e.g. [2]).
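A sketch of the kind of profile-based decision such a pass might make. It compares the expected cost of the explicit check against the expected cost of traps. The profile struct, function name and cycle costs are hypothetical placeholders, not measured numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical profile data for one null check. */
typedef struct {
    uint64_t executions; /* times the check ran */
    uint64_t null_hits;  /* times the pointer was actually null */
} null_check_profile;

/* Hypothetical relative costs.  A trap goes through the kernel and the
   runtime's signal handler, so it is modeled as far more expensive than
   a compare-and-branch. */
enum { EXPLICIT_CHECK_COST = 1, IMPLICIT_TRAP_COST = 10000 };

bool should_make_implicit(const null_check_profile *prof) {
    assert(prof->null_hits <= prof->executions);
    if (prof->executions == 0)
        return false; /* no profile data: keep the explicit check */
    /* Expected cost of each strategy over the profiled run. */
    double explicit_cost = (double)prof->executions * EXPLICIT_CHECK_COST;
    double implicit_cost = (double)prof->null_hits * IMPLICIT_TRAP_COST;
    return implicit_cost < explicit_cost;
}
```

With these made-up costs, the break-even null rate is 1 in 10,000. A check that saw 10 nulls in a million executions would be folded; one that saw 1,000 would not. The recompile-on-failure policy described above would still be needed when the runtime behavior diverges from the profile.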

What do you think?  Does this make sense?

[1]: https://groups.google.com/d/msg/llvm-dev/mMQzIt_8z1Y/cnE7WH1HNaoJ
[2]: https://github.com/openjdk-mirror/jdk7u-hotspot/blob/master/src/share/vm/opto/lcm.cpp#L90

-- Sanjoy