[LLVMdev] RFC: implicit null checks in llvm

Joseph Tremoulet jotrem at microsoft.com
Tue Apr 28 17:19:18 PDT 2015


Hi,

I'd like to make sure this is headed in a direction that will let us use and extend it for LLILC/CoreCLR (C#).  A few questions:

1) We've got a runtime that will generate an ABI-level exception object in response to the machine trap.  I'd like the compiler to be able to support targeting a runtime with that behavior (if nothing else, it saves us the code size of a call instruction per [group of] never-dynamically-failing null check[s]).  So, given an explicit check that looks like this:

        %NullCheck = icmp eq %Ty* %2, null
        br i1 %NullCheck, label %ThrowNullRef, label %3

      ; <label>:3                                       ; preds = %1
        %4 = <index into %2 by constant offset smaller than guard page>
        %5 = load i32, i32* %4, align 8
        ... (code that uses %5)

      ThrowNullRef:                                     ; preds = %1
        invoke void ThrowNullReferenceException() #2
                to label %6 unwind label %ExceptionDispatch

      ; <label>:6                                       ; preds = %ThrowNullRef
        unreachable

      ExceptionDispatch:                                ; preds = %ThrowNullRef
        %ExnData = landingpad { i8*, i32 } personality void ()* @ProcessCLRException
      ...


I understand the proposal here is to add a pass that changes this to:

        %5 = invoke  @llvm.load_with_trap(...) %2, <index>
              to label %ok, unwind label %not_ok
      ok:
        ... (code that uses %5)

      not_ok:
        %unused = landingpad { i8*, i32 } personality void ()* @__llvm_implicit_null_check
        br label %ThrowNullRef

      ThrowNullRef:                                     ; preds = %not_ok
        invoke void ThrowNullReferenceException() #2
                to label %6 unwind label %ExceptionDispatch

      ; <label>:6                                       ; preds = %ThrowNullRef
        unreachable

      ExceptionDispatch:                                ; preds = %ThrowNullRef
        %ExnData = landingpad { i8*, i32 } personality void ()* @ProcessCLRException
      ...

But what I eventually need is to also eliminate the call:

        %5 = invoke  @llvm.load_with_trap(...) %2, <index>
              to label %ok, unwind label %ExceptionDispatch
      ok:
        ... (code that uses %5)

      ExceptionDispatch:                                ; preds = %1
        %ExnData = landingpad { i8*, i32 } personality void ()* @ProcessCLRException
      ...

And I'm trying to understand how I might achieve that.  Allowing a target configuration to direct the null check folding to also fold away the call seems most straightforward; is that the right idea?  Or would it need to be something more like a separate pass that can be run for such targets immediately after this pass, which does the extra folding?


Related, regarding these restrictions:

      > The landingpad for the unwind destination can only be a cleanup landingpad, and the result of the landingpad instruction itself is always undef.

will they be enforced by the verifier?  Could we perhaps make that conditional on the personality routine?

 

2) Is the signature

        T @llvm.load_with_trap(T*)        [modulo name mangling]
        void @llvm.store_with_trap(T, T*) [modulo name mangling]

meant to imply that the pointer must be in address space zero, or that any address space is acceptable?  In our case, since we're planning to follow the statepoint design and use address spaces to distinguish GC pointers, the pointers whose null checks we'll be wanting to fold will most often not be in address space zero.


3) I didn't see a follow-up to this point:

      > If you really cared about the best codegen this would be done in machine IR after scheduling and target LoadStore opts.

Can you elaborate on whether/why this is/isn't the plan?  I'd hate for the use of implicit checks to come at the cost of worse load/store codegen, especially since we'll have null checks on most heap loads/stores.


Thanks
-Joseph



-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Sanjoy Das
Sent: Wednesday, April 22, 2015 10:05 PM
To: LLVM Developers Mailing List; Andrew Trick; Reid Kleckner; David Majnemer; Peter Collingbourne; Hal Finkel; Philip Reames; Russell Hadley
Subject: [LLVMdev] RFC: implicit null checks in llvm

Hi all,

I would like to propose a mechanism that would allow LLVM to fold null pointer checks into "nearby" memory operations, subject to runtime support.  This is related to but not exactly the same as a proposal floated by Peter Collingbourne earlier [1].  The obvious use cases are managed languages like Java, C# and Go that require a null check on pointers before they're used in certain ways (loads, stores, virtual dispatches etc.).  I'm sure there are other less obvious and more interesting use cases.

I plan to break the design into two parts, roughly following the statepoint philosophy:

# invokable @llvm.(load|store)_with_trap intrinsics

We introduce two new intrinsic families

  T @llvm.load_with_trap(T*)        [modulo name mangling]
  void @llvm.store_with_trap(T, T*) [modulo name mangling]

They cannot be `call`ed; they can only be `invoke`d.

Semantically, they try to load from or store to the pointer passed to them as normal load / store instructions do.  @llvm.load_with_trap returns the loaded value on the normal return path.  If the load or store traps then they dispatch to their unwind destination.  The landingpad for the unwind destination can only be a cleanup landingpad, and the result of the landingpad instruction itself is always undef.  The personality function in the landingpad instruction is ignored.

These intrinsics require support from the language runtime to work.
During code generation, the invokes are lowered into normal load or store instructions, followed by a branch (explicit `jmp` or just fall-through) to the normal destination.  The PC for the unwind destination is recorded in a side-table along with the PC for the load or store.  When a load or store traps or segfaults at runtime, the runtime searches this table to see if the trap is from a PC recorded in the side-table.  If so, the runtime jumps to the unwind destination; otherwise it aborts.
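
To make the runtime side of this concrete, here is a minimal C++ sketch of the lookup a trap handler might perform.  The record layout (`TrapEntry`), the function name `findTrapEntry`, and the assumption that the table is kept sorted by faulting PC are all illustrative; the RFC does not specify a table format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical side-table record: one entry per folded load or store.
struct TrapEntry {
  std::uint64_t FaultPC;   // PC of the load/store that may trap
  std::uint64_t UnwindPC;  // PC of the unwind destination for that access
};

// Returns the entry recorded for FaultPC, or nullptr if the trap did not
// come from a recorded load/store (in which case the runtime would abort).
// Assumption: Table is sorted by FaultPC, e.g. once when the section is parsed.
const TrapEntry *findTrapEntry(const std::vector<TrapEntry> &Table,
                               std::uint64_t FaultPC) {
  // Debug-only sanity check of the sortedness assumption.
  assert(std::is_sorted(Table.begin(), Table.end(),
                        [](const TrapEntry &A, const TrapEntry &B) {
                          return A.FaultPC < B.FaultPC;
                        }));
  // Binary search for the first entry whose FaultPC is >= the faulting PC.
  auto It = std::lower_bound(
      Table.begin(), Table.end(), FaultPC,
      [](const TrapEntry &E, std::uint64_t PC) { return E.FaultPC < PC; });
  if (It != Table.end() && It->FaultPC == FaultPC)
    return &*It;
  return nullptr;
}
```

A signal handler would call something like this with the faulting PC taken from the signal context and, on a hit, resume execution at `UnwindPC`.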

Note that the signal handler / runtime do not themselves raise exceptions at the ABI level (though they do so conceptually), but the landing pad block can raise one if it wants to.

The table mapping load/store PCs to unwind PCs can be reported to the language runtime via an __llvm_stackmaps-like section.  I am strongly in favor of making this section as easy to parse as possible.



# optimization pass to create invokes to @llvm.(load|store)_with_trap

With the @llvm.(load|store)_with_trap intrinsics in place, we can write an LLVM pass that folds null checks into nearby memory operations on that same pointer.  As an example, we can turn

  // r0 is a local register
  if (p != null) {
    r0 += 5;
    *(p + 16) = 42;
    ...
  } else {
    throw_NullPointerException();
    unreachable;
  }

into

  // r0 is a local register
  r0 += 5;
  invoke @llvm.store_with_trap(42, p + 16) to label %ok, unwind label %not_ok

 not_ok:
  %unused = landingpad .....
  throw_NullPointerException();
  unreachable;

 ok:
  ...

A slight subtlety here is that the store to (p + 16) can trap (and we can branch to not_ok) even if p is not null.  However, in that case the store would have happened in the original program anyway, and the behavior of the original program is undefined.

A prerequisite for this optimization is that in the address space we operate on, loads from and stores to pointers in some small neighborhood starting from address `null` trap deterministically.  The above transform is correct only if we know that *(null + 16) will trap synchronously and deterministically.  Even on platforms where the 0th page is always unmapped, we cannot fold null checks into memory operations on an arbitrary offset away from the pointer to be null checked (e.g. array accesses are not suitable for implicit null checks).
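
The offset restriction can be sketched as a small predicate.  This is only an illustration: `TrapRegion`, `isFoldableOffset`, and the idea that the runtime describes its guarantee as "the first Size bytes from null always fault" are assumptions, and the check is deliberately conservative in requiring the whole access to lie inside that region.

```cpp
#include <cstdint>

// Hypothetical description of what the target/runtime guarantees:
// every access within [0, Size) faults synchronously.
struct TrapRegion {
  std::uint64_t Size;
};

// Returns true if an access of AccessSize bytes at the constant offset
// Offset from a null base lies entirely inside the trapping region.
// Accesses at non-constant offsets (e.g. array indexing) have no single
// Offset to test and are therefore never candidates.
bool isFoldableOffset(const TrapRegion &R, std::uint64_t Offset,
                      std::uint64_t AccessSize) {
  if (AccessSize == 0)
    return false;
  // Overflow-safe form of: Offset + AccessSize <= R.Size.
  return Offset < R.Size && AccessSize <= R.Size - Offset;
}
```

With a 4096-byte region (an assumed size, not one stated above), the store to `p + 16` in the example qualifies, while an access at offset 4096 or beyond does not.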

Running this pass sufficiently late in the optimization pipeline will allow for all the usual memory related optimization passes to work as is -- they won't have to learn about the special semantics for the new (load|store)_with_trap intrinsics to be effective.

This pass will have to be a profile-guided optimization pass for fundamental reasons: implicit null checks are a pessimization even if only a small fraction of the implicit null checks fail.  Typically language runtimes that use page-fault based null checks recompile methods with failed implicit null checks to use an explicit null check instead (e.g. [2]).
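
As a rough illustration of the profile-guided decision, the sketch below picks a check kind per site from profile counters.  The `NullCheckProfile` fields, the function name, and the zero-tolerance policy (any observed null makes the site explicit) are assumptions made for the example; the actual threshold would be up to the pass and the runtime.

```cpp
#include <cstdint>

enum class NullCheckKind { Explicit, Implicit };

// Hypothetical per-site profile data gathered by the runtime.
struct NullCheckProfile {
  std::uint64_t Executions;  // times the check was executed
  std::uint64_t NullSeen;    // times the pointer was actually null
};

// Assumed policy: fold the check into the memory operation only when the
// site has been observed running and has never seen a null pointer.
NullCheckKind chooseNullCheckKind(const NullCheckProfile &P) {
  // Without profile data, stay with the explicit check.
  if (P.Executions == 0)
    return NullCheckKind::Explicit;
  // A single observed failure is enough to keep the check explicit, since
  // taking the trap path is far more expensive than a compare and branch.
  return P.NullSeen == 0 ? NullCheckKind::Implicit : NullCheckKind::Explicit;
}
```

A runtime that recompiles after a failed implicit check would bump `NullSeen` for the site in its trap handler and re-run this decision during recompilation.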


What do you think?  Does this make sense?

[1]: https://groups.google.com/d/msg/llvm-dev/mMQzIt_8z1Y/cnE7WH1HNaoJ
[2]: https://github.com/openjdk-mirror/jdk7u-hotspot/blob/master/src/share/vm/opto/lcm.cpp#L90

-- Sanjoy