[llvm-dev] Managed Languages BOF @ Dev Meeting

Mon Oct 19 02:53:50 PDT 2015

On 19 Oct 2015, at 10:27, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> 
> 
> David Chisnall wrote:
> >> So we'd
> >> have to do repeat the null checks in the unwind block, like
> >>
> >>   superblock:  # unwinds to unwind_block
> >>     null_check(ptr_a)
> >>     do_something
> >>     null_check(ptr_b)
> >>     do_something_again
> >>
> >>   unwind_block:
> >>     ;; either ptr_a is null or ptr_b is null
> >>     if (ptr_a == null)
> >>       throw_nullptrexception(bci = 42)
> >>     else ;; if (ptr_b == null)
> >>       throw_nullptrexception(bci = 43)
> >>
> >> So the code explosion problem still exists (in unwind_block), and
> >> we've duplicated a bunch of code.
> >
> 
> > Does it?  I guess it depends on how you are implementing the
> > exceptions.  I was expecting that the non-call exceptions would be
> > implemented on top of signals (or something SEH-like on Windows), so
> > the trap handler would have some generic code to identify the faulting
> > instruction, map this back to something useful, and then throw the
> > exception.  You wouldn’t be throwing the null pointer exception from
> > the unwind target, that would be generated in the signal handler and
> > the unwind target would only have to do the normal unwinding.
> 
> I see what you mean.  You'd keep some bookkeeping information on each
> "faulting_load" instruction that specifies how the exception being
> thrown has to be constructed.  So my example would look something like
> this:
> 
>   superblock:  # unwinds to unwind_block
>     faulting_load(ptr_a)  # exception_construction_args = (42)
>     do_something
>     faulting_load(ptr_b)  # exception_construction_args = (43)
>     do_something_again
> 
> Did I understand the scheme correctly?

Yes, in much the same way that we already emit data into the LSDA.  Whatever catches the trap (signal handler, SEH handler) would receive the PC of the faulting load, look up the relevant information, and then prepare the exception object.

> The interesting bit (I'm sure you've thought of this, I'm still trying
> to verify that I've understood the scheme correctly) here is that the
> faulting_load instruction will not have the same reordering semantics
> as a normal load.  You cannot reorder two `faulting_load` instructions
> as that would change which NPE you get. E.g. if you were supposed to
> get an NPE on bci 42, when loading ptr_a, in the above example, the
> optimizer cannot change that to getting an NPE on bci 43, when loading
> ptr_b. It therefore cannot re-order faulting_load(ptr_a) and
> faulting_load(ptr_b) even though they're control equivalent, and even if
> aliasing permits.

Yes, that would probably be the trickiest part of this.  Note that this is also somewhat language dependent.  If the source language does not define ordering of memory operations, then reordering them would be fine.  I’m definitely not sufficiently familiar with the Java spec to say exactly when this would or wouldn’t be permitted - I think Java is a lot stricter than C/C++ with respect to evaluation order.  We’d probably need some metadata on each basic block, along with the unwind target, to specify what the requirements are.  Alternatively, we could use the atomic load and store instructions with a new ordering that would guarantee ordering within a single thread but not actually guarantee atomicity.

> > In a JIT environment, the unwind targets could probably also be
> > lazily emitted, so in your initial IR you’d just have a single
> > patchpoint for the unwind target.  My knowledge of common Java idioms
> > is slightly rusty, but my impression was that, while lots of Java code
> > abuses exceptions for non-exceptional behaviour, this is quite rare
> > for null-pointer exceptions.
> 
> Yes, that is accurate.  In fact, LLVM has an implementation of
> page-fault based null check elimination [1] that we've been using for
> a while.  However, this is not done by a new "faulting_load"
> instruction; but is done by late matching "test Reg, Reg; jz throw"
> and trying to "fold" the null check into a "nearby memory operation".
> 
> There are two reasons why we chose this late matching approach:
> 
> 1. Keeping the control flow explicit lets most of the optimizer elide
>    redundant null checks (without us teaching it about another new
>    thing).
> 
> 2. It helps making the optimization profile guided and optional -- if
>    you don't do anything then you still have an explicit null check
>    (the "conservative" case, in this scenario) to fall back on, and
>    not an implicit null check.
> 
>    Since taking a page fault is so much more expensive than a fully
>    predicted explicit null check branch, getting this wrong even in a
>    handful of cases can cause a net regression (because an occasional
>    "catch NullPointerException" can heavily skew the scales).  So we
>    have to rely on recompilation to avoid having a failing implicit
>    null check in an application's steady state.  As soon as we detect
>    that an implicit null check has failed, we kick off a recompile of
>    the same method with information that tells LLVM that that
>    specific null check must not be made implicit this time.
> 
> The problem with this scheme is that it does not address the basic
> block explosion issue that started this discussion.

I can definitely see the attraction of keeping flow control explicit.  Note that moving the unwind target to the basic block also simplifies flow control for explicit (call) exceptions though - it’s not limited to non-call exceptions.  If you have a language like Java with large try blocks, containing a number of calls, all with the same jump target, then the IR will be a lot simpler for cases where the catch block doesn’t touch any variables modified in the try block (you’d still need to split the try block into basic blocks where it does).

The recompilation case would be similar in both cases, only this time you’d be replacing an implicit null check with an explicit one, rather than removing some metadata.

David