[llvm-dev] RFC: Strong GC References in LLVM

Fri Jun 24 00:22:39 PDT 2016

This is a proposal to add strong GC reference types to LLVM.

We have some local (downstream) patches that are needed to prevent
LLVM's optimizer from making transforms that are problematic in the
presence of a precise relocating GC.  Adding a notion of a strong GC
reference to LLVM will let us upstream these patches in a principled
manner, and will act as a measure to avoid new problematic transforms.

The word "strong" here is used as opposed to the weaker pointers that
LLVM currently has which can be converted to and from integers without
repercussions (i.e. inttoptr and ptrtoint are nop-like non-sideeffecting
instructions).

Let me emphasize first that this email is intended to *start* a
conversation; and all of the points made here are open for discussion
and debate.  This proposal is informed by only _our_ GC and runtime
and other runtimes may have different requirements.

# Preamble & "Problem Statement"

A quick introduction to some GC basics: a precise relocating garbage
collector needs co-operation from the code generator / compiler to be
able to scan and relocate objects pointed to from frame local state
like CPU registers and stack slots.  Specifically, it needs a list of
the locations of all the GC references present in a stack frame; and
at "safepoints" it stops "mutator" threads (via some mechanism not
pertinent here), traverses all of the stack frames and uses the
aforementioned list to scan and potentially rewrite those references
to point to the new locations for the same objects.  In addition, some
garbage collectors need read and write barriers -- these are small
bits of code that run when reading and writing GC references (distinct
from memory reordering barriers).  For more details on GC specific
stuff, I suggest referring to http://www.memorymanagement.org/ and our
2014 LLVM-Dev talk http://llvm.org/devmtg/2014-10/#talk4.

Getting LLVM ToT working with a precise relocating GC involves taking
the following steps:

    Note: GCREF is the type representing a pointer that the GC needs
      to know about and potentially relocate.  Currently it is `<Ty>
      addrspace(1)*`.

 1. Generate IR, and represent GC pointer SSA values as values of type
    GCREF.
 2. Run opt -O3.
 3. Run RewriteStatepointForGC.  It rewrites IR values of type GCREF
    to "normal" pointers that the backend can lower (note -- we don't
    physically rewrite the llvm::Types to keep things simple and fast).  It
    does this by making all of the the data flow required to express
    relocation semantics explicit in the IR.
 4. Rewrite loads and stores of GCREF type to execute load and store
    barriers (logically can be part of (3)).
 5. Lower into machine code.

For this discussion let's treat the internals of (3), (4) and (5) as
black boxes, and center our discussion on what properties we require
of the IR after step (2) so that step (3), (4) and (5) can do their
job.  However, if you think the overall design can be simplified by
changing (3), (4) or (5), please feel free to point that out. :)

After (2) and before (3), we want to have the following invariant:

'''
Given any location "L" in the CFG of a function, let the set of all
values of type GCREF whose defs dominate "L" be "G":

  A. "G" is a superset of the references that need to be reported to
     the GC if we put a safepoint at "L".  References need to be
     reported to the GC if the GC may want to scan through them or
     rewrite them to point to the new locations of the objects they
     used to point to.

  B. Everything in "G" is a valid GC reference.  This directly implies
     that all values of type GCREF hold a valid GC reference (since
     "L" could be the location immediately following the def).
'''

Few important things to note about "valid GC reference":

 1. There is no fixed notion of a valid GC reference.  What
    constitutes a valid GC reference can change at arbitrary points in
    time.

 2. There may not be a runtime bitwise predicate that can reliably
    distinguish a "valid" GC reference from an invalid one.  The
    validity of a GC reference is a function of its provenance as well
    as its bitwise content.

 3. (Related to the previous point) Arbitrary GEPs off of "valid GC
    references" are also "valid GC references".  This includes out of
    bounds GEPs (whose bitwise representation can have any value, for
    all practical intents and purposes).

[Aside: We could phrase the invariant in terms of fixed safepoint
 locations, but I think focusing on this (stronger) invariant actually
 makes the problem simpler (and no harder).]

The invariant is somewhat vague and relies on white-box knowledge of
the GC to define what values "need to be reported to the GC" (this
will become clearer as we see some examples); and the "problem
statement" is to come up with a precise set of rules governing the
GCREF type so that the invariants we need fall out.  Today we enforce
the invariants downstream by `#if 0` ing out code that breaks it.
However that's fragile; and more importantly, it does not help other
people interested in precise GC and LLVM.

# Examples of Bad Transforms

Let's look at some some examples of bad transforms that are legal
today (in upstream LLVM) with GCREF == "addrspace(1)*".  Right now
we'll justify their "badness" by invoking ad-hoc GC specific rules on
a case by case basis:

## Integer -> GCREF Conversion

(%value is unused)

%value = load i64, i64* %ptr
== Transformed to ==>
%value = load GCREF, GCREF* (bitcast %ptr)

This is a problem since we've introduced a value of type GCREF that
may not be a valid GC reference at runtime.  If the compiler can prove
that %value is actually a valid GC reference then this transform is
valid, but typically it won't be able to except in cases like "%value
is provably 0" in which case it is better to just fold the load away.

%value = load i64, i64* %ptr
== Transformed to ==>
%value = load i64, i64* %ptr
%value_gc = inttoptr %value to GCREF  (%value_gc unused)

is invalid for the same reason.

## GCREF -> Integer conversion

Converting a GCREF to an integer is fine if the integer is unused.
However, cases like these are problematic:

%v0 = load GCREF, GCREF* %ptr_0
%v1 = load GCREF, GCREF* %ptr_1
%cmp = icmp eq %v0, %v1
== Transformed to ==>
%v0 = load i64, i64* %ptr_0
%v1 = load i64, i64* %ptr_1
%cmp = icmp eq %v0, %v1

Since in the second version if "L" is, say, between %v0 and %v1, "G"
would not include "v0" which contains a GC reference that the GC may
want to scan or relocate[0].

For similar reasons

%v0 = load GCREF, GCREF* %ptr_0
%v1 = load GCREF, GCREF* %ptr_0
%cmp = icmp eq %v0, %v1
== Transformed to ==>
%v0 = load GCREF, GCREF* %ptr_0
%v0.i = ptrtoint %v0
%v1 = load GCREF, GCREF* %ptr_0
%v1.i = ptrtoint %v1
%cmp = icmp eq %v0.i, %v1.i

is also invalid.

## Round tripping GCREF's through Integers

The following transform is invalid:

%v0 = load GCREF, GCREF* %loc0
store GCREF %v0, GCREF* %loc1
== Transformed to ==>
%v0 = load i64, i64* (bitcast %loc0)
store i64 %v0, i64* (bitcast %loc1)

because if "L" in the second version is between the load and the
store, "G" will not include `%v0` even though we need to report it to
the GC (otherwise we could propagate a stale bitwise representation of
the GC reference we loaded in `%v0`).

## Bad types due to hoisting loads out of control flow

This is bad

if (is_reference) {
  %val = load GCREF, GCREF* %ptr
}
== Transformed to ==>
%val = load GCREF, GCREF* %ptr
if (is_reference) {
}

unless the compiler can prove that %val is a GC reference at its new
location.  Downstream we model the Java type system in LLVM IR to try
to prove that %val is a GC reference in its new location, but for
starters we will have to be conservative upstream.

# Proposed Solution:

We introduce a "new" LLVM type.  I will still refer to it as GCREF
here, but it may actually still be "<ty> addrspace(k)*" where k is
specially noted in the datalayout.

Semantics:

 1. GCREF represents an equivalence class of values (equivalence
    relation being "points to a fixed semantic object").  The bitwise
    representation fluctuates constantly outside the compiler's
    control (the dual of `undef`), but may have invariants (in
    particular, we'd like to be able to specify alignment, nonnull
    etc.).  At any given point in time all GCREF instances pointing to
    the same object have the same bitwise representation (we need this
    to make `icmp eq` is well-defined).

 2. GCREF instances can only contain a valid gc reference (otherwise
    they can't meaningfully "fluctuate" among the various possible
    bitwise representations of a reference).

 3. Converting GCREF to integers is fine in general, but you'll get an
    arbitrary "snapshot" of the bitwise value that will generally not
    be meaningful (unless you are colluding with the GC in
    implementation defined ways).

 4. Converting integers to GCREF is allowed only if source integer is
    a bitwise representation of a valid GC reference that is not an
    out of bounds derived reference.  However, this is difficult for
    the compiler to infer since it typically will have no fundamental
    knowledge of what bitwise representation can be a valid GC
    reference.

 5. Operations that use a GCREF-typed value are "atomic" in using the
    bitwise representation, i.e., loading from a GCREF typed value
    does not "internally" convert the GCREF to a normal
    integer-pointer and then use the integer-pointer, since that would
    mean there is a window in which the integer-pointer can become
    stale[1].

 6. A GCREF stored to a location in the heap continues to fluctuate,
    and keeps itself in sync with the right bitwise representation.
    In a way, there isn't a large distinction between the GC and the
    heap -- the heap is part of (or managed by) the GC.

I think (6) is the most controversial of the semantics above, but it
isn't very different from how `undef` stored to the heap remains
`undef` (i.e. a non-deterministic N-bit value) and a later load can
recover `undef` instead of getting a normal N-bit value.

## How do we fare with the examples?

Let's take a look some of the examples, and see how the semantics
specified above help us:

### Demoting loads / stores from GCREF to integers is invalid (case 1):

%v0 = load GCREF, GCREF* %ptr_0
%v1 = load GCREF, GCREF* %ptr_1
%cmp = icmp eq %v0, %v1
==>
%v0 = load i64, i64* %ptr_0
%v1 = load i64, i64* %ptr_1
%cmp = icmp eq %v0, %v1

Invalid because the initial IR is comparing two GC fluctuating
references, while the the transformed IR will compare their bitwise
snapshot taken at an arbitrary point in time.

### Demoting loads / stores from GCREF to integers is invalid (case 2):

%v0 = load GCREF, GCREF* %loc0
store GCREF %v0, GCREF* %loc1
==>
%v0 = load i64, i64* (bitcast %loc0)
store i64 %v0, i64* (bitcast %loc1)

Invalid because there are two semantic differences between the initial
and final IR:

 1. In the initial IR we store a fluctuating and up-to-date GCREF,
    whereas in the final IR we store a potentially stale bitwise
    snapshot of the value.
 2. In the initial IR, after the store *%loc1 has a fluctuating GC
    reference, while in the final IR *%loc1 only has a once-valid
    snapshot of a GC reference (that will no longer fluctuate).

### Basic store forwarding is valid:

store  GCREF %v0, GCREF* %loc0
%v1 = load GCREF, GCREF* %loc0
==>
store GCREF %v0, GCREF* %loc0
%v1 = %v0

The store forwarding is valid since we stored a GCREF that
semantically points to object O (say), and loaded back a GCREF
pointing to the same object O.

### Hoisting inttoptr is (generally) invalid:

  if (<condition>) {
    %val_gc = ptrtoint %val to GCREF
  }
==>
  %val_gc = ptrtoint %val to GCREF
  if (<condition>) {}

is invalid since we do not know that `%val` is a valid bitwise
representation of a GCREF at the point in time we do the `ptrtoint`
(i.e. <<condition>> could be controlling whether `%val` is a valid
bitwise representation of a GCREF).

In theory this isn't different from loads, divisions or any other
potentially trapping operation, but I think a lot of LLVM directly
encodes the fact that `CastInst` s are obviously speculatable, so
maybe Integer -> GCREF casts should be outlawed at the type system
level, and be done via an intrinsic?

### Store forwarding between integers and GCREFs is valid

store i64 %val, i64* %loc
%val = load GCREF, GCREF* (bitcast %loc)
==>
store i64 %val, i64* %loc
%val = inttoptr %val to GCREF

is valid.

Both the initial and the final program assume that %val is a valid
bitwise representation of a GC reference.

(Note: it may not be valid to speculate the inttoptr instruction above
control flow).

### Load forwarding between integers and GCREFs is invalid

%val_i = load i64, i64* %loc
%val_p = load GCREF, GCREF* (bitcast %loc)
==>
%val_i = load i64, i64* %loc
%val_p = inttoptr %val_i to GCREF

is invalid if *%loc contains a GC reference.  Given the model that we
have, the first program loads some bitwise representation of
fluctuating a GC reference, and then loads the same GC reference as a
GCREF; whereas the second program tries to convert a potentially stale
bitwise representation of a GC reference into a fluctuating GCREF
(which is not allowed).

## Implementation Issues & Impact to LLVM

This is a proposed path forward to implementing GCREF in LLVM.  The
caveat mentioned at the start of the email (everything is open for
discussion) especially applies here.

### Changes to IR to represent GCREF

We change `datalayout` to denote pointers of a specific address space
as GCREFs.  We add verifier rules forbidding `inttoptr` conversion of
integers into GCREFs and add an intrinsic (that can be control
dependent) that converts integers to GCREFs.

### Changes to the optimizer

We change the optimizer to

 - Not speculate loads of GCREF type
 - Not change uses of GCREF types to uses of non-GCREF types

Locally we already have changes fixing most of the places that'd need
to be fixed for the above to hold.

### Lowering

We do not teach LLC about GCREF's at all (in fact, we can change it to
assert that the datalayout does not contain a `gc` directive).  We
make the RewriteStatepointForGC pass rewrite the IR to make
relocations explicit (so that no more GCREF-like semantics are
necessary) and remove the GC directive from the Module's datalayout.

Thoughts?

-- Sanjoy

Footnotes:

[0]:
To see why this is bad assume to begin with %ptr_0 and %ptr_1 both
contain a GC reference pointing to object X, currently at address
0x500.

Now the following happens

%v0 = load i64, i64* %ptr_0
%v0 is 0x500
<< safepoint >> GC moves X to 0x600, and update *%ptr_1
%v1 = load i64, i64* %ptr_1
%v1 = 0x600
%cmp = icmp eq %v0, %v1
%cmp = false, when it should be true

If %v0 was "reported to the GC" in <<safepoint>> then it would have
been updated in place to 0x600, and %cmp would evaluate to true, as
expected.

[1]:
In reality, this is handled by _not_ putting safepoints in problematic
locations.  For instance, if lowering a CompareAndSwap in some weird
architecture would involve doing:

  // CAS(From, To, Location)
  Location ^= << Constant >>

  // The GC cannot parse the state at this location "L", since we've
  // done some arbitrary arithmetic on Location.

  ... Do some magic with Location, To, From

then we _cannot_ put a safepoint at location "L".