[llvm-dev] RFC: Strong GC References in LLVM

Sanjoy Das via llvm-dev llvm-dev at lists.llvm.org
Mon Jul 11 23:21:37 PDT 2016

Hi Andy,

Andrew Trick wrote:
 > Sanjoy,
 > This looks very close to my understanding of the statepoint design 
trajectory when you first introduced it. It’s great that you followed 
through and took the time to formalize the IR semantics. It’s been a 
couple years since I’ve thought about it so I may ask some obtuse questions.
 > I think the subject line is wrong though! Did you actually say what a 
“strong GC reference” is? I think you mean "precise GC reference", since 
weak references have the same requirements for being tracked at statepoints.

I meant "strong" as opposed to the "weaker" pointer type that LLVM
currently has (which can be freely converted to and from integers).
But "precise GC reference" is better, so I'll steal that. :)

 >> On Jun 24, 2016, at 12:22 AM, Sanjoy 
Das <sanjoy at playingwithpointers.com> wrote:
 >> Getting LLVM ToT working with a precise relocating GC involves taking
 >> the following steps:
 >>     Note: GCREF is the type representing a pointer that the GC needs
 >>       to know about and potentially relocate.  Currently it is `<Ty>
 >>       addrspace(1)*`.
 >> 1. Generate IR, and represent GC pointer SSA values as values of type
 >>     GCREF.
 >> 2. Run opt -O3.
 >> 3. Run RewriteStatepointForGC.  It rewrites IR values of type GCREF
 >>     to "normal" pointers that the backend can lower (note -- we don't
 >>     physically rewrite the llvm::Types to keep things simple and 
fast).  It
 >>     does this by making all of the data flow required to express
 >>     relocation semantics explicit in the IR.
 >> 4. Rewrite loads and stores of GCREF type to execute load and store
 >>     barriers (logically can be part of (3)).
 >> 5. Lower into machine code.
 >> For this discussion let's treat the internals of (3), (4) and (5) as
 >> black boxes, and center our discussion on what properties we require
 >> of the IR after step (2) so that step (3), (4) and (5) can do their
 >> job.  However, if you think the overall design can be simplified by
 >> changing (3), (4) or (5), please feel free to point that out. :)
 > Great.
 >> After (2) and before (3), we want to have the following invariant:
 >> '''
 >> Given any location "L" in the CFG of a function, let the set of all
 >> values of type GCREF whose defs dominate "L" be "G":
 >>   A. "G" is a superset of the references that need to be reported to
 >>      the GC if we put a safepoint at "L".  References need to be
 >>      reported to the GC if the GC may want to scan through them or
 >>      rewrite them to point to the new locations of the objects they
 >>      used to point to.
 >>   B. Everything in "G" is a valid GC reference.  This directly implies
 >>      that all values of type GCREF hold a valid GC reference (since
 >>      "L" could be the location immediately following the def).
 >> '''
 > Makes sense. The frontend may want to control the lifetime of 
references, but that can be done with some CFG constructs.
 >> Few important things to note about "valid GC reference":
 >> 1. There is no fixed notion of a valid GC reference.  What
 >>     constitutes a valid GC reference can change at arbitrary points in
 >>     time.
 >> 2. There may not be a runtime bitwise predicate that can reliably
 >>     distinguish a "valid" GC reference from an invalid one.  The
 >>     validity of a GC reference is a function of its provenance as well
 >>     as its bitwise content.
 > I’m not sure what’s being said here, but a GC reference is valid when 
you load it and remains valid as long as the compiler feels like keeping 
it valid. That decision will be made in step #3, statepoint rewriting.

I meant that there is no way to look solely at the bitwise representation
of a value and decide whether it is a GC reference or not, since it
could be an integer that just happens to look like a GC reference.

 >> 3. (Related to the previous point) Arbitrary GEPs off of "valid GC
 >>     references" are also "valid GC references".  This includes out of
 >>     bounds GEPs (whose bitwise representation can have any value, for
 >>     all practical intents and purposes).
 > Fair enough, and important to point out.
 >> [Aside: We could phrase the invariant in terms of fixed safepoint
 >> locations, but I think focusing on this (stronger) invariant actually
 >> makes the problem simpler (and no harder).]
 >> The invariant is somewhat vague and relies on white-box knowledge of
 >> the GC to define what values "need to be reported to the GC" (this
 >> will become clearer as we see some examples); and the "problem
 >> statement" is to come up with a precise set of rules governing the
 >> GCREF type so that the invariants we need fall out.  Today we enforce
 >> the invariants downstream by `#if 0` ing out code that breaks it.
 >> However that's fragile; and more importantly, it does not help other
 >> people interested in precise GC and LLVM.
 > Good motivation for getting this proposal in langref,
 >> # Examples of Bad Transforms
 >> Let's look at some examples of bad transforms that are legal
 >> today (in upstream LLVM) with GCREF == "addrspace(1)*".  Right now
 >> we'll justify their "badness" by invoking ad-hoc GC specific rules on
 >> a case by case basis:
 > I know non-zero address space has “target semantics”, but I have no 
idea what that means to the optimizer today. Is there some class of IR 
level optimization that is prohibited at address space > 0?

Not that I'm aware of.

 >> ## Integer ->  GCREF Conversion
 >> (%value is unused)
 >> %value = load i64, i64* %ptr
 >> == Transformed to ==>
 >> %value = load GCREF, GCREF* (bitcast %ptr)
 >> This is a problem since we've introduced a value of type GCREF that
 >> may not be a valid GC reference at runtime.  If the compiler can prove
 >> that %value is actually a valid GC reference then this transform is
 >> valid, but typically it won't be able to except in cases like "%value
 >> is provably 0" in which case it is better to just fold the load away.
 >> %value = load i64, i64* %ptr
 >> == Transformed to ==>
 >> %value = load i64, i64* %ptr
 >> %value_gc = inttoptr %value to GCREF  (%value_gc unused)
 >> is invalid for the same reason.
 > Ok. I know some passes may think it’s fine to “pass values through 
memory" and bitcast the source and dest. Are there specific passes 
in-tree that you’ve seen do this? Are they just passing values through 
alloca? We shouldn’t need to worry about general heap access doing this. 
Is there any other reason a pass would load an address as i64?

InstCombine today will transform

   %v = load i32*, i32** %ptr0
   store i32* %v, i32** %ptr1

== Transformed to ==>

   %1 = bitcast i32** %ptr0 to i64*
   %v1 = load i64, i64* %1, align 8
   %2 = bitcast i32** %ptr1 to i64*
   store i64 %v1, i64* %2, align 8

 >> ## GCREF ->  Integer conversion
 >> Converting a GCREF to an integer is fine if the integer is unused.
 >> However, cases like these are problematic:
 >> %v0 = load GCREF, GCREF* %ptr_0
 >> %v1 = load GCREF, GCREF* %ptr_1
 >> %cmp = icmp eq %v0, %v1
 >> == Transformed to ==>
 >> %v0 = load i64, i64* %ptr_0
 >> %v1 = load i64, i64* %ptr_1
 >> %cmp = icmp eq %v0, %v1
 >> Since in the second version if "L" is, say, between %v0 and %v1, "G"
 >> would not include "v0" which contains a GC reference that the GC may
 >> want to scan or relocate[0].
 >> For similar reasons
 >> %v0 = load GCREF, GCREF* %ptr_0
 >> %v1 = load GCREF, GCREF* %ptr_0
 >> %cmp = icmp eq %v0, %v1
 >> == Transformed to ==>
 >> %v0 = load GCREF, GCREF* %ptr_0
 >> %v0.i = ptrtoint %v0
 >> %v1 = load GCREF, GCREF* %ptr_0
 >> %v1.i = ptrtoint %v1
 >> %cmp = icmp eq %v0.i, %v1.i
 >> is also invalid.
 > Converting any GCREF to an integer is BAD, period. A single benign 
use could float below a safepoint. Imagine if you expanded card marks 
early before step #3!

Here I'm trying to constrain the optimizer -- if the frontend wants to
insert ptrtoints and knows what it is doing (because it has GC
specific knowledge), then that should be fine.

 > Again, is this real or hypothetical? Who is doing this and why?

I haven't seen this happen organically (so this is likely
hypothetical), but I want to prevent it from happening even if there
is a reason to do it.

 >> ## Round tripping GCREF's through Integers
 >> The following transform is invalid:
 >> %v0 = load GCREF, GCREF* %loc0
 >> store GCREF %v0, GCREF* %loc1
 >> == Transformed to ==>
 >> %v0 = load i64, i64* (bitcast %loc0)
 >> store i64 %v0, i64* (bitcast %loc1)
 >> because if "L" in the second version is between the load and the
 >> store, "G" will not include `%v0` even though we need to report it to
 >> the GC (otherwise we could propagate a stale bitwise representation of
 >> the GC reference we loaded in `%v0`).
 > Just like the last two cases, converting a GCREF to an integer is bad.
 > Real or hypothetical?

This is in instcombine.

 >> ## Bad types due to hoisting loads out of control flow
 >> This is bad
 >> if (is_reference) {
 >>   %val = load GCREF, GCREF* %ptr
 >> }
 >> == Transformed to ==>
 >> %val = load GCREF, GCREF* %ptr
 >> if (is_reference) {
 >> }
 >> unless the compiler can prove that %val is a GC reference at its new
 >> location.  Downstream we model the Java type system in LLVM IR to try
 >> to prove that %val is a GC reference in its new location, but for
 >> starters we will have to be conservative upstream.
 > This is really abstract. Why would the load be hoisted if %ptr is not 
dereferenceable? Why would it be dereferenceable if GCRef is not valid?
 > I think this has to do with type checks. You could have a valid 
object, but the GCRef field may only be valid under some type check.

Yes, in practice the "is_reference" check will be a type check.  You
can end up with this from Java code that looks like:

class A {
   long longField;  // Offset 16, say
}

class B {
   Object objField;  // Offset 16 also
}

void f(A a) {
   Object o = a;

   // Since a is of type A, we know that it is dereferenceable up to
   // 16+8 == 24 bytes, but it does not mean that we can speculate
   // the load of objField.

   // This is kind of silly as written, but this does happen after
   // inlining.

   if (o instanceof B) {
     // This is dead code, and we will often DCE it away, but we're
     // not allowed to rely on that happening (before hoisting) for
     // correctness.
     Object obj = ((B) o).objField;
   }
}

 > It is definitely important to speculate these loads, but I agree that 
it needs to be done with special logic that understands GCRef.

 > In particular, you can speculate one such load, but never a dependent 
load.

We can't typically speculate dependent loads, but that's because we
can't (typically) prove that dependent loads load from non-null
locations.  E.g. in

class A {
   Object o;
}

class B {
   A a;
   void g() {
     if (cond) {
       Object v = this.a.o;  // dependent load: first a, then a.o
     }
   }
}

We could speculate the load of .o if this.a could be proved to be
non-null.

 > You could solve the dominance problem by introducing some garbage 
value on the ‘else’ path. But then your simple invariants don’t hold.

Not sure what you mean by "simple invariants don’t hold", but in the
example above ("void f(A a) {"), putting junk on the else path does not
help.

 > This optimization is probably best done contained in step #3.

I don't think that's a good idea -- we'll have to replicate LICM
inside RewriteStatepointsForGC, for instance, when we really want it
to run early.  Without LICM of gc references we can't get rid of range
checks in simple loops like

   for (int i = 0; i < this.a.length; i++)
     this.a[i] = 0;
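
To make that concrete, here is a hand-written sketch (not actual
compiler output; the names and offsets are illustrative) of the
hoisting we want LICM to perform on the loop above:

   loop:
     %a = load GCREF, GCREF* %this.a.addr             ; loop-invariant
     %len = load i32, i32* (gep %a, <length offset>)  ; loop-invariant
     %in.bounds = icmp slt i32 %i, %len
     ; ... range check, array store, increment ...

   == Transformed to ==>

   preheader:
     %a = load GCREF, GCREF* %this.a.addr
     %len = load i32, i32* (gep %a, <length offset>)
   loop:
     %in.bounds = icmp slt i32 %i, %len
     ; ... with %len loop-invariant, the range check can fold away ...

Hoisting %a across the loop backedge is only sound because GCREF
values stay valid (keep "fluctuating") across any safepoint the loop
body may contain.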

 > Alternatively, you somehow make the dereferenceable attribute apply 
to individual fields. Hasn't some sort of “assume dereferenceable” 
intrinsic been proposed? You would still be left with the problem that 
you want to speculate past one level of type check.

That's not sufficient for the "void f(A a) {" example -- the field at
offset 16 is dereferenceable for "a", but it can't be speculatively
loaded as a GCREF.

 >> # Proposed Solution:
 >> We introduce a "new" LLVM type.  I will still refer to it as GCREF
 >> here, but it may actually still be "<ty>  addrspace(k)*" where k is
 >> specially noted in the datalayout.
 > Sounds good to make this target-configurable.
 >> Semantics:
 >> 1. GCREF represents an equivalence class of values (equivalence
 >>     relation being "points to a fixed semantic object").  The bitwise
 >>     representation fluctuates constantly outside the compiler's
 >>     control (the dual of `undef`), but may have invariants (in
 >>     particular, we'd like to be able to specify alignment, nonnull
 >>     etc.).  At any given point in time all GCREF instances pointing to
 >>     the same object have the same bitwise representation (we need this
 >>     to make `icmp eq` well-defined).
 > OK.
 >> 2. GCREF instances can only contain a valid gc reference (otherwise
 >>     they can't meaningfully "fluctuate" among the various possible
 >>     bitwise representations of a reference).
 > Lost me.

The "runtime" comes in and modifies values of type GCREF to contain
different bitwise representations pointing to the same logical object.
It can't do this rewriting if a GCREF value contains some random
bitwise pattern, since that won't point to a valid object on the heap.

 >> 3. Converting GCREF to integers is fine in general, but you'll get an
 >>     arbitrary "snapshot" of the bitwise value that will generally not
 >>     be meaningful (unless you are colluding with the GC in
 >>     implementation defined ways).

 > I see what you’re getting at but don’t like the wording. Converting 
GCRef to integers is not “fine”. It’s not even meaningful. It just isn’t 
undefined behavior at the point of conversion.

This is really intended to support "object pinning", where the
frontend converts a GC reference to an integer and uses it as such.
The key phrase above is "colluding with the GC" -- you need whitebox
knowledge of how the GC works to be able to convert GCREFs to
integers reliably.
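
For instance, a frontend targeting a runtime that supports pinning
might emit something like the following (purely a sketch; the
@runtime_pin / @runtime_unpin functions are made-up names):

   ; The runtime promises not to relocate %obj while it is pinned, so
   ; the ptrtoint snapshot below stays meaningful until the unpin.
   %pinned = call GCREF @runtime_pin(GCREF %obj)
   %addr = ptrtoint GCREF %pinned to i64
   ; ... hand %addr to native code that expects a stable address ...
   call void @runtime_unpin(GCREF %pinned)

The optimizer doesn't need to understand any of this; it just has to
refrain from introducing such conversions on its own.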

 >> 4. Converting integers to GCREF is allowed only if the source integer
 >>     is a bitwise representation of a valid GC reference that is not an
 >>     out of bounds derived reference.  However, this is difficult for
 >>     the compiler to infer since it typically will have no fundamental
 >>     knowledge of what bitwise representation can be a valid GC
 >>     reference.
 > I don’t know about this. It seems like the integer’s lifetime could 
be extended across a safepoint.

I've explicitly not talked about safepoints at all here -- GCREF
values always have to hold "valid" references, and what a valid
reference is changes at arbitrary points in time (and at this layer
there are no "safepoints" denoting the only places where the
definition of a valid reference can change).  This means that, in
general, you cannot say for sure that an integer contains a bitwise
value that is a valid GC reference; and even if it does at a certain
point in time, it may not stay valid the next instant.

 >> 5. Operations that use a GCREF-typed value are "atomic" in using the
 >>     bitwise representation, i.e., loading from a GCREF typed value
 >>     does not "internally" convert the GCREF to a normal
 >>     integer-pointer and then use the integer-pointer, since that would
 >>     mean there is a window in which the integer-pointer can become
 >>     stale[1].
 > Yes, of course. It’s a good general rule for the optimizer. I wish 
any pass that violates atomic reads/writes of pointers had a good reason 
and some general way to disable it.
 >> 6. A GCREF stored to a location in the heap continues to fluctuate,
 >>     and keeps itself in sync with the right bitwise representation.
 >>     In a way, there isn't a large distinction between the GC and the
 >>     heap -- the heap is part of (or managed by) the GC.
 >> I think (6) is the most controversial of the semantics above, but it
 >> isn't very different from how `undef` stored to the heap remains
 >> `undef` (i.e. a non-deterministic N-bit value) and a later load can
 >> recover `undef` instead of getting a normal N-bit value.
 > I think I know what you mean.
 >> ## How do we fare with the examples?
 >> Let's take a look some of the examples, and see how the semantics
 >> specified above help us:
 > Is case #0, Integer ->  GCREF Conversion, missing or covered below?

That's basically (4).

 >> ### Demoting loads / stores from GCREF to integers is invalid (case 1):
 >> %v0 = load GCREF, GCREF* %ptr_0
 >> %v1 = load GCREF, GCREF* %ptr_1
 >> %cmp = icmp eq %v0, %v1
 >> ==>
 >> %v0 = load i64, i64* %ptr_0
 >> %v1 = load i64, i64* %ptr_1
 >> %cmp = icmp eq %v0, %v1
 >> Invalid because the initial IR is comparing two fluctuating GC
 >> references, while the transformed IR will compare their bitwise
 >> snapshots taken at an arbitrary point in time.
 > This seems unnecessary to reason about since any use of a GCRef’s 
bits is not meaningful at any point in the program later than its 
immediate use.

Yes, that's what I'm trying to say above. :)

 >> ### Demoting loads / stores from GCREF to integers is invalid (case 2):
 >> %v0 = load GCREF, GCREF* %loc0
 >> store GCREF %v0, GCREF* %loc1
 >> ==>
 >> %v0 = load i64, i64* (bitcast %loc0)
 >> store i64 %v0, i64* (bitcast %loc1)
 >> Invalid because there are two semantic differences between the initial
 >> and final IR:
 >> 1. In the initial IR we store a fluctuating and up-to-date GCREF,
 >>     whereas in the final IR we store a potentially stale bitwise
 >>     snapshot of the value.
 >> 2. In the initial IR, after the store *%loc1 has a fluctuating GC
 >>     reference, while in the final IR *%loc1 only has a once-valid
 >>     snapshot of a GC reference (that will no longer fluctuate).
 > This is interesting to discuss, but still just falls under the GCRef 
cannot be converted to and used as an integer rule.


 >> ### Basic store forwarding is valid:
 >> store  GCREF %v0, GCREF* %loc0
 >> %v1 = load GCREF, GCREF* %loc0
 >> ==>
 >> store GCREF %v0, GCREF* %loc0
 >> %v1 = %v0
 >> The store forwarding is valid since we stored a GCREF that
 >> semantically points to object O (say), and loaded back a GCREF
 >> pointing to the same object O.
 > Good.
 >> ### Hoisting inttoptr is (generally) invalid:
 >>   if (<condition>) {
 >>     %val_gc = ptrtoint %val to GCREF
 >>   }
 >> ==>
 >>   %val_gc = ptrtoint %val to GCREF
 >>   if (<condition>) {}
 >> is invalid since we do not know that `%val` is a valid bitwise
 >> representation of a GCREF at the point in time we do the `ptrtoint`
 >> (i.e. <<condition>> could be controlling whether `%val` is a valid
 >> bitwise representation of a GCREF).
 >> In theory this isn't different from loads, divisions or any other
 >> potentially trapping operation, but I think a lot of LLVM directly
 >> encodes the fact that `CastInst` s are obviously speculatable, so
 >> maybe Integer ->  GCREF casts should be outlawed at the type system
 >> level, and be done via an intrinsic?

 > Is this supposed to be 'inttoptr’? I don’t see how this can work at 
all, since the integer’s lifetime can extend beyond a safepoint.

Do you mean the initial example?  That is fine if <condition> is
always false at runtime.

 > Do you really need to support constant GCRefs? Cliff Click said it 
was a bad decision.

Typically we don't need inttoptr to GCREFs, but other runtimes might.
I'm happy to not support it for now.

 >> ### Store forwarding between integers and GCREFs is valid
 >> store i64 %val, i64* %loc
 >> %val_gc = load GCREF, GCREF* (bitcast %loc)
 >> ==>
 >> store i64 %val, i64* %loc
 >> %val_gc = inttoptr %val to GCREF
 >> is valid.
 >> Both the initial and the final program assume that %val is a valid
 >> bitwise representation of a GC reference.
 >> (Note: it may not be valid to speculate the inttoptr instruction above
 >> control flow).
 > Why do you need this? As in the previous case, it doesn’t seem like a 
sound representation.

We don't need this -- the only realistic place this can happen is
in dead code.

 >> ### Load forwarding between integers and GCREFs is invalid
 >> %val_i = load i64, i64* %loc
 >> %val_p = load GCREF, GCREF* (bitcast %loc)
 >> ==>
 >> %val_i = load i64, i64* %loc
 >> %val_p = inttoptr %val_i to GCREF
 >> is invalid if *%loc contains a GC reference.  Given the model that we
 >> have, the first program loads some bitwise representation of a
 >> fluctuating GC reference, and then loads the same GC reference as a
 >> GCREF; whereas the second program tries to convert a potentially stale
 >> bitwise representation of a GC reference into a fluctuating GCREF
 >> (which is not allowed).
 > I think the original code is unspecified, maybe undefined at the 
point of %val_i’s use.

I'd say that would depend on what use.  For instance, say we have:

class A {
   long lField;  // Offset 16
}

class B {
   Object oField;  // Offset 16
}

void h(B b) {
   Object o = b;
   hC = b.oField.hashCode();
   if (o instanceof A) {
     print(((A)o).lField + 20);
   }
}

it is reasonable to optimize "h" to:

void h(i8* b) {
   GCREF o = *(b + 16)
   long l  = *(b + 16)
   long l2 = l + 20;
   if (is_instance_of(b, A_class)) {
     print(l2);
   }
}

Above the use of l in "l + 20" has to be fine, and not UB.

 >> ## Implementation Issues & Impact to LLVM
 >> This is a proposed path forward to implementing GCREF in LLVM.  The
 >> caveat mentioned at the start of the email (everything is open for
 >> discussion) especially applies here.
 >> ### Changes to IR to represent GCREF
 >> We change `datalayout` to denote pointers of a specific address space
 >> as GCREFs.  We add verifier rules forbidding `inttoptr` conversion of
 >> integers into GCREFs and add an intrinsic (that can be control
 >> dependent) that converts integers to GCREFs.

 > Sounds good to me. If you really need Int->GCRef conversion, then it 
seems like it needs to be an intrinsic with side effects, contrary to 
many of your examples.

Yes.  The other solution is to make `inttoptr` a side effect depending
on the result type, but that sounds like a bad idea.
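
Concretely, the conversion intrinsic might look something like this
(hypothetical name and signature):

   ; Control dependent and not speculatable: may only execute where %i
   ; is known to hold a valid, in-bounds bitwise representation of a
   ; GC reference.
   declare i8 addrspace(1)* @llvm.gc_int_to_ref(i64 %i)

with the verifier rejecting a plain `inttoptr i64 ... to i8
addrspace(1)*` whenever addrspace(1) is marked as the GCREF address
space in the datalayout.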

 > Or are you thinking that this intrinsic is only generated after 
lowering GCRef stores in step #4?

 > What about store barriers? Do they need an intrinsic or just 
`ptrtoint %gcref to i64`?

As long as stores are correctly typed (i.e. a store of GCREF type has
not been converted to a store of type i64, like instcombine does
today), I think it is sufficient to late-rewrite stores to use a
store barrier based on the `gc_ref_to_int` intrinsic that we'll add.
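
For a generational collector with card marking, that late rewrite
might produce something like this (a sketch only; the intrinsic name,
@card_table and the card size are illustrative, and %base is the
object containing the stored-to field):

   store GCREF %v, GCREF* %loc

   == Transformed to ==>

   store GCREF %v, GCREF* %loc
   %raw = call i64 @gc_ref_to_int(GCREF %base)
   %card = lshr i64 %raw, 9                    ; 512-byte cards, say
   store i8 1, i8* (gep @card_table, %card)    ; dirty the card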

 > This is probably covered somewhere else, but what are the semantics 
of your statepoints with regard to reading/writing memory?

This is what I have in mind: gc.statepoint only exists after we've
gotten rid of GCREFs -- they model the behavior of GCREF values in
terms of normal pointers that LLC knows how to codegen.  The semantics
of gc.statepoint are specified (roughly) as:

(r0, r1, ... rn) = gc.statepoint(p0, p1, ... pn)

is equivalent to

   1. for each object o in the heap:
        o_new = malloc(size of o)
        *o_new = *o
        rewrite all refs to o to point to o_new

   2. for each pointer p_i in (p0, p1, ... pn):
        set r_i to the location p_i was relocated to
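
In IR terms, after RewriteStatepointsForGC a call site schematically
looks like this (eliding the exact operand layout of the statepoint
intrinsics, which is documented separately):

   %tok = call token @llvm.experimental.gc.statepoint(..., @callee, ...)
                                      ; %p0 recorded as gc-live
   %p0.rel = call i8 addrspace(1)*
               @llvm.experimental.gc.relocate(token %tok, ...)
   ; every use of %p0 after the call site is rewritten to use %p0.rel

so the "rewrite all refs" step above becomes ordinary, explicit data
flow that the backend can lower without knowing anything about GCREFs.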

 >> ### Changes to the optimizer
 >> We change the optimizer to
 >> - Not speculate loads of GCREF type
 >> - Not change uses of GCREF types to uses of non-GCREF types
 >> Locally we already have changes fixing most of the places that'd need
 >> to be fixed for the above to hold.
 > You may be answering this elsewhere in the thread, but are we sure 
the GC load shouldn’t be an intrinsic?

That's what others are leaning towards.  I'm somewhat hesitant to do
that since it will involve changing almost every place that handles
LoadInst to also handle the gc_load intrinsic (which would be a bad
thing for the codebase long term), but I'm not 100% opposed to it if
that's what we need to move forward.
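
For reference, the intrinsic-based alternative would replace ordinary
loads with something like (hypothetical intrinsic name):

   ; instead of:  %v = load GCREF, GCREF* %p
   %v = call GCREF @llvm.gc_load(GCREF* %p)

and every pass that pattern-matches LoadInst (GVN, LICM, instcombine,
...) would need to be taught about the intrinsic to avoid losing
optimization power.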

 >> ### Lowering
 >> We do not teach LLC about GCREF's at all (in fact, we can change it to
 >> assert that the datalayout does not contain a `gc` directive).  We
 >> make the RewriteStatepointForGC pass rewrite the IR to make
 >> relocations explicit (so that no more GCREF-like semantics are
 >> necessary) and remove the GC directive from the Module's datalayout.
 > Ok. Is datalayout normally supposed to be used to convey IR lowering 
semantics? Or should it be immutable per target?
 > During statepoint lowering, I guess the IR does not change in any 
locally recognizable way. Pointer types are all still addresspace(1) 
right? Changing the datalayout to change the meaning of those types 
makes sense to me if it’s ok with everyone else.

I see what you're getting at here -- changing the datalayout does seem
a little hacky.  Maybe there is an efficient way to remap types
(without deleting and re-creating every instruction def'ing and using
values of the remapped type).
 >> [1]:
 >> In reality, this is handled by _not_ putting safepoints in problematic
 >> locations.  For instance, if lowering a CompareAndSwap on some weird
 >> architecture would involve doing:
 >>   // CAS(From, To, Location)
 >>   Location ^= <<Constant>>
 >>   // The GC cannot parse the state at this location "L", since we've
 >>   // done some arbitrary arithmetic on Location.
 >>   ... Do some magic with Location, To, From
 >> then we _cannot_ put a safepoint at location "L".
 > I honestly don’t know what this footnote means at the LLVM IR level. 
I don’t think GCRef’s should exist as integer-typed values prior to 
statepoint lowering, or you would have to convince me of why it’s 
necessary and safe.

It isn't necessary for us, and is safe only for runtimes that
support object pinning.
