[llvm-dev] RFC: Strong GC References in LLVM
Eli Friedman via llvm-dev
llvm-dev at lists.llvm.org
Mon Jul 11 15:44:44 PDT 2016
On Mon, Jul 11, 2016 at 2:28 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> Sanjoy Das wrote:
>> # Proposed Solution:
>> We introduce a "new" LLVM type. I will still refer to it as GCREF
>> here, but it may actually still be "<ty> addrspace(k)*" where k is
>> specially noted in the datalayout.
>> 1. GCREF represents an equivalence class of values (equivalence
>> relation being "points to a fixed semantic object"). The bitwise
>> representation fluctuates constantly outside the compiler's
>> control (the dual of `undef`), but may have invariants (in
>> particular, we'd like to be able to specify alignment, nonnull
>> etc.). At any given point in time all GCREF instances pointing to
>> the same object have the same bitwise representation (we need this
>> to make `icmp eq` well-defined).
>> 2. GCREF instances can only contain a valid gc reference (otherwise
>> they can't meaningfully "fluctuate" among the various possible
>> bitwise representations of a reference).
>> 3. Converting GCREF to integers is fine in general, but you'll get an
>> arbitrary "snapshot" of the bitwise value that will generally not
>> be meaningful (unless you are colluding with the GC in
>> implementation defined ways).
>> 4. Converting integers to GCREF is allowed only if the source integer is
>> a bitwise representation of a valid GC reference that is not an
>> out of bounds derived reference. However, this is difficult for
>> the compiler to infer since it typically will have no fundamental
>> knowledge of what bitwise representation can be a valid GC
>> reference.
>> 5. Operations that use a GCREF-typed value are "atomic" in using the
>> bitwise representation, i.e., loading from a GCREF typed value
>> does not "internally" convert the GCREF to a normal
>> integer-pointer and then use the integer-pointer, since that would
>> mean there is a window in which the integer-pointer can become
>> stale.
>> 6. A GCREF stored to a location in the heap continues to fluctuate,
>> and keeps itself in sync with the right bitwise representation.
>> In a way, there isn't a large distinction between the GC and the
>> heap -- the heap is part of (or managed by) the GC.
>> I think (6) is the most controversial of the semantics above, but it
>> isn't very different from how `undef` stored to the heap remains
>> `undef` (i.e. a non-deterministic N-bit value) and a later load can
>> recover `undef` instead of getting a normal N-bit value.
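
For readers who want points 1, 3 and 6 made concrete, here is a toy Python
model. It is not LLVM code: `ToyHeap`, `GCRef`, `bits` and `relocate_all` are
hypothetical names invented for this sketch, and it assumes a moving collector
that may relocate every object at any time. It only illustrates that an integer
snapshot of a reference goes stale while the reference itself, including a copy
stored in the heap, keeps tracking the same object.

```python
class ToyHeap:
    """Hypothetical moving heap: object ids are stable, addresses are not."""

    def __init__(self):
        self._addr_of = {}       # object id -> current address
        self._fields = {}        # object id -> list of field values
        self._next_id = 0
        self._next_addr = 0x1000

    def allocate(self, num_fields):
        obj = self._next_id
        self._next_id += 1
        self._addr_of[obj] = self._next_addr
        self._next_addr += 0x10
        self._fields[obj] = [None] * num_fields
        return GCRef(self, obj)

    def relocate_all(self):
        """Simulate a collection that moves every object to a new address."""
        for obj in self._addr_of:
            self._addr_of[obj] = self._next_addr
            self._next_addr += 0x10


class GCRef:
    """Models a GCREF: names a semantic object, not a fixed bit pattern."""

    def __init__(self, heap, obj):
        self._heap = heap
        self._obj = obj

    def bits(self):
        """Rough analogue of ptrtoint: a snapshot of the current address."""
        return self._heap._addr_of[self._obj]

    def same_object(self, other):
        """Rough analogue of `icmp eq`: compares the current bits."""
        return self.bits() == other.bits()

    def store(self, index, value):
        self._heap._fields[self._obj][index] = value

    def load(self, index):
        return self._heap._fields[self._obj][index]


heap = ToyHeap()
a = heap.allocate(1)
b = heap.allocate(1)
a.store(0, b)              # point 6: a reference stored in the heap
snapshot = b.bits()        # point 3: integer snapshot of b
heap.relocate_all()        # the collector moves everything

snapshot_is_stale = snapshot != b.bits()
heap_copy_in_sync = a.load(0).same_object(b)   # point 1: equality still holds
```

After the relocation the integer `snapshot` no longer matches `b`, while the
copy of `b` read back from `a` still compares equal to `b`.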
I'm not convinced that the GCREF type is really necessary...
consider an alternate model:
1. A GCREF is never loaded into a register; it's either on the heap, or in
an alloca.
2. Add an intrinsic gcref.copy which copies a gcref between two allocas.
3. Add intrinsics gcref.load_gcref(GCREF*, GCREF*, offset) and
gcref.store_gcref(GCREF*, GCREF*, offset, value) which load and store a
gcref through a gcref.
4. Add intrinsics gcref.load_value(GCREF*, offset) and
gcref.store_value(GCREF*, offset, value) which load and store normal values
through a gcref.
5. The statepoint lowering pass gets rid of the allocas.
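
A rough way to picture the model above is the following toy Python simulation.
Everything in it is an assumption made for illustration: `Slot` stands in for
an alloca, `Heap.move_all` for a relocating collector, and the argument order
of the `gcref_*` helpers is only one possible reading of the intrinsic
signatures listed above. The point it tries to show is that, because a
reference only ever lives in a slot or in a heap field, the collector can
patch every copy and the program never observes a stale address.

```python
class Slot:
    """Stands in for an alloca that holds a GCREF (an address, or None)."""

    def __init__(self):
        self.addr = None


class Heap:
    def __init__(self):
        self.objects = {}        # address -> list of fields
        self.ref_fields = set()  # (address, offset) pairs that hold a GCREF
        self.slots = []          # every slot the collector must patch
        self.next_addr = 0x1000

    def new_slot(self):
        slot = Slot()
        self.slots.append(slot)
        return slot

    def allocate(self, dst, num_fields):
        addr = self.next_addr
        self.next_addr += 0x100
        self.objects[addr] = [0] * num_fields
        dst.addr = addr

    def move_all(self):
        """Relocate every object, then patch all slots and GCREF fields."""
        forward = {}
        for old in list(self.objects):
            forward[old] = self.next_addr
            self.next_addr += 0x100
        self.objects = {forward[old]: f for old, f in self.objects.items()}
        self.ref_fields = {(forward[a], off) for a, off in self.ref_fields}
        for addr, off in self.ref_fields:
            fields = self.objects[addr]
            fields[off] = forward[fields[off]]
        for slot in self.slots:
            if slot.addr is not None:
                slot.addr = forward[slot.addr]


def gcref_copy(dst, src):
    dst.addr = src.addr


def gcref_load_gcref(heap, dst, base, offset):
    is_ref = (base.addr, offset) in heap.ref_fields
    dst.addr = heap.objects[base.addr][offset] if is_ref else None


def gcref_store_gcref(heap, base, offset, src):
    key = (base.addr, offset)
    heap.objects[base.addr][offset] = src.addr
    if src.addr is None:
        heap.ref_fields.discard(key)
    else:
        heap.ref_fields.add(key)


def gcref_load_value(heap, base, offset):
    return heap.objects[base.addr][offset]


def gcref_store_value(heap, base, offset, value):
    heap.ref_fields.discard((base.addr, offset))
    heap.objects[base.addr][offset] = value


heap = Heap()
p, q, r = heap.new_slot(), heap.new_slot(), heap.new_slot()
heap.allocate(p, 2)
heap.allocate(q, 2)
gcref_store_value(heap, q, 1, 42)   # q.field[1] = 42
gcref_store_gcref(heap, p, 0, q)    # p.field[0] = q
heap.move_all()                     # objects move between the two accesses
gcref_load_gcref(heap, r, p, 0)     # r = p.field[0]
result = gcref_load_value(heap, r, 1)
```

Here `result` is read through a reference that was stored before the move and
loaded after it, and `r` ends up naming the same object as `q`.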
Keeping GCREFs exclusively in memory means the LLVM optimizer will handle
them conservatively, but correctly.
I guess the problem with this is precisely that the LLVM optimizer will
handle them conservatively... but on the flip side, I think you're going to
end up chasing down weird problems forever if a "load" from an alloca has