[llvm-dev] RFC: Strong GC References in LLVM

Mon Jul 11 15:44:44 PDT 2016

On Mon, Jul 11, 2016 at 2:28 PM, Sanjoy Das <sanjoy at playingwithpointers.com>
wrote:

> ping!
>
> Sanjoy Das wrote:
>
>> # Proposed Solution:
>>
>> We introduce a "new" LLVM type.  I will still refer to it as GCREF
>> here, but it may actually still be "<ty>  addrspace(k)*" where k is
>> specially noted in the datalayout.
>>
>> Semantics:
>>
>>   1. GCREF represents an equivalence class of values (equivalence
>>      relation being "points to a fixed semantic object").  The bitwise
>>      representation fluctuates constantly outside the compiler's
>>      control (the dual of `undef`), but may have invariants (in
>>      particular, we'd like to be able to specify alignment, nonnull
>>      etc.).  At any given point in time all GCREF instances pointing to
>>      the same object have the same bitwise representation (we need this
>>      to make `icmp eq` well-defined).
>>
>>   2. GCREF instances can only contain a valid gc reference (otherwise
>>      they can't meaningfully "fluctuate" among the various possible
>>      bitwise representations of a reference).
>>
>>   3. Converting GCREF to integers is fine in general, but you'll get an
>>      arbitrary "snapshot" of the bitwise value that will generally not
>>      be meaningful (unless you are colluding with the GC in
>>      implementation defined ways).
>>
>>   4. Converting integers to GCREF is allowed only if the source integer is
>>      a bitwise representation of a valid GC reference that is not an
>>      out of bounds derived reference.  However, this is difficult for
>>      the compiler to infer since it typically will have no fundamental
>>      knowledge of what bitwise representation can be a valid GC
>>      reference.
>>
>>   5. Operations that use a GCREF-typed value are "atomic" in using the
>>      bitwise representation, i.e., loading from a GCREF typed value
>>      does not "internally" convert the GCREF to a normal
>>      integer-pointer and then use the integer-pointer, since that would
>>      mean there is a window in which the integer-pointer can become
>>      stale[1].
>>
>>   6. A GCREF stored to a location in the heap continues to fluctuate,
>>      and keeps itself in sync with the right bitwise representation.
>>      In a way, there isn't a large distinction between the GC and the
>>      heap -- the heap is part of (or managed by) the GC.
>>
>> I think (6) is the most controversial of the semantics above, but it
>> isn't very different from how `undef` stored to the heap remains
>> `undef` (i.e. a non-deterministic N-bit value) and a later load can
>> recover `undef` instead of getting a normal N-bit value.
>>
>
I'm not convinced that the GCREF type is really necessary...
consider an alternate model:

1. A GCREF is never loaded into a register; it's either on the heap, or in
an alloca.
2. Add an intrinsic gcref.copy which copies a gcref between two allocas.
3. Add intrinsics gcref.load_gcref(GCREF*, GCREF*, offset) and
gcref.store_gcref(GCREF*, GCREF*, offset, value) which load and store a
gcref through a gcref.
4. Add intrinsics gcref.load_value(GCREF*, offset) and
gcref.store_value(GCREF*, offset, value) which load and store normal values
through a gcref.
5. The statepoint lowering pass gets rid of the allocas.

Keeping GCREFs exclusively in memory means the LLVM optimizer will handle
them conservatively, but correctly.
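
To make the memory-only model above concrete, here is a rough sketch in
Python rather than LLVM IR. It is a toy, not a proposal for the actual
intrinsics: ToyHeap, Ref, the slot names, and the argument order of the
methods are all hypothetical, and collect() only relocates objects (it
does no liveness analysis). The point it tries to show is that when
references live only in named slots (standing in for allocas) or in heap
fields, a moving collector can patch every copy, while a raw address
captured before a collection goes stale.

```python
# Toy model of the memory-only scheme; every name here is hypothetical.

class Ref:
    """A reference stored in a heap field; the collector rewrites .addr."""
    def __init__(self, addr):
        self.addr = addr


class ToyHeap:
    def __init__(self):
        self.objects = {}   # current address -> list of fields
        self.slots = {}     # slot name -> current address (stands in for an alloca)
        self.next_addr = 0x1000

    def _fresh(self):
        addr = self.next_addr
        self.next_addr += 0x10
        return addr

    def new_object(self, dst, nfields):
        # Allocate an object and write its reference straight into a slot.
        addr = self._fresh()
        self.objects[addr] = [0] * nfields
        self.slots[dst] = addr

    def copy(self, dst, src):                   # models gcref.copy
        self.slots[dst] = self.slots[src]

    def load_gcref(self, dst, base, offset):    # models gcref.load_gcref
        self.slots[dst] = self.objects[self.slots[base]][offset].addr

    def store_gcref(self, base, offset, src):   # models gcref.store_gcref
        self.objects[self.slots[base]][offset] = Ref(self.slots[src])

    def load_value(self, base, offset):         # models gcref.load_value
        return self.objects[self.slots[base]][offset]

    def store_value(self, base, offset, value): # models gcref.store_value
        self.objects[self.slots[base]][offset] = value

    def collect(self):
        """Move every object, then patch every slot and every Ref field."""
        forward = {old: self._fresh() for old in self.objects}
        moved = {}
        for old, fields in self.objects.items():
            for f in fields:
                if isinstance(f, Ref):
                    f.addr = forward[f.addr]
            moved[forward[old]] = fields
        self.objects = moved
        self.slots = {name: forward[a] for name, a in self.slots.items()}


def demo():
    h = ToyHeap()
    h.new_object("a", 2)
    h.new_object("b", 1)
    h.store_value("b", 0, 42)
    h.store_gcref("a", 1, "b")
    snapshot = h.slots["a"]   # raw address taken out of the slot, like a ptrtoint
    h.collect()               # everything moves
    h.load_gcref("c", "a", 1)
    # The slot-based path still reaches the object; the raw snapshot does not.
    return h.load_value("c", 0), snapshot in h.objects
```

In this toy, demo() reads 42 back through the slots after the collection,
while the snapshot address no longer names any object. Nothing outside
ToyHeap ever holds an address across collect(), which is roughly the
property the memory-only model is meant to give the optimizer for free.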

I guess the problem with this is precisely that the LLVM optimizer will
handle them conservatively... but on the flip side, I think you're going to
end up chasing down weird problems forever if a "load" from an alloca has
side-effects.

-Eli