[llvm-dev] RFC: Strong GC References in LLVM
Chandler Carruth via llvm-dev
llvm-dev at lists.llvm.org
Mon Jul 11 15:56:46 PDT 2016
On Mon, Jul 11, 2016 at 3:44 PM Eli Friedman <eli.friedman at gmail.com> wrote:
> On Mon, Jul 11, 2016 at 2:28 PM, Sanjoy Das <
> sanjoy at playingwithpointers.com> wrote:
>
>> ping!
>>
>> Sanjoy Das wrote:
>>
>>> # Proposed Solution:
>>>
>>> We introduce a "new" LLVM type. I will still refer to it as GCREF
>>> here, but it may actually still be "<ty> addrspace(k)*" where k is
>>> specially noted in the datalayout.
>>>
>>> Semantics:
>>>
>>> 1. GCREF represents an equivalence class of values (equivalence
>>> relation being "points to a fixed semantic object"). The bitwise
>>> representation fluctuates constantly outside the compiler's
>>> control (the dual of `undef`), but may have invariants (in
>>> particular, we'd like to be able to specify alignment, nonnull
>>> etc.). At any given point in time all GCREF instances pointing to
>>> the same object have the same bitwise representation (we need this
>>> to make `icmp eq` well-defined).
>>>
>>> 2. GCREF instances can only contain a valid gc reference (otherwise
>>> they can't meaningfully "fluctuate" among the various possible
>>> bitwise representations of a reference).
>>>
>>> 3. Converting GCREF to integers is fine in general, but you'll get an
>>> arbitrary "snapshot" of the bitwise value that will generally not
>>> be meaningful (unless you are colluding with the GC in
>>> implementation defined ways).
>>>
>>> 4. Converting integers to GCREF is allowed only if the source integer is
>>> a bitwise representation of a valid GC reference that is not an
>>> out of bounds derived reference. However, this is difficult for
>>> the compiler to infer since it typically will have no fundamental
>>> knowledge of what bitwise representation can be a valid GC
>>> reference.
>>>
>>> 5. Operations that use a GCREF-typed value are "atomic" in using the
>>> bitwise representation, i.e., loading from a GCREF typed value
>>> does not "internally" convert the GCREF to a normal
>>> integer-pointer and then use the integer-pointer, since that would
>>> mean there is a window in which the integer-pointer can become
>>> stale[1].
>>>
>>> 6. A GCREF stored to a location in the heap continues to fluctuate,
>>> and keeps itself in sync with the right bitwise representation.
>>> In a way, there isn't a large distinction between the GC and the
>>> heap -- the heap is part of (or managed by) the GC.
>>>
>>> I think (6) is the most controversial of the semantics above, but it
>>> isn't very different from how `undef` stored to the heap remains
>>> `undef` (i.e. a non-deterministic N-bit value) and a later load can
>>> recover `undef` instead of getting a normal N-bit value.
>>>
>>
> I'm not really convinced that the GCREF type is really necessary...
> consider an alternate model:
>
> 1. A GCREF is never loaded into a register; it's either on the heap, or in
> an alloca.
> 2. Add an intrinsic gcref.copy which copies a gcref between two allocas.
> 3. Add intrinsics gcref.load_gcref(GCREF*, GCREF*, offset) and
> gcref.store_gcref(GCREF*, GCREF*, offset, value) which load and store a
> gcref through a gcref.
> 4. Add intrinsics gcref.load_value(GCREF*, offset) and
> gcref.store_value(GCREF*, offset, value) which load and store normal values
> through a gcref.
> 5. The statepoint lowering pass gets rid of the allocas.
>
> Keeping GCREFs exclusively in memory means the LLVM optimizer will handle
> them conservatively, but correctly.
>
> I guess the problem with this is precisely that the LLVM optimizer will
> handle them conservatively... but on the flip side, I think you're going to
> end up chasing down weird problems forever if a "load" from an alloca has
> side-effects.
>
I think everything but this last weird aspect we already get from address
spaces.

I misread the proposal originally and didn't understand that the problem
was loading from an alloca *holding* the GC pointer, and thus it was a
normal and boring load that somehow has to have side-effects.

I fundamentally think that we can't do that. I can see several ways to make
the result work without that.

- Teach the statepoint rewriting to handle hoisted loads in some way
  (haven't thought too much about how feasible this is)
- Tell LLVM that the load has this weird control dependence with some
  mechanism (make it a special gc load intrinsic, or a volatile load, or ....)
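
As a rough illustration of why a load that moves across a safepoint is a
problem (and of the stale "snapshot" in points 3 and 5 of the quoted
proposal), here is a toy Python model. Everything in it (ToyHeap, safepoint,
the slot named "ref", the relocation step) is made up for the sketch and is
not LLVM or statepoint API. It assumes a collector that moves every object at
each safepoint and only updates the slots it knows about.

```python
RELOCATION_STEP = 0x100000  # arbitrary distance every object moves at a safepoint


class ToyHeap:
    """Toy relocating heap: the GC may change every object's address."""

    def __init__(self):
        self.objects = {}   # address -> payload
        self.slots = {}     # slot name -> address; GC-visible, like an alloca
        self.next_addr = 0x1000

    def allocate(self, slot, payload):
        """Allocate an object and record its address in a GC-visible slot."""
        addr = self.next_addr
        self.next_addr += 0x10
        self.objects[addr] = payload
        self.slots[slot] = addr

    def safepoint(self):
        """Move every object, then rewrite the slots to the new addresses."""
        forwarding = {old: old + RELOCATION_STEP for old in self.objects}
        self.objects = {forwarding[old]: p for old, p in self.objects.items()}
        self.slots = {name: forwarding[a] for name, a in self.slots.items()}


def hoisted_load(heap):
    """Read the slot before the safepoint, use the value after it."""
    snapshot = heap.slots["ref"]        # plain integer copy, invisible to the GC
    heap.safepoint()
    return heap.objects.get(snapshot)   # the snapshot address is now stale


def load_after_safepoint(heap):
    """Read the slot only after the safepoint has updated it."""
    heap.safepoint()
    return heap.objects.get(heap.slots["ref"])


def fresh_heap():
    """Build a heap with one object reachable through the slot "ref"."""
    heap = ToyHeap()
    heap.allocate("ref", "payload")
    return heap


print(hoisted_load(fresh_heap()))          # None
print(load_after_safepoint(fresh_heap()))  # payload
```

The hoisted version finds nothing at the old address, while the version that
reads the slot after the safepoint still reaches the object. Either option
above would have to rule out, or repair, the first ordering.
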
-Chandler