[llvm-dev] RFC: Strong GC References in LLVM
Sanjoy Das via llvm-dev
llvm-dev at lists.llvm.org
Tue Jul 12 16:42:33 PDT 2016
Hi Andy,
Andrew Trick wrote:
>
> I don’t remember why this is important. I often don't know with
> InstCombine whether something is needed to expose IR optimization or an
> ISEL pattern (which should really be somehow denoted and only run in a
> later pass).
But for the purposes of this discussion, only the legality (or lack
thereof) of the above transform matters, not whether it is profitable
or not.
>> > Converting any GCREF to an integer is BAD, period. A single benign
>> use could float below a safepoint. Imagine if you expanded card marks
>> early before step #3!
>>
>> Here I'm trying to constrain the optimizer -- if the frontend wants to
>> insert ptrtoints and knows what it is doing (because it has GC
>> specific knowledge), then that should be fine.
>
> Sure, if the frontend has some way to guarantee that statepoint
> insertion can’t happen between ptrtoint and its uses. But IR passes
> should never generate a ptrtoint from a GCRref, regardless of how it is
> used.
> You want very simple rules for pass writers that are also
> verifiable.
> If the frontend wants to use ptrtoint on a GCRef it could
> use an instrinsic for that purpose that is exempt from the GCRef
verifier.
Agree on all three counts.
>> We could speculate the load of .o if this.a could be proved to be
>> non-null.
>>
>> > You could solve the dominance problem by introducing some garbage
>> value on the ‘else’ path. But then your simple invariants don’t hold.
>>
>> Not sure what you mean by "simple invariants don’t hold", but in the
>> example above "void f(A a) {", putting junk on the else path does not
>> help.
>
> For that to work, you would have to know where potential statepoints
> might be, which is why it breaks the invariant that GCRefs are always
> valid, and why I proposed only doing this within the safepoint insertion
> pass.
>
>> > This is optimization is probably best done contained in step #3,
>> rewriting.
>>
>> I don't think that's a good idea -- we'll have to replicate LICM
>> inside RewriteStatepointsForGC, for instance, when we really want it
>> to run early. Without LICM of gc references we can't get rid of range
>> checks in simple loops like
>>
>> for (int i = 0; i < this.a.length; i++)
>> this.a[i] = 0;
>
> LICM isn’t speculation, unless its looking at the dereferenceable
> attribute and the loop has conditions.
Yes, that was a poor example, a better example would be one with
conditions, as you suggest:
for (int i = 0; i < this.a.length; i++)
if (cond)
this.a[i] = 0;
> I thought your proposed solution was to *never* speculate loads of
> GCRefs, which is strictly worse than what I’m suggesting.
My proposal to never speculate loads of GCRefs is only a starting
point. Downstream we have analyses that use Java-specific
knowledge to infer type information (on a best-effort basis, but this
works fairly well in practice), and can use that to prove e.g. that a
certain value is of type java.lang.Foo in the loop preheader, so a
GCREF-load from offset 128 (say) can be legally issued in the loop
preheader.
>> > Alternatively, you somehow make the dereferenceable attribute apply
>> to individual fields. Hasn't some sort of “assume dereferenceable”
>> intrinsic been proposed? You would still left with the problem that
>> you want to speculate past one level of type check.
>>
>> That's not sufficient for the "void f(A a) {" example -- the field at
>> offset 16 is dereferenceable for "a", but it can't be loaded as a
>> GCREF.
>
> I think there is an important design question here, because I’ve also
> heard grumbling about LLVM’s lack of condition dependence tracking
> outside of the GCRef context.
>
> +CC Daniel Berlin
>
> If you can come up with an IR design for tracking your GCRef load
> depedencies (usually on specific null and type checks), then that could
That sounds orthogonal to this proposal -- as a starting point we need
a simple way to mark a GCRef load as control dependent on every
condition executed since the start of the program. Something like
cast-to-dereferenceable can possibly be added once we start
upstreaming analyses that prove that a GCRef load is safe to
speculate, but that's for better performance not correctness.
Having said that, I'll keep the "explicit control dependence" idea in
mind -- maybe there is way to find "a simple way to mark a GCRef load
as control dependent on every condition executed since the start of
the program" that can be generalized to remember more precise control
dependence later.
> solve some of the other difficult problems that others are wrestling
> with (sorry I don’t have a pointer to the discussions). This probably
> means inserting some cast-to-dereferenceable instruction on the address
> generation def-use chain.
>
> Otherwise, you have to either impose some subtle rules on IR
> optimization passes, or make all GCRef loads intrinsics, which is also a
> maintenance burden.
>> The "runtime" comes in and modifies values of type GCREF to contain
>> different bitwise representations pointing to the same logical object.
>> It can't do this rewriting if a GCREF value contains some random
>> bitwise pattern, since that won't point to a valid object on the heap.
>
> Ok, but you also said the derived GCRef can be any bit pattern due to
> out-of-bounds addressing. I think there is a constraint that the offset
> from the GC root can be determined via a straightforward analysis.
You're right -- that's missing in the specification (derived pointer
provenance), I'll add that in in the next iteration.
>> I've explicitly not talked about safepoints at all here -- GCREF
>> values always have to hold "valid" references, and what a valid
>> reference is changes at arbitrary points in time (and at this layer
>> there are no "safepoints" denoting the only places where the
>> definition of a valid reference can change). This means, in general,
>> you can not for sure say that an integer contains a bitwise value that
>> is a valid GC reference, and even if you have one at a certain point
>> in time that it will stay valid in the next instant.
>
> Well, it would be much much easier to understand and verify if inttoptr
> were simply illegal. It’s confusing because the integer is defined
> somewhere else, yet it needs to be a valid GCRef at the point of use and
> you don’t want to allow GCRefs to live in integer values.
>
> As with ptrtoint, this seems like it would be better as a separate
> intrinsic if some hypothetical frontend needs it.
I think this is the most straightforward path forward ^.
> If this is allowed, doesn’t the integer need to be a GCRoot so a simple
> analysis can determine the offset?
>> > Is this supposed to be 'inttoptr’? I don’t see how this can work at
>> all, since the integer’s lifetime can extend beyond a safepoint.
>>
>> Do you mean the initial example? That is fine if <condition> is
>> always false at runtime.
>
> The title of the example says “inttoptr”…
Yes, that's just a typo in the code example; sorry.
>> > Do you really need to support constant GCRefs? Cliff Click said it
>> was a bad decision.
>>
>> Typically we don't need inttoptr to GCREFs, but other runtimes might.
>> I'm happy to not support it for now.
>
> Again, this may be better served with a gcref_to_int intrinsic that
> cannot be hoisted.
Yes.
>>
>> Above the use of l in "l + 20" has to be fine, and not UB.
>
> Good example.
>
> Of course, it could be just another instance of a “no inttoptr/ptrtoint”
> rule.
That should work.
>
>> >> ## Implementation Issues& Impact to LLVM
>> >>
>> >> This is a proposed path forward to implementing GCREF in LLVM. The
>> >> caveat mentioned at the start of the email (everything is open for
>> >> discussion) especially applies here.
>> >>
>> >> ### Changes to IR to represent GCREF
>> >>
>> >> We change `datalayout` to denote pointers of a specific address space
>> >> as GCREFs. We add verifier rules forbidding `inttoptr` conversion of
>> >> integers into GCREFs and add an intrinsic (that can be control
>> >> dependent) that converts integers to GCREFs.
>
> This intrinsic sounds like what I’m suggesting, except that I’m
> suggesting that the frontend can hypothetically generate the intrinsic
> if needed, and rather than prohibiting `inttoptr/ptrtoint` conversion,
> you outright inhibit it’s existence, which is trivial to verify.
Sounds good.
>> > Sounds good to me. If you really need Int->GCRef conversion, then it
>> seems like it needs to be an intrinsic with side effects, contrary to
>> many of your examples.
>>
>> Yes. The other solution is to make `inttoptr` a side effect depending
>> on the result type, but that sounds like a bad idea.
>
> I think you want side effects on the Int->GCRef intrinsic (hoisting it
> is illegal, sinking it below calls is illegal), and don’t want to use
> inttoptr at all.
Yes.
[snip]
-- Sanjoy
More information about the llvm-dev
mailing list