[llvm-dev] RFC: alloca -- specify address space for allocation

Fri Aug 28 09:37:55 PDT 2015

On 08/27/2015 04:24 PM, Swaroop Sridhar wrote:
> Inline.
>
> From: Philip Reames [mailto:listmail at philipreames.com]
> Sent: Thursday, August 27, 2015 11:01 AM
> To: Swaroop Sridhar <Swaroop.Sridhar at microsoft.com>; llvm-dev <llvm-dev at lists.llvm.org>; Sanjoy Das <sanjoy at playingwithpointers.com>
> Cc: Joseph Tremoulet <jotrem at microsoft.com>; Andy Ayers <andya at microsoft.com>; Russell Hadley <rhadley at microsoft.com>
> Subject: Re: RFC: alloca -- specify address space for allocation
>
> *trimmed for length*
>> I'm not directly opposed to this proposal, but I'm not really in support of it either.
>> I think there are a number of smaller engineering changes which could be made to RewriteStatepointsForGC to address this issue.
>> I am not convinced we need to allow addrspace on allocas to solve that problem.
>> More generally, I'm a bit bothered by how you're asserting that a pointer to a stack based object is the same as a managed pointer into the heap.
>> They share some properties - the GC needs to know about them and mark through them - but they're also moderately different - stack based objects do not get relocated, heap ones do.
>> Given these differences, it's not even entirely clear to me that these two classes of pointers should be treated the same.
>> In particular, I don't see why RewriteStatepointsForGC needs to insert explicit relocations for stack based objects.
>> That makes no sense to me.
> Yes, pointers to the stack are different in that they are not relocated or collected.
> However, CLR's "managed pointers" are defined as a super-type of pointers-to-the-GC-heap and pointers-to-unmanaged-memory.
> These managed pointers should be reported to the GC. They will be updated during a collection if the referenced object is relocated.
Rather than explaining what your expectations are, can you explain why?  
For example, I'm assuming here that you need to report stack based 
objects exclusively for determining liveness of objects they might 
contain pointers to (i.e. identifying roots).  I'm assuming that the 
actual stack allocation does not get any special magic handling by the 
GC other than marking through it.  Correct?
>
> Wrt the RewriteStatepointsForGC phase:
> If we know that a particular managed pointer is definitely coming from an alloca, it is OK to not report it, and not insert relocations.
> However, sometimes we cannot tell this unambiguously, if a managed pointer comes from a PHI(alloca pointer, heap pointer).
This is a sound point.  So the conservatively correct thing to do is to
relocate all SSA pointers which point to either heap objects or stack
objects (i.e. all managed pointers in your terms).  If an inference pass
proved that a particular SSA value was based on an alloca (the address
of a stack object), that value would not need to be relocated, but
would still need to be tracked for liveness purposes?  Right now, we
really don't have that distinction (live vs needing relocation) within
RewriteStatepointsForGC.  Do we need to add it?
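
To make the ambiguous case concrete, here is a minimal sketch in the
typed-pointer IR syntax used elsewhere in this thread.  The function
names (@get_heap_object, @may_trigger_gc, @phi_of_stack_and_heap) are
hypothetical, and the call simply stands in for any potential
safepoint.  It only illustrates the shape of the problem; it is not
output from RewriteStatepointsForGC.

```llvm
; Hypothetical helpers, declared only so the example is self-contained.
declare void @may_trigger_gc()
declare i8 addrspace(1)* @get_heap_object()

define i8 @phi_of_stack_and_heap(i1 %use_stack) gc "statepoint-example" {
entry:
  %loc0 = alloca i8
  store i8 0, i8* %loc0
  br i1 %use_stack, label %stack, label %heap

stack:
  ; Stack address viewed as a managed pointer (the cast-based model).
  %s = addrspacecast i8* %loc0 to i8 addrspace(1)*
  br label %merge

heap:
  %h = call i8 addrspace(1)* @get_heap_object()
  br label %merge

merge:
  ; The base of %p is not statically known to be the alloca.
  %p = phi i8 addrspace(1)* [ %s, %stack ], [ %h, %heap ]
  call void @may_trigger_gc()   ; potential safepoint
  ; %p is live across the call, so it has to be reported.  Without
  ; further analysis it must also be treated as relocatable.
  %v = load i8, i8 addrspace(1)* %p
  ret i8 %v
}
```

If only %s reached the safepoint, an alloca-based inference could skip
the relocation while still treating the slot as live.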

(Also, I've been expecting to see patches from you fixing bugs in 
RSForGC around alloca handling.  You're using it in ways I never designed 
for, so I'm a bit surprised not to have seen these.  Have you audited 
the code and tested the cases you care about?  Having concrete test 
cases in tree would make a lot of this discussion easier.)
>
>> I think it would help if we took a step back, summarized the requirements, and approached this anew.
> The real requirement we have is: A way to construct a managed pointer to a stack location (or any other unmanaged location) such that it is interoperable with other GC pointers.
>
> The way we do it currently is using addrspacecast:
>    %loc0 = alloca i8
>    %1 = addrspacecast i8* %loc0 to i8 addrspace(1)*
>
> I'm wondering if:
> (a) There is a better way to do this to better suit managed-code analysis/transformation phases, and
> (b) If generating the managed-pointer by construction, via an alloca in addrspace(1), is the way to do it.
I would lean towards one of two approaches:
1) Use an addrspace cast.
2) Add a custom out of tree intrinsic which consumes the alloca address 
and returns a managed pointer.  The lowering for this would be trivial, 
but it would give you a place to customize optimizer behavior if 
needed.  It might also make the semantics a bit more obvious in the IR.
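
As a rough sketch of how the two options would look side by side in
IR: the names @example.stack_to_managed and @use_managed below are
made up for illustration and are not existing LLVM intrinsics or APIs.
A real out-of-tree intrinsic would presumably live under the llvm.
prefix in the fork that defines it; an ordinary declaration is used
here so the snippet does not depend on any such fork.

```llvm
; Hypothetical consumer of a managed pointer.
declare void @use_managed(i8 addrspace(1)*)

; Hypothetical stand-in for a custom intrinsic that consumes an alloca
; address and returns a managed pointer.  Not an existing LLVM intrinsic.
declare i8 addrspace(1)* @example.stack_to_managed(i8*)

; Option 1: mark the transition with an addrspacecast.
define void @option1_cast() {
entry:
  %loc0 = alloca i8
  %m = addrspacecast i8* %loc0 to i8 addrspace(1)*
  call void @use_managed(i8 addrspace(1)* %m)
  ret void
}

; Option 2: mark the transition with a call that a custom lowering
; could later replace with the equivalent of the cast above.
define void @option2_intrinsic() {
entry:
  %loc0 = alloca i8
  %m = call i8 addrspace(1)* @example.stack_to_managed(i8* %loc0)
  call void @use_managed(i8 addrspace(1)* %m)
  ret void
}
```

In both forms the alloca itself stays in the default address space, and
the point where the stack address becomes a managed pointer is explicit
in the IR.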

The key bit here is that I think Chandler is right.  You are effectively 
casting a stack allocation *into* a managed pointer. Having something to 
mark that transition seems reasonable.

Of course, having said all that, I'm back to thinking that having a 
marker on the alloca would be somewhat reasonable too.  However, I think 
we need a much stronger justification to change the IR than has been 
provided.  If you can show that the cast based model doesn't work for 
some reason, we can re-evaluate.

Worth noting is that we might be better off introducing an orthogonal 
notion for tracking gc references entirely.  The addrspace mechanism has 
worked, but it is a little bit of a hack. We've talked about the need 
for an opaque pointer type.  Maybe when we actually get around to 
defining that, the alloca case is one we should consider.

Philip