[llvm-dev] RFC: alloca -- specify address space for allocation

Fri Aug 28 10:28:47 PDT 2015

I'm in the process of throwing together a design doc to summarize what I 
know about the stack based object case we need to support for CLR.  You 
can find the in progress draft here:
https://docs.google.com/document/d/1H5am1PyY8n8hc1hIAisDQMgbZDmMgnjU8wdN25YqMbQ/edit?usp=sharing

I'm finding I'm having a hard time tracking the details through all of 
the threads and conversations and figured it would be a good idea to get 
everything centralized into one place.

Philip

On 08/28/2015 09:37 AM, Philip Reames wrote:
> On 08/27/2015 04:24 PM, Swaroop Sridhar wrote:
>> Inline.
>>
>> From: Philip Reames [mailto:listmail at philipreames.com]
>> Sent: Thursday, August 27, 2015 11:01 AM
>> To: Swaroop Sridhar <Swaroop.Sridhar at microsoft.com>; llvm-dev 
>> <llvm-dev at lists.llvm.org>; Sanjoy Das <sanjoy at playingwithpointers.com>
>> Cc: Joseph Tremoulet <jotrem at microsoft.com>; Andy Ayers 
>> <andya at microsoft.com>; Russell Hadley <rhadley at microsoft.com>
>> Subject: Re: RFC: alloca -- specify address space for allocation
>>
>> *trimmed for length*
>>> I'm not directly opposed to this proposal, but I'm not really in 
>>> support of it either.
>>> I think there are a number of smaller engineering changes which could be 
>>> made to RewriteStatepointsForGC to address this issue.
>>> I am not convinced we need to allow addrspace on allocas to solve 
>>> that problem.
>>> More generally, I'm a bit bothered by how you're asserting that a 
>>> pointer to a stack based object is the same as a managed pointer 
>>> into the heap.
>>> They share some properties - the GC needs to know about them and 
>>> mark through them - but they're also moderately different as well - 
>>> stack based objects do not get relocated, heap ones do.  Given these 
>>> differences, it's not even entirely clear to me that these two 
>>> classes of pointers should be treated the same.
>>> In particular, I don't see why RewriteStatepointsForGC needs to 
>>> insert explicit relocations for stack based objects.
>>> That makes no sense to me.
>> Yes, pointers to the stack are different in that they are not 
>> relocated or collected.
>> However, CLR's "managed pointers" are defined as a super-type of 
>> pointers-to-the-GC-heap and pointers-to-unmanaged-memory.
>> These managed pointers should be reported to the GC. They will be 
>> updated during a collection if the referenced object is relocated.
> Rather than explaining what your expectations are, can you explain 
> why?  For example, I'm assuming here that you need to report stack 
> based objects exclusively for determining liveness of objects they 
> might contain pointers to (i.e. identifying roots).  I'm assuming that 
> the actual stack allocation does not get any special magic handling by 
> the GC other than marking through it.  Correct?
>>
>> Wrt the RewriteStatepointsForGC phase:
>> If we know that a particular managed pointer is definitely coming 
>> from an alloca, it is OK to not report it, and not insert relocations.
>> However, sometimes we cannot tell this unambiguously, for example when 
>> a managed pointer comes from a PHI(alloca pointer, heap pointer).
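>>
>> A minimal sketch of that ambiguous case, in typed-pointer IR syntax.  
>> The names @may_trigger_gc, @phi_example, and %heap_obj are made up 
>> for illustration, and the call only stands in for a point where a 
>> safepoint could be placed:
>>
>>    ; Hypothetical callee standing in for any call that may reach a safepoint.
>>    declare void @may_trigger_gc()
>>
>>    define i8 @phi_example(i1 %use_stack, i8 addrspace(1)* %heap_obj)
>>        gc "statepoint-example" {
>>    entry:
>>      %loc0 = alloca i8
>>      store i8 0, i8* %loc0
>>      br i1 %use_stack, label %stack, label %merge
>>
>>    stack:
>>      ; Stack address viewed as a managed pointer.
>>      %stack_ptr = addrspacecast i8* %loc0 to i8 addrspace(1)*
>>      br label %merge
>>
>>    merge:
>>      ; %p is either a stack address or a heap pointer; the base is not
>>      ; known statically here.
>>      %p = phi i8 addrspace(1)* [ %stack_ptr, %stack ], [ %heap_obj, %entry ]
>>      call void @may_trigger_gc()
>>      ; %p is live across the call above.
>>      %v = load i8, i8 addrspace(1)* %p
>>      ret i8 %v
>>    }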
> This is a sound point.  So the conservatively correct thing to do is 
> to relocate all SSA pointers which are either heap objects or stack 
> objects (e.g. all managed pointers in your terms).  If an inference 
> pass proved that a particular SSA value was based on an alloca 
> (address of a stack object), that value would not need to be 
> relocated, but would still need to be tracked for liveness purposes? 
> Right now, we really don't have that distinction (live vs needing 
> relocation) within RewriteStatepointsForGC.  Do we need to add it?
>
> (Also, I've been expecting to see patches from you fixing bugs in 
> RSForGC around alloca handling.  You're using it in ways I never 
> designed for, so I'm a bit surprised not to have seen these.  Have you 
> audited the code and tested the cases you care about?  Having concrete 
> test cases in tree would make a lot of this discussion easier.)
>>
>>> I think it would help if we took a step back, summarized the 
>>> requirements, and approached this anew.
>> The real requirement we have is: A way to construct a managed pointer 
>> to a stack location (or any other unmanaged location) such that it is 
>> interoperable with other GC pointers.
>>
>> The way we do it currently is using addrspacecast:
>>    %loc0 = alloca i8
>>    %1 = addrspacecast i8* %loc0 to i8 addrspace(1)*
>>
>> I'm wondering:
>> (a) Whether there is a better way to do this that better suits 
>> managed-code analysis/transformation phases, and
>> (b) Whether generating the managed pointer by construction, via 
>> alloca addrspace(1)*, is the way to do it.
> I would lean towards one of two approaches:
> 1) Use an addrspace cast.
> 2) Add a custom out of tree intrinsic which consumes the alloca 
> address and returns a managed pointer.  The lowering for this would be 
> trivial, but it would give you a place to customize optimizer behavior 
> if needed.  It might also make the semantics a bit more obvious in the 
> IR.
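>
> A rough sketch of what option 2 might look like in IR.  The 
> intrinsic name @hypothetical.stack.to.managed is invented here purely 
> for illustration; no such intrinsic exists in tree, and it is written 
> as an ordinary declaration so that the snippet stays well-formed:
>
>    ; Hypothetical marker: consumes an alloca address, returns a managed pointer.
>    declare i8 addrspace(1)* @hypothetical.stack.to.managed(i8*)
>
>    define i8 @intrinsic_example() {
>    entry:
>      %loc0 = alloca i8
>      store i8 0, i8* %loc0
>      ; Explicit transition from stack address to managed pointer.
>      %m = call i8 addrspace(1)* @hypothetical.stack.to.managed(i8* %loc0)
>      %v = load i8, i8 addrspace(1)* %m
>      ret i8 %v
>    }
>
> Compared with a bare addrspacecast, the call gives the optimizer and 
> any later lowering a named point at which the transition happens.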
>
> The key bit here is that I think Chandler is right.  You are 
> effectively casting a stack allocation *into* a managed pointer. 
> Having something to mark that transition seems reasonable.
>
> Of course, having said all that, I'm back to thinking that having a 
> marker on the alloca would be somewhat reasonable too.  However, I 
> think we need a much stronger justification to change the IR than has 
> been provided.  If you can show that the cast based model doesn't work 
> for some reason, we can re-evaluate.
>
> Worth noting is that we might be better off introducing an orthogonal 
> notion for tracking gc references entirely.  The addrspace mechanism 
> has worked, but it is a little bit of a hack. We've talked about the 
> need for an opaque pointer type.  Maybe when we actually get around to 
> defining that, the alloca case is one we should consider.
>
> Philip