[llvm-dev] RFC: alloca -- specify address space for allocation

Fri Aug 28 21:30:15 PDT 2015

> -----Original Message-----
> From: Philip Reames [mailto:listmail at philipreames.com]
> Sent: Friday, August 28, 2015 9:38 AM
> To: Swaroop Sridhar <Swaroop.Sridhar at microsoft.com>; llvm-dev <llvm-
> dev at lists.llvm.org>; Sanjoy Das <sanjoy at playingwithpointers.com>
> Cc: Joseph Tremoulet <jotrem at microsoft.com>; Andy Ayers
> <andya at microsoft.com>; Russell Hadley <rhadley at microsoft.com>
> Subject: Re: RFC: alloca -- specify address space for allocation
> 

>> I think for the use case you are outlining, an addrspacecast is the correct IR model -- 
>> you're specifically saying that it is OK in this case to turn a pointer from addrspace 0 
>> into one for addrspace N because N is your "managed pointer" set that can be *either* 
>> a GC-pointer or a non-GC-pointer. 

>> What the FE is saying is that this is an *acceptable* transition of addrspace, because your 
>> language and runtime semantics have provided for it. 
>> I think the proper way to say that is with a cast.

> The key bit here is that I think Chandler is right.  You are effectively casting a
> stack allocation *into* a managed pointer. Having something to mark that
> transition seems reasonable.

I think there are two views here:

(1) MSIL level view:
In the CLR, the stack is part of "managed memory" (which is not the same as the GC heap, which is memory that is both managed and garbage-collected).
Therefore, all *references* to stack locations are "managed addresses," in the sense that the compiler/runtime exercises certain control over values that are managed addresses.
For example, it enforces certain restrictions to guarantee safety: lifetime restrictions, a non-null requirement in certain contexts, etc.

This is different from the notion of "unmanaged memory," which exists for interoperability with native code.
*Pointers* to unmanaged memory are not controlled by the runtime (e.g., the runtime provides no safety guarantees for them).

So, from the language-semantics point of view, stack addresses are created as managed pointers,
which is why the proposal to have alloca allocate directly in the managed address space seemed natural.

Joseph has written more details in the document that Philip shared out in this thread.

(2) A lower-level IR view:
LLVM creates all stack locations in addrspace(0) for all code, whether it comes from managed code or native code.
Of these, stack locations corresponding to the managed stack are promoted to managed addresses via addrspacecast.
As an optimization, the front end inserts the addrspacecasts only for those stack locations that are actually address-taken.
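
A minimal sketch of approach (2), assuming addrspace(1) stands for the managed address space and every slot is an i32. StackSlot and emitSlot are hypothetical names made up for illustration; they are not LLVM or LLILC APIs. The helper returns the IR text (2015-era typed-pointer syntax) that a front end might emit for one local:

```cpp
#include <string>

// Hypothetical front-end record for one local variable (illustration only).
struct StackSlot {
    std::string name;
    bool addressTaken;
};

// Returns the IR text for one slot.
// Assumption: addrspace(1) is the managed address space; all slots are i32.
std::string emitSlot(const StackSlot &slot) {
    // Every slot is allocated in the default address space, addrspace(0).
    std::string ir = "%" + slot.name + " = alloca i32\n";
    // Only address-taken slots are promoted to a managed address.
    if (slot.addressTaken) {
        ir += "%" + slot.name + ".managed = addrspacecast i32* %" + slot.name +
              " to i32 addrspace(1)*\n";
    }
    return ir;
}
```

For an address-taken slot named x this yields an alloca followed by an addrspacecast; for a slot that is never address-taken it yields only the alloca.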

If I understand correctly, the recommendation (by Philip, Chandler and David) is approach (2) because:
(a) No change to the instruction set is necessary when the semantics are achievable via existing instructions.
(b) It avoids changing the optimizer to allocate in the correct address space.
It looks like the problem here is that the optimizer would be expected to perform type-preserving transformations
by allocating in the correct address space, but today it blindly allocates in the default address space.
I don't know the LLVM optimizer well enough to have a good estimate of the magnitude of the changes
necessary here. But I agree that avoiding substantial changes to the optimizer is a strong consideration.

>> You might need N to be a distinct address space from 
>> the one used for GC-pointers and to have similar casts emitted by the frontend.

Yes, eventually we'll need to differentiate between:
(i) Pointers to unmanaged memory -- which will never be reported to the runtime
(ii) Pointers to GC-heap objects -- which will always be reported to the runtime
(iii) Generic managed pointers -- which may need to be reported if we cannot establish that they point outside the GC heap.

Currently we report all pointers to the runtime as managed pointers.
This is inefficient because the GC then needs to do extra work to figure out what kind of pointer each one is:
a pointer to a heap object, a pointer within a heap object, or a pointer outside the heap.
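
A rough sketch of the eventual three-way distinction (not the current behavior, where everything is reported). PointerKind, mustReport and provenOutsideHeap are hypothetical names made up for illustration, not part of LLVM or the CLR:

```cpp
// Hypothetical classification of the three pointer kinds (illustration only).
enum class PointerKind {
    Unmanaged,  // (i)   never reported to the runtime
    GcHeap,     // (ii)  always reported to the runtime
    Managed     // (iii) generic managed pointer; may or may not point into the GC heap
};

// Decides whether a pointer must be reported to the runtime.
// provenOutsideHeap: the compiler has established that the target is outside
// the GC heap (only meaningful for PointerKind::Managed).
bool mustReport(PointerKind kind, bool provenOutsideHeap) {
    switch (kind) {
    case PointerKind::Unmanaged: return false;
    case PointerKind::GcHeap:    return true;
    case PointerKind::Managed:   return !provenOutsideHeap;
    }
    return true;  // conservative fallback; not reached for valid enum values
}
```

With such a split, only kind (iii) pointers that cannot be proven to point outside the heap would cost the GC the extra classification work.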

> Of course, having said that all, I'm back to thinking that having a marker on
> the alloca would be somewhat reasonable too.  However, I think we need a
> much stronger justification to change the IR than has been provided.  If you
> can show that the cast based model doesn't work for some reason, we can
> re-evaluate.

I don't think we can say that the cast-based model will not work. 
The question is whether an alloca that directly yields an addrspace(1) pointer is a better fit for MSIL semantics, analysis phases,
and managed-code-specific optimizations.

I'm OK if we conclude that we'll keep using the cast model until we hit a concrete case
where it does not work, or where it seems like an architectural misfit.

> Worth noting is that we might be better off introducing an orthogonal notion
> for tracking gc references entirely.  The addrspace mechanism has worked,
> but it is a little bit of a hack. We've talked about the need for an opaque
> pointer type.  Maybe when we actually get around to defining that, the alloca
> case is one we should consider.

Yes, I'm mainly concerned about getting the right types on the different kinds of
pointers. If the address space annotation implies more constraints (e.g., on layout) than
what's already necessitated by the type distinction, we should use a separate
mechanism.

Again, I'm OK if we want to keep using addrspacecast until we hit a concrete case 
where it breaks down.

Swaroop.