[PATCH] D31042: Allow DataLayout to specify addrspace for allocas.

Tue Mar 28 13:29:18 PDT 2017

rjmccall added a comment.

In https://reviews.llvm.org/D31042#712324, @arsenm wrote:

> In https://reviews.llvm.org/D31042#712124, @rjmccall wrote:
>
> > In https://reviews.llvm.org/D31042#711913, @arsenm wrote:
> >
> > > In https://reviews.llvm.org/D31042#711863, @rjmccall wrote:
> > >
> > > > I don't think the assumptions in Clang that the Clang AST address space 0 equals LLVM's address space 0 are nearly as pervasive as you think.  At any rate, since you require stack allocations to be in a different LLVM address space from the generic address space, but (per C) taking the address of a local variable yields a pointer in the generic address space, you're going to have to teach Clang about these differences between address spaces anyway, because it'll have to insert address-space conversions in a number of places.
> > >
> > >
> > > Yes, I expect there will be many addrspacecasts inserted from the alloca, which then the InferAddressSpaces pass would hopefully optimize away.
> >
> >
> > Okay.  So, to summarize:
> >
> > In amdgpu OpenCL, at source level, sizeof(void __private *) == sizeof(void {__global,__local,__constant} *) == 8.  However, at an implementation level, stack addresses are actually 32-bit, and you want that to be modeled correctly in the IR.  Therefore, allocas need to return a value in a 32-bit address space, which the frontend will immediately widen to a 64-bit address space in order to match the rules for __private.  In order to maintain performance, you have a pass that recognizes when an address value is the result of that kind of widening and re-narrows it to the 32-bit stack address space.
>
>
> This is mostly accurate except for the detail about the cast. For OpenCL 2.0 or C++ the alloca pointer will most likely need to be addrspacecasted to the OpenCL generic address space.

It looks to me like the language spec says that the address of a local variable is in the __private address space, but that a pointer in any address space can be promoted into the generic address space.  Clang's AST should call out that second promotion as a separate ImplicitCastExpr.  Of course, you may decide that you want to peephole it into a single addrspacecast instruction.

> Private pointers in memory may need to be zero extended to 64-bit in memory. For OpenCL 1.x there is no generic address space, and the old subtargets don't have the necessary hardware features to support this.

By "this", you mean an efficiently-accessible generic address space?  Presumably because the different memory-access operations work on specific address spaces and so, as you say, you would need creative codegen to figure out which case the pointer originally belonged to.  That makes sense.

>> Because LLVM currently makes some unfortunate assumptions about address space 0, this stack address space cannot be address space 0.  These assumptions include:
>>  (1) As far as debug info is concerned, the size of a pointer is the size of a pointer in address space 0.  (Is this assumption why sizeof(void __private *) == 8 despite __private being clearly intended in the OpenCL spec to be the address space of stack pointers?)
> 
> This is a point that the spec is vague on. While the spec explicitly allows different size pointers for different address spaces, there isn't really any detailed description of what this entails.

I don't see what the problem is.  The spec doesn't allow pointers to be converted between address spaces at all (except for the promotion into the generic address space in OpenCL 2.0).  Pointers into different address spaces are just completely different types.

I mean, in general, the OpenCL spec is rather hilariously poorly-drafted, and as a compiler programmer you have to read between the lines to figure out how things are supposed to be generalized.  The old 1.1 spec used to say that "variables declared as pointers are assumed to point to the __private address space if an address space qualifier is not specified", and everyone just assumes that that's meant to be a general rule for pointer types, not something that only applies to variables and only at the outermost level of pointer.  Similarly, you have to read the paragraph about assignability as a general conversion rule, and you have to guess that probably when the 2.0 specification says that "[c]asting a pointer to address space A to a pointer to address space B is illegal if A and B are named address spaces and A is not the same as B", that it's okay to both cast into *and out of* the generic address space, and you have to guess at what the semantics of the latter are supposed to be.  The best guidance I can offer is to instead read the Embedded C specification (WG14 N1169 might be the most version), which goes into address spaces somewhat more rigorously; it's what I've always used as a baseline for understanding what OpenCL is trying to do.  From this point of view, the four OpenCL 1.0 address spaces are disjoint, and OpenCL 2.0's generic address space is simply one that is a superset of all the others.  Embedded C allows casts between address spaces, but the cast has undefined behavior if the pointer doesn't actually point into the destination address space.  Since the four named address spaces are formally disjoint, casts directly between them are always invalid, and so OpenCL's rule that they're just forbidden makes sense.  Anyway, that's what I would suggest as an approach for understanding OpenCL.

> This wasn't addressed at all by the SPIR spec, which just defined a single 64-bit datalayout string. sizeof really indicates the in-memory representation's size, so by using a DataLayout with 32-bit pointers with an ABI alignment of 64-bit, we get the necessary codegen properties of a 32-bit pointer while maintaining the required ABI and single sizeof().

Well, SPIR is an abstract representation.  If you were emitting SPIR, you wouldn't want any of these assumptions about stack addressing to leak into it.  If you're lowering from SPIR, of course, you just have to invent an ABI rule for how your 32-bit pointers are passed around as 64-bit quantities, and then you globally rewrite the IR to implement that.

>> (2) Functions are currently always defined in address space 0.  A function pointer in your implementation must be in a 64-bit address space.  (My understanding was that OpenCL generally forbade function pointers, so I'm not sure if this is actually important?)
> 
> It is true that OpenCL doesn't allow function pointers at the source level. I'm working on implementing calls without inlining everything now, and to be able to correctly lower LLVM IR for them requires the DAG picking the right MVT when emitting the global address so it still comes up there. Function pointer support is also helpful for implementing other languages being worked on like C++ and python. I believe recent version of CUDA support them as well.

Well, if you don't have real function pointers, and therefore the only use of functions is in direct calls, it feels like you can always hack around pointer-size problems in your backend pretty easily.  If the source language has real function pointers, it also has to decide what address space they're in — it could be a dedicated address space for functions, which would be legal under C rules, but which would take some effort to support in Clang.

>> (3) Some optimizations assume that 0 is an invalid address in address space 0, but in fact it's a valid stack address.
>> 
>> It's entirely possible that we should, indeed, change all of these assumptions.  For the time being, you feel that's not a reasonable fight, and so you would like to allow IR to use an explicitly non-0 address space for the stack.
> 
> Yes. Some day I would like to be able to apply some of these assumptions to other non-0 address spaces as well (i.e. 0 is null in some address spaces, but not others) but I think that is a larger task.

Yes, that seems like a good goal.  In fact, it would be nice if the assumptions about address space 0 were just a default behavior that could be overridden by the data layout.

https://reviews.llvm.org/D31042