[LLVMdev] Address space extension

Thu Aug 8 02:04:25 PDT 2013

On 8 Aug 2013, at 04:23, Pete Cooper <peter_cooper at apple.com> wrote:

> 
> On Aug 7, 2013, at 7:23 PM, Michele Scandale <michele.scandale at gmail.com> wrote:
> 
>> On 08/08/2013 03:52 AM, Pete Cooper wrote:
>> 
>> From here I understand that the IR contains addrspace(N), where N = 0, 1, 2, 3, ..., according to the target-independent mapping the frontend uses to represent different address spaces (for OpenCL 1.2: 0 = private, 1 = global, 2 = local, 3 = constant).
>> 
>> Then the frontend emits metadata that contains the map from "language address spaces" to "target address spaces" (for X86 this would be 0->0, 1->0, 2->0, 3->0).
>> 
>> Finally, instruction selection will use this information to select instructions correctly and to tag each machine instruction with both its logical and physical address spaces.
> Sounds good.

What happens when I link together two IR modules from different front ends that have different language-specific address spaces?
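
To make the linking question concrete, here is a minimal C sketch. It is not anything LLVM provides: the two tables, the second front end's numbering, and the helper spaces_conflict() are all hypothetical. It models each module's metadata as a table giving the meaning of addrspace(N), and counts the numbers that mean different things in the two modules.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NUM_SPACES 4

/* Hypothetical per-module metadata: the meaning of addrspace(N), N = 0..3.
   The first table follows the OpenCL 1.2 numbering quoted above; the
   second is an invented front end that numbers its spaces differently. */
const char *const opencl_spaces[NUM_SPACES] = {
    "private", "global", "local", "constant"
};
const char *const other_spaces[NUM_SPACES] = {
    "private", "local", "global", "constant"
};

/* Hypothetical helper: returns how many address space numbers have a
   different meaning in the two modules, printing each mismatch. */
int spaces_conflict(const char *const *a, const char *const *b)
{
    int conflicts = 0;

    assert(a != NULL && b != NULL);
    for (int n = 0; n < NUM_SPACES; n++) {
        if (strcmp(a[n], b[n]) != 0) {
            printf("addrspace(%d): \"%s\" vs \"%s\"\n", n, a[n], b[n]);
            conflicts++;
        }
    }
    return conflicts;
}
```

Calling spaces_conflict(opencl_spaces, other_spaces) reports a mismatch for addrspace(1) and addrspace(2). After linking, a single number in the merged module would have to stand for both meanings, unless the linker renumbers one module's address spaces.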

I would be very hesitant about using address spaces until we've fixed their semantics to disallow bitcasts between different address spaces and require an explicit address space cast.  To illustrate the problem, consider the following trivial example:

typedef __attribute__((address_space(256))) int* gsptr;

int *toglobal(gsptr foo)
{
	return (int*)foo;
}

int load(int *foo)
{
	return *foo;
}

int loadgs(gsptr foo)
{
	return *foo;
}

int loadgs2(gsptr foo)
{
	return *toglobal(foo);
}

When we compile this to LLVM IR with clang (disabling asynchronous unwind tables for clarity), at -O2 we get this:

define i32* @toglobal(i32 addrspace(256)* %foo) nounwind readnone ssp {
  %1 = bitcast i32 addrspace(256)* %foo to i32*
  ret i32* %1
}

define i32 @load(i32* nocapture %foo) nounwind readonly ssp {
  %1 = load i32* %foo, align 4, !tbaa !0
  ret i32 %1
}

define i32 @loadgs(i32 addrspace(256)* nocapture %foo) nounwind readonly ssp {
  %1 = load i32 addrspace(256)* %foo, align 4, !tbaa !0
  ret i32 %1
}

define i32 @loadgs2(i32 addrspace(256)* nocapture %foo) nounwind readonly ssp {
  %1 = bitcast i32 addrspace(256)* %foo to i32*
  %2 = load i32* %1, align 4, !tbaa !0
  ret i32 %2
}

Note that in loadgs2, the call to toglobal has been inlined and so the back end will just see a bitcast, which SelectionDAG treats as a no-op.  The assembly we get from this is:

_toglobal:                              ## @toglobal
## BB#0:
	pushq	%rbp
	movq	%rsp, %rbp
	movq	%rdi, %rax
	popq	%rbp
	ret

_load:                                  ## @load
## BB#0:
	pushq	%rbp
	movq	%rsp, %rbp
	movl	(%rdi), %eax
	popq	%rbp
	ret

	.globl	_loadgs
	.align	4, 0x90
_loadgs:                                ## @loadgs
## BB#0:
	pushq	%rbp
	movq	%rsp, %rbp
	movl	%gs:(%rdi), %eax
	popq	%rbp
	ret

	.globl	_loadgs2
	.align	4, 0x90
_loadgs2:                               ## @loadgs2
## BB#0:
	pushq	%rbp
	movq	%rsp, %rbp
	movl	(%rdi), %eax
	popq	%rbp
	ret

loadgs() has been compiled correctly.  It uses the parameter as a gs-relative address and performs the load.  The assembly for load() and loadgs2(), however, is identical: both treat the parameter as a linear (not gs-relative) address.  The cast has been lost.  This is even easier to see in toglobal(), which has become a no-op that returns its argument unchanged.  The correct code for this should be (I believe):

_toglobal:                              ## @toglobal
## BB#0:
	pushq	%rbp
	movq	%rsp, %rbp
	leaq	%gs:(%rdi), %rax
	popq	%rbp
	ret

In the inlined version, the lea and movl should be combined into a single gs-relative movl.
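
The intended semantics can be modelled in a few lines of C.  This is only a toy sketch under stated assumptions: memory is a flat array of ints, gs_base is a hypothetical segment base expressed as an array index, and none of the function names come from LLVM or clang.  It shows that a cast which adds the segment base, followed by a linear load, is equivalent to a single gs-relative load, whereas a cast that passes the bits through unchanged reads a different location.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: "memory" is a flat array of MEM_WORDS ints. */
enum { MEM_WORDS = 16 };

/* Load through a linear (address space 0) pointer. */
int load_linear(const int *mem, size_t addr)
{
    assert(addr < MEM_WORDS);
    return mem[addr];
}

/* Load through a gs-relative pointer: the load itself adds the
   segment base, as movl %gs:(%rdi), %eax does. */
int load_gs(const int *mem, size_t gs_base, size_t offset)
{
    return load_linear(mem, gs_base + offset);
}

/* What an explicit address space cast has to do in this model:
   turn a gs-relative offset into a linear address. */
size_t gs_to_linear(size_t gs_base, size_t offset)
{
    return gs_base + offset;
}

/* What the bitcast amounts to today: the bits pass through unchanged. */
size_t gs_to_linear_noop(size_t offset)
{
    return offset;
}
```

With gs_base = 8 and an offset of 3, load_linear(mem, gs_to_linear(8, 3)) and load_gs(mem, 8, 3) both read mem[11], while load_linear(mem, gs_to_linear_noop(3)) reads mem[3], which is the loadgs2() miscompilation in miniature.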

Until we can generate correct code from IR containing address spaces, discussion of how to optimise this IR seems premature.

David