[LLVMdev] Address space extension

Thu Aug 8 03:06:29 PDT 2013

My view is that modules with different data layouts should be considered incompatible. Data layouts are inherently target- and language-specific, and I don't view combining such modules any differently from combining IR modules compiled for different architectures.
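
To make that concrete, here is a minimal sketch of the kind of check I have in mind, assuming a linker-like tool that can see each module's data layout string. The struct module_info type, the modules_compatible name, and the exact-match rule are all hypothetical illustrations, not an existing LLVM interface.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for an IR module: only the fields this check needs. */
struct module_info {
    const char *name;
    const char *data_layout; /* the module's data layout string */
};

/* Assumed rule: two modules may be combined only if their data layout
 * strings are byte-for-byte identical. */
bool modules_compatible(const struct module_info *a,
                        const struct module_info *b)
{
    return strcmp(a->data_layout, b->data_layout) == 0;
}
```

An exact string comparison is the strictest possible rule; a real tool might want something more forgiving, but that is a separate design question.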

Micah

> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of David Chisnall
> Sent: Thursday, August 08, 2013 2:04 AM
> To: Pete Cooper
> Cc: LLVM Developers Mailing List
> Subject: Re: [LLVMdev] Address space extension
> 
> On 8 Aug 2013, at 04:23, Pete Cooper <peter_cooper at apple.com> wrote:
> 
> >
> > On Aug 7, 2013, at 7:23 PM, Michele Scandale
> <michele.scandale at gmail.com> wrote:
> >
> >> On 08/08/2013 03:52 AM, Pete Cooper wrote:
> >>
> >> From this I understand that the IR contains addrspace(N), where
> N=0,1,2,3,..., according to the target-independent mapping done by the
> frontend to represent different address spaces (for OpenCL 1.2: 0 = private,
> 1 = global, 2 = local, 3 = constant).
> >>
> >> Then the frontend emits metadata that contains the map from "language
> address spaces" to "target address spaces" (for x86 this would be 0->0 1->0
> 2->0 3->0).
> >>
> >> Finally, instruction selection will use this information to select
> instructions correctly and tag each machine instruction with both
> logical and physical address spaces.
> > Sounds good.
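
The mapping described above can be sketched as a small lookup table. This is only an illustration: the enum names, the x86_addrspace_map table, and the map_addrspace helper are hypothetical, and the numbering simply follows the OpenCL 1.2 example quoted above.

```c
#include <stddef.h>

/* Language-level (logical) address spaces, numbered as in the quoted example. */
enum opencl_addrspace {
    OPENCL_PRIVATE = 0,
    OPENCL_GLOBAL = 1,
    OPENCL_LOCAL = 2,
    OPENCL_CONSTANT = 3,
    OPENCL_ADDRSPACE_COUNT
};

/* Hypothetical per-target table: index is the logical address space, value is
 * the target address space. On x86 every logical space maps to 0. */
const unsigned x86_addrspace_map[OPENCL_ADDRSPACE_COUNT] = {0, 0, 0, 0};

/* Returns the target address space, or -1 if the logical number is not
 * covered by the map. */
int map_addrspace(const unsigned *map, size_t map_len, unsigned logical)
{
    if (logical >= map_len)
        return -1;
    return (int)map[logical];
}
```

A module whose address space numbers fall outside the table gets -1 here, which is one place where two front ends with different numbering schemes would visibly disagree.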
> 
> What happens when I link together two IR modules from different front
> ends that have different language-specific address spaces?
> 
> I would be very hesitant about using address spaces until we've fixed their
> semantics to disallow bitcasts between different address spaces and require
> an explicit address space cast.  To illustrate the problem, consider the
> following trivial example:
> 
> typedef __attribute__((address_space(256))) int* gsptr;
> 
> int *toglobal(gsptr foo)
> {
> 	return (int*)foo;
> }
> 
> int load(int *foo)
> {
> 	return *foo;
> }
> 
> int loadgs(gsptr foo)
> {
> 	return *foo;
> }
> 
> int loadgs2(gsptr foo)
> {
> 	return *toglobal(foo);
> }
> 
> When we compile this to LLVM IR with clang (disabling asynchronous unwind
> tables for clarity), at -O2 we get this:
> 
> define i32* @toglobal(i32 addrspace(256)* %foo) nounwind readnone ssp {
>   %1 = bitcast i32 addrspace(256)* %foo to i32*
>   ret i32* %1
> }
> 
> define i32 @load(i32* nocapture %foo) nounwind readonly ssp {
>   %1 = load i32* %foo, align 4, !tbaa !0
>   ret i32 %1
> }
> 
> define i32 @loadgs(i32 addrspace(256)* nocapture %foo) nounwind readonly
> ssp {
>   %1 = load i32 addrspace(256)* %foo, align 4, !tbaa !0
>   ret i32 %1
> }
> 
> define i32 @loadgs2(i32 addrspace(256)* nocapture %foo) nounwind
> readonly ssp {
>   %1 = bitcast i32 addrspace(256)* %foo to i32*
>   %2 = load i32* %1, align 4, !tbaa !0
>   ret i32 %2
> }
> 
> Note that in loadgs2, the call to toglobal has been inlined and so the back end
> will just see a bitcast, which SelectionDAG treats as a no-op.  The assembly
> we get from this is:
> 
> _toglobal:                              ## @toglobal
> ## BB#0:
> 	pushq	%rbp
> 	movq	%rsp, %rbp
> 	movq	%rdi, %rax
> 	popq	%rbp
> 	ret
> _load:                                  ## @load
> ## BB#0:
> 	pushq	%rbp
> 	movq	%rsp, %rbp
> 	movl	(%rdi), %eax
> 	popq	%rbp
> 	ret
> 
> 	.globl	_loadgs
> 	.align	4, 0x90
> _loadgs:                                ## @loadgs
> ## BB#0:
> 	pushq	%rbp
> 	movq	%rsp, %rbp
> 	movl	%gs:(%rdi), %eax
> 	popq	%rbp
> 	ret
> 
> 	.globl	_loadgs2
> 	.align	4, 0x90
> _loadgs2:                               ## @loadgs2
> ## BB#0:
> 	pushq	%rbp
> 	movq	%rsp, %rbp
> 	movl	(%rdi), %eax
> 	popq	%rbp
> 	ret
> 
> loadgs() has been compiled correctly.  It uses the parameter as a
> gs-relative address and performs the load.  The assembly for load() and
> loadgs2(), however, is identical: both treat the parameter as a linear (not
> gs-relative) address.  The cast has been lost.  This is even clearer when you
> look at toglobal(), which has just become a no-op.  The correct code for this
> should be (I believe):
> 
> _toglobal:                              ## @toglobal
> ## BB#0:
> 	pushq	%rbp
> 	movq	%rsp, %rbp
> 	lea		%gs:(%rdi), %rax
> 	popq	%rbp
> 	ret
> 
> In the inlined version, the lea and movl should be combined into a single
> gs-relative movl.
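
As a rough model of what an explicit address space cast has to compute: a gs-relative pointer is an offset from the segment base, so converting it to a linear pointer means adding that base. The sketch below assumes the base is passed in as a hypothetical gs_base argument; the function name is also made up, and this says nothing about how a backend would actually obtain the base.

```c
#include <stdint.h>

/* Hypothetical model of a gs-relative to linear conversion:
 * linear address = segment base + offset within the segment. */
uintptr_t gs_to_linear(uintptr_t gs_base, uintptr_t offset)
{
    return gs_base + offset;
}
```

With a nonzero base the result differs from the offset, which is exactly the information that is lost when the cast is treated as a no-op.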
> 
> Until we can generate correct code from IR containing address spaces,
> discussion of how to optimise this IR seems premature.
> 
> David