[LLVMdev] Vector promotions for calling conventions

Sun Jul 4 23:02:14 PDT 2010

The X86-64 calling convention (annoyingly) specifies that "struct x { float a,b,c,d; }" is passed or returned in the low two elements of two separate XMM registers. For example, returning one puts "a,b" in the low elements of XMM0 and "c,d" in the low elements of XMM1. Both llvm-gcc and clang currently generate atrocious IR for these structs, which you can see if you compile this:

struct x { float a,b,c,d; };
struct x foo(struct x *P) { return *P; }

The machine code generated by llvm-gcc[*] for this is:

_foo:
	movl	(%rdi), %eax
	movl	4(%rdi), %ecx
	shlq	$32, %rcx
	addq	%rax, %rcx
	movd	%rcx, %xmm0
	movl	8(%rdi), %eax
	movl	12(%rdi), %ecx
	shlq	$32, %rcx
	addq	%rax, %rcx
	movd	%rcx, %xmm1
	ret

when we really just want:

_foo:
	movq	(%rdi), %xmm0
	movq	8(%rdi), %xmm1
	ret
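
To make the difference between the two sequences concrete, here is a rough C sketch (not compiler output) of what each one computes for one half of the struct. The helper names load_half_piecewise and load_half_direct are made up for illustration, and the equivalence assumes a little-endian target, which x86-64 is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct x { float a, b, c, d; };

/* Assumption: four packed 4-byte floats, i.e. two 8-byte halves. */
static_assert(sizeof(struct x) == 16, "struct x should be 16 bytes");

/* Long sequence: two 32-bit loads glued together with a shift and an add
   (movl, movl, shlq, addq), ready to be moved into an XMM register. */
static uint64_t load_half_piecewise(const struct x *s, size_t byte_offset) {
    const unsigned char *base = (const unsigned char *)s + byte_offset;
    uint32_t lo, hi;
    memcpy(&lo, base, sizeof lo);
    memcpy(&hi, base + 4, sizeof hi);
    return ((uint64_t)hi << 32) + lo;
}

/* Short sequence: a single 64-bit load (movq) of the same eight bytes. */
static uint64_t load_half_direct(const struct x *s, size_t byte_offset) {
    uint64_t bits;
    memcpy(&bits, (const unsigned char *)s + byte_offset, sizeof bits);
    return bits;
}
```

Calling both helpers with byte offsets 0 and 8 yields the same 64-bit patterns on a little-endian machine, so the shift/add dance is pure overhead.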

I'm looking at having clang generate IR for this by passing and returning the two halves as v2f32 values, which they are, and doing insert/extracts in the caller/callee. However, at the moment, the x86 backend passes each element of a v2f32 as a separate f32, instead of promoting the type and passing the v2f32 as the low two elements of a v4f32. In the example above, this means the four elements come back in XMM0, XMM1, XMM2 and XMM3 instead of just XMM0/1.
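
As a source-level analogy for the split into two halves, here is a minimal C sketch. It only models the data movement, not the actual IR clang would emit; struct half is a hypothetical stand-in for a v2f32 value, and split_x/join_x are made-up names playing the role of the extracts and inserts.

```c
#include <assert.h>
#include <string.h>

struct x { float a, b, c, d; };

/* Hypothetical stand-in for a v2f32 value: two packed floats. */
struct half { float e[2]; };

/* Both halves together, as they would travel in XMM0 and XMM1. */
struct halves { struct half lo, hi; };

/* Assumption: no padding, so the struct is exactly two halves. */
static_assert(sizeof(struct x) == 2 * sizeof(struct half),
              "struct x should be exactly two v2f32-sized halves");

/* "Extract" side: pull the two halves out of the struct. */
static struct halves split_x(struct x v) {
    struct halves h;
    memcpy(&h.lo, &v, sizeof h.lo);
    memcpy(&h.hi, (const unsigned char *)&v + sizeof h.lo, sizeof h.hi);
    return h;
}

/* "Insert" side: rebuild the struct from the two halves. */
static struct x join_x(struct half lo, struct half hi) {
    struct x v;
    memcpy(&v, &lo, sizeof lo);
    memcpy((unsigned char *)&v + sizeof lo, &hi, sizeof hi);
    return v;
}
```

With this shape, each half is one 8-byte unit that can be moved with a single load or store, which is what the two-movq version of _foo relies on.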

We already do this sort of vector promotion for operators in type legalization.  Is there any reason not to do it for the calling convention case?  Is there anyone interested in working on this? :)

-Chris

[*] Clang happens to generate good machine code for this case, but the IR is still awful and it falls down hard on other similar cases.