[LLVMdev] Calling conventions for YMM registers on AVX

Jakob Stoklund Olesen stoklund at 2pi.dk
Tue Jan 10 08:15:33 PST 2012


On Jan 10, 2012, at 3:55 AM, Demikhovsky, Elena wrote:

> This is the wrong code:
> 
> declare <16 x float> @foo(<16 x float>)
> 
> define <16 x float> @test(<16 x float> %x, <16 x float> %y) nounwind {
> entry:
>  %x1 = fadd  <16 x float>  %x, %y
>  %call = call  <16 x float> @foo(<16 x float> %x1) nounwind
>  %y1 = fsub  <16 x float>  %call, %y
>  ret <16 x float> %y1
> }

Thanks.

> ./llc -mattr=+avx -mtriple=x86_64-win32 < test.ll
> test:                                   # @test
> # BB#0:                                 # %entry
>        pushq   %rbp
>        movq    %rsp, %rbp
>        subq    $64, %rsp
>        vmovaps %xmm7, -32(%rbp)        # 16-byte Spill
>        vmovaps %xmm6, -16(%rbp)        # 16-byte Spill
>        vmovaps %ymm3, %ymm6
>        vmovaps %ymm2, %ymm7
>        vaddps  %ymm7, %ymm0, %ymm0
>        vaddps  %ymm6, %ymm1, %ymm1
>        callq   foo
>        vsubps  %ymm7, %ymm0, %ymm0
>        vsubps  %ymm6, %ymm1, %ymm1
>        vmovaps -16(%rbp), %xmm6        # 16-byte Reload
>        vmovaps -32(%rbp), %xmm7        # 16-byte Reload
>        addq    $64, %rsp
>        popq    %rbp
>        ret
> 
> ymm6, ymm7 are not saved across the call.

The xmm spills and reloads are correct. They are the prolog and epilog code that preserves the callee-saved xmm6 and xmm7 registers.

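However, you are correct that ymm6 and ymm7 can't be used as callee-saved registers. Win64 only preserves the low 128 bits (xmm6 and xmm7), so the upper halves are clobbered by the call.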
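To make the rule concrete, here is a minimal Python sketch of the Win64 preservation rule for vector registers. It assumes the documented Win64 convention (xmm6-xmm15 are nonvolatile, the upper halves of all ymm registers are volatile). The function `survives_call` and the register naming are hypothetical and for illustration only; this is not LLVM code.

```python
# Hypothetical model of the Win64 rule for vector registers across a call.
# Assumption: the low 128 bits of xmm6-xmm15 are callee-saved; everything
# else, including the upper 128 bits of every ymm register, is volatile.
CALLEE_SAVED_XMM = {f"xmm{i}" for i in range(6, 16)}


def survives_call(reg: str) -> bool:
    """Return True if a value held entirely in `reg` is preserved across a call."""
    if reg.startswith("ymm"):
        # A 256-bit value also needs the upper half, which the callee may clobber.
        return False
    return reg in CALLEE_SAVED_XMM


# The two registers the listing above keeps live across `callq foo`:
live_across_call = ["ymm6", "ymm7"]
broken = [r for r in live_across_call if not survives_call(r)]
```

Under this model both ymm6 and ymm7 end up in `broken`, while xmm6 and xmm7 on their own would be preserved, which is why the prolog/epilog spills are fine but the register allocation is not.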

> We support Win64, that's right.
> We defined the upper part of YMM like this
> 
>  // XMM Registers, used by the various SSE instruction set extensions.
>  // These are actually only needed for implementing the Win64 CC with AVX.
>  def XMM0b: Register<"xmm0b">, DwarfRegNum<[17, 21, 21]>;
>  def XMM1b: Register<"xmm1b">, DwarfRegNum<[18, 22, 22]>;
>  def XMM2b: Register<"xmm2b">, DwarfRegNum<[19, 23, 23]>;
>  def XMM3b: Register<"xmm3b">, DwarfRegNum<[20, 24, 24]>;
>  def XMM4b: Register<"xmm4b">, DwarfRegNum<[21, 25, 25]>;
>  def XMM5b: Register<"xmm5b">, DwarfRegNum<[22, 26, 26]>;
>  def XMM6b: Register<"xmm6b">, DwarfRegNum<[23, 27, 27]>;
>  def XMM7b: Register<"xmm7b">, DwarfRegNum<[24, 28, 28]>;
> 
>  // X86-64 only
>  def XMM8b:  Register<"xmm8b">,  DwarfRegNum<[25, -2, -2]>;
>  def XMM9b:  Register<"xmm9b">,  DwarfRegNum<[26, -2, -2]>;
>  def XMM10b: Register<"xmm10b">, DwarfRegNum<[27, -2, -2]>;
>  def XMM11b: Register<"xmm11b">, DwarfRegNum<[28, -2, -2]>;
>  def XMM12b: Register<"xmm12b">, DwarfRegNum<[29, -2, -2]>;
>  def XMM13b: Register<"xmm13b">, DwarfRegNum<[30, -2, -2]>;
>  def XMM14b: Register<"xmm14b">, DwarfRegNum<[31, -2, -2]>;
>  def XMM15b: Register<"xmm15b">, DwarfRegNum<[32, -2, -2]>;

There is no need to define all these fake registers. One is enough:

def YMM_UPPER : Register<"ymmupper"> {
  let Aliases = [ YMM0, YMM1, ..., YMM15 ];
};

It doesn't need to be a sub-register either. Aliasing is good enough.
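A rough way to see why aliasing alone is enough: if a call's clobber list names the single fake register, every register that aliases it is clobbered too, so no ymm register can carry a value across the call. Because aliasing is not transitive, xmm6 (which aliases ymm6 but not YMM_UPPER) stays callee-saved. The sketch below models that in Python. `YMM_UPPER` mirrors the TableGen def above; the alias table, `WIN64_CALL_CLOBBERS`, and `clobbered_by_call` are hypothetical and only approximate how LLVM treats register aliases.

```python
# Hypothetical model: clobbering a register also clobbers its *direct* aliases.
# Assumption: YMM_UPPER aliases ymm0-ymm15, as in the TableGen sketch, and each
# ymmN aliases its low half xmmN.
ALIASES = {"YMM_UPPER": {f"ymm{i}" for i in range(16)}}
for i in range(16):
    ALIASES[f"ymm{i}"] = {f"xmm{i}", "YMM_UPPER"}
    ALIASES[f"xmm{i}"] = {f"ymm{i}"}

# Hypothetical Win64 call clobber list: the fake upper-half register plus the
# volatile xmm0-xmm5. xmm6-xmm15 are deliberately absent (callee-saved).
WIN64_CALL_CLOBBERS = ["YMM_UPPER"] + [f"xmm{i}" for i in range(6)]


def clobbered_by_call(clobber_list):
    """Return the listed registers plus their direct aliases.

    Aliasing is not transitive: ymm6 aliases both YMM_UPPER and xmm6, but
    clobbering YMM_UPPER must not clobber xmm6.
    """
    clobbered = set()
    for reg in clobber_list:
        clobbered.add(reg)
        clobbered |= ALIASES.get(reg, set())
    return clobbered


clobbered = clobbered_by_call(WIN64_CALL_CLOBBERS)
```

In this model every ymm register is in `clobbered`, while xmm6-xmm15 are not, so a value could stay in xmm6 across the call but never in ymm6.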

/jakob
