[LLVMdev] Calling conventions for YMM registers on AVX

Demikhovsky, Elena elena.demikhovsky at intel.com
Tue Jan 10 03:55:46 PST 2012


Here is an example that produces wrong code:

declare <16 x float> @foo(<16 x float>)

define <16 x float> @test(<16 x float> %x, <16 x float> %y) nounwind {
entry:
  %x1 = fadd  <16 x float>  %x, %y
  %call = call  <16 x float> @foo(<16 x float> %x1) nounwind
  %y1 = fsub  <16 x float>  %call, %y
  ret <16 x float> %y1
}
./llc -mattr=+avx -mtriple=x86_64-win32 < test.ll
        .def     test;
        .scl    2;
        .type   32;
        .endef
        .text
        .globl  test
        .align  16, 0x90
test:                                   # @test
# BB#0:                                 # %entry
        pushq   %rbp
        movq    %rsp, %rbp
        subq    $64, %rsp
        vmovaps %xmm7, -32(%rbp)        # 16-byte Spill
        vmovaps %xmm6, -16(%rbp)        # 16-byte Spill
        vmovaps %ymm3, %ymm6
        vmovaps %ymm2, %ymm7
        vaddps  %ymm7, %ymm0, %ymm0
        vaddps  %ymm6, %ymm1, %ymm1
        callq   foo
        vsubps  %ymm7, %ymm0, %ymm0
        vsubps  %ymm6, %ymm1, %ymm1
        vmovaps -16(%rbp), %xmm6        # 16-byte Reload
        vmovaps -32(%rbp), %xmm7        # 16-byte Reload
        addq    $64, %rsp
        popq    %rbp
        ret

ymm6 and ymm7 are not saved across the call; only their low xmm halves are spilled and reloaded.
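For comparison, correct code for this case would spill the full 256-bit values around the call, since the callee only preserves the low 128-bit xmm halves under the Win64 convention. A hand-written sketch (not compiler output; stack offsets and alignment handling are illustrative only):

```asm
        # Caller-side spill of the full ymm values that must survive the call.
        # vmovups is used because 32-byte stack alignment is not guaranteed here.
        subq    $80, %rsp
        vmovups %ymm2, 32(%rsp)         # 32-byte spill of full ymm2
        vmovups %ymm3, (%rsp)           # 32-byte spill of full ymm3
        vaddps  %ymm2, %ymm0, %ymm0
        vaddps  %ymm3, %ymm1, %ymm1
        callq   foo
        vmovups 32(%rsp), %ymm2         # 32-byte reload
        vmovups (%rsp), %ymm3
        vsubps  %ymm2, %ymm0, %ymm0
        vsubps  %ymm3, %ymm1, %ymm1
        addq    $80, %rsp
```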

I have a fix and can send it for review.

- Elena

-----Original Message-----
From: Demikhovsky, Elena 
Sent: Tuesday, January 10, 2012 09:15
To: 'Jakob Stoklund Olesen'
Cc: Bruno Cardoso Lopes; llvmdev at cs.uiuc.edu
Subject: RE: [LLVMdev] Calling conventions for YMM registers on AVX

We support Win64, that's right.
We defined the upper parts of the YMM registers like this:

  // Upper 128-bit halves of the YMM registers.
  // These are actually only needed for implementing the Win64 CC with AVX.
  def XMM0b: Register<"xmm0b">, DwarfRegNum<[17, 21, 21]>;
  def XMM1b: Register<"xmm1b">, DwarfRegNum<[18, 22, 22]>;
  def XMM2b: Register<"xmm2b">, DwarfRegNum<[19, 23, 23]>;
  def XMM3b: Register<"xmm3b">, DwarfRegNum<[20, 24, 24]>;
  def XMM4b: Register<"xmm4b">, DwarfRegNum<[21, 25, 25]>;
  def XMM5b: Register<"xmm5b">, DwarfRegNum<[22, 26, 26]>;
  def XMM6b: Register<"xmm6b">, DwarfRegNum<[23, 27, 27]>;
  def XMM7b: Register<"xmm7b">, DwarfRegNum<[24, 28, 28]>;

  // X86-64 only
  def XMM8b:  Register<"xmm8b">,  DwarfRegNum<[25, -2, -2]>;
  def XMM9b:  Register<"xmm9b">,  DwarfRegNum<[26, -2, -2]>;
  def XMM10b: Register<"xmm10b">, DwarfRegNum<[27, -2, -2]>;
  def XMM11b: Register<"xmm11b">, DwarfRegNum<[28, -2, -2]>;
  def XMM12b: Register<"xmm12b">, DwarfRegNum<[29, -2, -2]>;
  def XMM13b: Register<"xmm13b">, DwarfRegNum<[30, -2, -2]>;
  def XMM14b: Register<"xmm14b">, DwarfRegNum<[31, -2, -2]>;
  def XMM15b: Register<"xmm15b">, DwarfRegNum<[32, -2, -2]>;

  // YMM Registers, used by AVX instructions
  let SubRegIndices = [sub_xmm, sub_xmmb] in {
  def YMM0: RegisterWithSubRegs<"ymm0", [XMM0, XMM0b]>, DwarfRegNum<[17, 21, 21]>;
  def YMM1: RegisterWithSubRegs<"ymm1", [XMM1, XMM1b]>, DwarfRegNum<[18, 22, 22]>;
  def YMM2: RegisterWithSubRegs<"ymm2", [XMM2, XMM2b]>, DwarfRegNum<[19, 23, 23]>;
  def YMM3: RegisterWithSubRegs<"ymm3", [XMM3, XMM3b]>, DwarfRegNum<[20, 24, 24]>;
  def YMM4: RegisterWithSubRegs<"ymm4", [XMM4, XMM4b]>, DwarfRegNum<[21, 25, 25]>;
  def YMM5: RegisterWithSubRegs<"ymm5", [XMM5, XMM5b]>, DwarfRegNum<[22, 26, 26]>;
  def YMM6: RegisterWithSubRegs<"ymm6", [XMM6, XMM6b]>, DwarfRegNum<[23, 27, 27]>;
  def YMM7: RegisterWithSubRegs<"ymm7", [XMM7, XMM7b]>, DwarfRegNum<[24, 28, 28]>;
  def YMM8:  RegisterWithSubRegs<"ymm8", [XMM8, XMM8b]>,  DwarfRegNum<[25, -2, -2]>;
  def YMM9:  RegisterWithSubRegs<"ymm9", [XMM9, XMM9b]>,  DwarfRegNum<[26, -2, -2]>;
  def YMM10: RegisterWithSubRegs<"ymm10", [XMM10, XMM10b]>, DwarfRegNum<[27, -2, -2]>;
  def YMM11: RegisterWithSubRegs<"ymm11", [XMM11, XMM11b]>, DwarfRegNum<[28, -2, -2]>;
  def YMM12: RegisterWithSubRegs<"ymm12", [XMM12, XMM12b]>, DwarfRegNum<[29, -2, -2]>;
  def YMM13: RegisterWithSubRegs<"ymm13", [XMM13, XMM13b]>, DwarfRegNum<[30, -2, -2]>;
  def YMM14: RegisterWithSubRegs<"ymm14", [XMM14, XMM14b]>, DwarfRegNum<[31, -2, -2]>;
  def YMM15: RegisterWithSubRegs<"ymm15", [XMM15, XMM15b]>, DwarfRegNum<[32, -2, -2]>;
  }

- Elena

-----Original Message-----
From: Jakob Stoklund Olesen [mailto:jolesen at apple.com] 
Sent: Tuesday, January 10, 2012 01:14
To: Demikhovsky, Elena
Cc: Bruno Cardoso Lopes; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Calling conventions for YMM registers on AVX


On Jan 9, 2012, at 10:00 AM, Jakob Stoklund Olesen wrote:

> 
> On Jan 8, 2012, at 11:18 PM, Demikhovsky, Elena wrote:
> 
>> I'll explain what we see in the code.
>> 1. The caller saves XMM registers across the call if needed (according to DEFS definition).
>> YMMs are not in the set, so the caller does not preserve them.
> 
> This is not how the register allocator works. It saves the registers holding values, it doesn't care which alias is clobbered.
> 
> Are you saying that only the xmm part of a ymm register gets spilled before a call?
> 
>> 2. The callee preserves the XMMs but works with YMMs, clobbering them.
>> 3. So after the call, the upper parts of the YMMs are gone.
> 
> Are you on Windows? As Bruno said, all xmm and ymm registers are call-clobbered on non-Windows platforms.

This thread has lots of interesting information: http://software.intel.com/en-us/forums/showthread.php?t=59291

I wasn't able to find a formal Win64 ABI spec, but according to http://www.agner.org/optimize/calling_conventions.pdf, xmm6-xmm15 are callee-saved on win64, but the high bits in ymm6-ymm15 are not.

That's not currently correctly modelled in LLVM. To fix it, create a pseudo-register YMMHI_CLOBBER that aliases ymm6-ymm15. Then add YMMHI_CLOBBER to the registers clobbered by WINCALL64*.
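A minimal sketch of that suggestion (the register name, the use of the Aliases field, and the exact placement are assumptions, not reviewed code):

```tablegen
// Sketch only: an artificial register that aliases the Win64
// callee-saved ymm registers, so the allocator sees their high
// halves as clobbered by calls.
let Aliases = [YMM6, YMM7, YMM8, YMM9, YMM10,
               YMM11, YMM12, YMM13, YMM14, YMM15] in
def YMMHI_CLOBBER : Register<"ymmhi_clobber">;

// ...and then add YMMHI_CLOBBER to the Defs list of the WINCALL64*
// pseudo-instructions, e.g. (sketch):
//   let Defs = [..., YMMHI_CLOBBER] in
//     def WINCALL64pcrel32 : ...
```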

/jakob
