[llvm] r175043 - Prevent insertion of

Sat Aug 31 22:45:58 PDT 2013

Hi Gao,

Intel_ocl_bi is a special calling convention for the math library used by Intel OpenCL (it is not generic).
According to this convention, YMM registers are preserved by the callee on X64 and Win64 (the set of preserved YMMs differs between these platforms). On X32/Win32 the YMMs are not preserved at all; there are only 8 SIMD registers on 32-bit platforms. We did a lot of performance tuning and found this convention optimal for us.
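To make the per-platform difference concrete, the sketch below models the callee-preserved YMM set as a bitset. The exact sets are an assumption, not stated in this thread: YMM8-YMM15 on X64 and YMM6-YMM15 on Win64, as I recall them from LLVM's X86CallingConv.td. The names `Platform`, `preservedYmm` and `canKeepYmmLiveAcrossCall` are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical model of the intel_ocl_bi callee-saved YMM registers.
// Bit i set means the callee preserves YMMi across the call.
enum class Platform { X64, Win64, X32, Win32 };

// ASSUMPTION: YMM8-YMM15 on X64 and YMM6-YMM15 on Win64, as recalled
// from LLVM's X86CallingConv.td; the 32-bit targets preserve none.
std::bitset<16> preservedYmm(Platform p) {
  std::bitset<16> s;
  switch (p) {
  case Platform::X64:
    for (std::size_t i = 8; i <= 15; ++i) s.set(i);
    break;
  case Platform::Win64:
    for (std::size_t i = 6; i <= 15; ++i) s.set(i);
    break;
  case Platform::X32:
  case Platform::Win32:
    // Only YMM0-YMM7 exist in 32-bit mode, and none of them is preserved.
    break;
  }
  return s;
}

// A caller can keep a 256-bit value live across the call only in a
// register that the callee preserves. Valid register numbers are 0-15.
bool canKeepYmmLiveAcrossCall(Platform p, std::size_t reg) {
  return reg < 16 && preservedYmm(p).test(reg);
}
```

With these assumed sets, a 256-bit value held in YMM6 survives an intel_ocl_bi call on Win64 but not on X64, and nothing survives on X32/Win32, which is consistent with the 32-bit check lines in the test still expecting vzeroupper.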

-  Elena

-----Original Message-----
From: Gao, Yunzhong [mailto:yunzhong_gao at playstation.sony.com] 
Sent: Thursday, August 29, 2013 22:22
To: Demikhovsky, Elena
Cc: llvm-commits at cs.uiuc.edu
Subject: RE: [llvm] r175043 - Prevent insertion of 

ping.

________________________________________
From: llvm-commits-bounces at cs.uiuc.edu [llvm-commits-bounces at cs.uiuc.edu] on behalf of Gao, Yunzhong
Sent: Friday, August 23, 2013 1:48 PM
To: llvm-commits at cs.uiuc.edu
Subject: Re: [llvm] r175043 - Prevent insertion of

Elena Demikhovsky <elena.demikhovsky at ...> writes:

>
> Author: delena
> Date: Wed Feb 13 02:02:04 2013
> New Revision: 175043
>
> URL: http://llvm.org/viewvc/llvm-project?rev=175043&view=rev
> Log:
> Prevent insertion of "vzeroupper" before call that preserves YMM registers, since a caller uses preserved registers across the call.
>
> Modified:
>     llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp
>     llvm/trunk/test/CodeGen/X86/avx-intel-ocl.ll
>
> Modified: llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp?rev=175043&r1=175042&r2=175043&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp Wed Feb 13 02:02:04 2013
> @@ -120,9 +120,19 @@ static bool checkFnHasLiveInYmm(MachineR
>    return false;
>  }
>
> +static bool clobbersAllYmmRegs(const MachineOperand &MO) {
> +  for (unsigned reg = X86::YMM0; reg < X86::YMM15; ++reg) {
> +    if (!MO.clobbersPhysReg(reg))
> +      return false;
> +  }
> +  return true;
> +}
> +
>  static bool hasYmmReg(MachineInstr *MI) {
>    for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
>      const MachineOperand &MO = MI->getOperand(i);
> +    if (MI->isCall() && MO.isRegMask() && !clobbersAllYmmRegs(MO))
> +      return true;
>      if (!MO.isReg())
>        continue;
>      if (MO.isDebug())
>
> Modified: llvm/trunk/test/CodeGen/X86/avx-intel-ocl.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-intel-ocl.ll?rev=175043&r1=175042&r2=175043&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/avx-intel-ocl.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/avx-intel-ocl.ll Wed Feb 13 02:02:04 2013
> @@ -127,3 +127,43 @@ define i32 @test_int(i32 %a, i32 %b) nou
>      %c = add i32 %c2, %b
>       ret i32 %c
>  }
> +
> +; WIN64: test_float4
> +; WIN64-NOT: vzeroupper
> +; WIN64: call
> +; WIN64-NOT: vzeroupper
> +; WIN64: call
> +; WIN64: ret
> +
> +; X64: test_float4
> +; X64-NOT: vzeroupper
> +; X64: call
> +; X64-NOT: vzeroupper
> +; X64: call
> +; X64: ret
> +
> +; X32: test_float4
> +; X32: vzeroupper
> +; X32: call
> +; X32: vzeroupper
> +; X32: call
> +; X32: ret
> +
> +declare <4 x float> @func_float4(<4 x float>, <4 x float>, <4 x float>)
> +
> +define <8 x float> @test_float4(<8 x float> %a, <8 x float> %b, <8 x float> %c) nounwind readnone {
> +entry:
> +  %0 = shufflevector <8 x float> %a, <8 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
> +  %1 = shufflevector <8 x float> %b, <8 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
> +  %2 = shufflevector <8 x float> %c, <8 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
> +  %call.i = tail call intel_ocl_bicc <4 x float> @func_float4(<4 x float> %0, <4 x float> %1, <4 x float> %2) nounwind
> +  %3 = shufflevector <4 x float> %call.i, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
> +  %4 = shufflevector <8 x float> %a, <8 x float> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
> +  %5 = shufflevector <8 x float> %b, <8 x float> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
> +  %6 = shufflevector <8 x float> %c, <8 x float> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
> +  %call.i2 = tail call intel_ocl_bicc <4 x float> @func_float4(<4 x float> %4, <4 x float> %5, <4 x float> %6) nounwind
> +  %7 = shufflevector <4 x float> %call.i2, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
> +  %8 = shufflevector <8 x float> %3, <8 x float> %7, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
> +  ret <8 x float> %8
> +}
> +
>
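The effect of the patch can be restated as a simplified, self-contained model; this is not the pass's real code. It relies on the LLVM register-mask convention that a set bit means the register is preserved, so `clobbersPhysReg` is true when the bit is clear. `RegMask`, the 16-bit width, the consecutive 0-15 numbering of YMM registers and `insertVZeroUpperBeforeCall` are hypothetical simplifications.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical stand-in for an LLVM register mask restricted to YMM0-YMM15,
// numbered consecutively: a set bit means the callee preserves the register.
using RegMask = std::bitset<16>;

// Same meaning as MachineOperand::clobbersPhysReg: clobbered iff bit is clear.
bool clobbersPhysReg(const RegMask &mask, unsigned reg) {
  return !mask.test(reg);
}

// Same loop shape as the committed helper, including the `< YMM15` bound,
// so the last register is never examined.
bool clobbersAllYmmRegs(const RegMask &mask) {
  for (unsigned reg = 0; reg < 15; ++reg)
    if (!clobbersPhysReg(mask, reg))
      return false;
  return true;
}

// Simplified decision before a call: with the patch, a call that preserves
// some YMM register is treated like a YMM user, so no vzeroupper is placed
// in front of it; otherwise one is needed only if the upper state is dirty.
bool insertVZeroUpperBeforeCall(bool upperStateDirty, const RegMask &callMask) {
  if (!clobbersAllYmmRegs(callMask))
    return false;
  return upperStateDirty;
}
```

For a mask preserving only YMM8-YMM15 (0xFF00), `insertVZeroUpperBeforeCall` returns false even in a dirty state, matching the X64/WIN64 CHECK-NOT lines; a mask that clobbers everything still gets a vzeroupper, matching the X32 lines. Because the model keeps the quoted `< YMM15` bound, a mask preserving only YMM15 is still reported as clobbering all YMM registers.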

Hi Elena,
I would like to discuss your commit r175043 with you.

In X86VZeroUpper.cpp, how does clobbersAllYmmRegs() check that YMM registers are preserved? It seems that this function is only checking that at least one YMM register is not written/clobbered.
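The concern can be made concrete with two predicates over a hypothetical preserved-YMM bitset. `preservesSomeYmm` matches the reading of the committed check given above (some YMM register survives the call); `preservesAllYmm` is the literal meaning of "preserves YMM registers". `liveYmmSurvives` is one possible refinement, not proposed in this thread: it asks only whether the registers actually live across the call are preserved. All names here are illustrative.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical model: bit i set means the callee preserves YMMi.
using YmmMask = std::bitset<16>;

// Reading of the committed check: at least one YMM register survives.
bool preservesSomeYmm(const YmmMask &preserved) { return preserved.any(); }

// The literal meaning of "the call preserves YMM registers".
bool preservesAllYmm(const YmmMask &preserved) { return preserved.all(); }

// Illustrative refinement (not from the thread): only the YMM registers that
// are actually live across the call need to be preserved.
bool liveYmmSurvives(const YmmMask &preserved, const YmmMask &liveAcrossCall) {
  return (liveAcrossCall & ~preserved).none();
}
```

With a mask of YMM8-YMM15 (0xFF00), the first predicate is true and the second is false, so the two readings disagree for any convention that preserves only part of the YMM register file.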

In avx-intel-ocl.ll, why does X32 have a different check sequence from X64 and WIN64? It seems that test_float4() is trying to create some undef values in the upper 128 bits of a YMM register with the following IR instructions:

+  %0 = shufflevector <8 x float> %a, <8 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+  %1 = shufflevector <8 x float> %b, <8 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+  %2 = shufflevector <8 x float> %c, <8 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>

But the X86 backend generates AVX vmovups instructions for both the X32 and X64 cases, and these clear the upper 128 bits instead of leaving undef values there. So it seems that vzeroupper is not needed in either case, right?
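The hardware behavior this argument rests on can be modeled in a few lines: a VEX.128-encoded register write such as `vmovups xmm, m128` zeroes bits 255:128 of the destination, while a legacy SSE write leaves them unchanged. The `Ymm` and `Lane` types and both write functions are hypothetical; each 128-bit half is reduced to two 64-bit words.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model: each 128-bit half is reduced to two 64-bit words.
using Lane = std::array<std::uint64_t, 2>;

struct Ymm {
  Lane lo{}; // bits 127:0, also visible as the XMM register
  Lane hi{}; // bits 255:128
};

// VEX.128-encoded register write (e.g. vmovups xmm, m128): the low half
// takes the new value and bits 255:128 are zeroed.
Ymm vex128Write(Ymm r, const Lane &v) {
  r.lo = v;
  r.hi = Lane{};
  return r;
}

// Legacy SSE register write (e.g. movups xmm, m128): the low half changes
// and bits 255:128 keep whatever they held before.
Ymm legacySseWrite(Ymm r, const Lane &v) {
  r.lo = v;
  return r;
}

bool upperIsZero(const Ymm &r) { return r.hi[0] == 0 && r.hi[1] == 0; }
```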

Thanks,
- Gao.

_______________________________________________
llvm-commits mailing list
llvm-commits at cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

---------------------------------------------------------------------
Intel Israel (74) Limited
