[llvm] r175043 - Prevent insertion of "vzeroupper" before call that preserves YMM registers

Yunzhong Gao Yunzhong_Gao at playstation.sony.com
Fri Aug 23 13:48:22 PDT 2013


Elena Demikhovsky <elena.demikhovsky at ...> writes:

> 
> Author: delena
> Date: Wed Feb 13 02:02:04 2013
> New Revision: 175043
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=175043&view=rev
> Log:
> Prevent insertion of "vzeroupper" before call that preserves YMM registers,
> since a caller uses preserved registers across the call.
> 
> Modified:
>     llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp
>     llvm/trunk/test/CodeGen/X86/avx-intel-ocl.ll
> 
> Modified: llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp?rev=175043&r1=175042&r2=175043&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86VZeroUpper.cpp Wed Feb 13 02:02:04 2013
> @@ -120,9 +120,19 @@ static bool checkFnHasLiveInYmm(MachineR
>    return false;
>  }
> 
> +static bool clobbersAllYmmRegs(const MachineOperand &MO) {
> +  for (unsigned reg = X86::YMM0; reg < X86::YMM15; ++reg) {
> +    if (!MO.clobbersPhysReg(reg))
> +      return false;
> +  }
> +  return true;
> +}
> +
>  static bool hasYmmReg(MachineInstr *MI) {
>    for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
>      const MachineOperand &MO = MI->getOperand(i);
> +    if (MI->isCall() && MO.isRegMask() && !clobbersAllYmmRegs(MO))
> +      return true;
>      if (!MO.isReg())
>        continue;
>      if (MO.isDebug())
> 
> Modified: llvm/trunk/test/CodeGen/X86/avx-intel-ocl.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-intel-ocl.ll?rev=175043&r1=175042&r2=175043&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/avx-intel-ocl.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/avx-intel-ocl.ll Wed Feb 13 02:02:04 2013
> @@ -127,3 +127,43 @@ define i32 @test_int(i32 %a, i32 %b) nou
>      %c = add i32 %c2, %b
>  	ret i32 %c
>  }
> +
> +; WIN64: test_float4
> +; WIN64-NOT: vzeroupper
> +; WIN64: call
> +; WIN64-NOT: vzeroupper
> +; WIN64: call
> +; WIN64: ret
> +
> +; X64: test_float4
> +; X64-NOT: vzeroupper
> +; X64: call
> +; X64-NOT: vzeroupper
> +; X64: call
> +; X64: ret
> +
> +; X32: test_float4
> +; X32: vzeroupper
> +; X32: call
> +; X32: vzeroupper
> +; X32: call
> +; X32: ret
> +
> +declare <4 x float> @func_float4(<4 x float>, <4 x float>, <4 x float>)
> +
> +define <8 x float> @test_float4(<8 x float> %a, <8 x float> %b, <8 x float> %c) nounwind readnone {
> +entry:
> +  %0 = shufflevector <8 x float> %a, <8 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
> +  %1 = shufflevector <8 x float> %b, <8 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
> +  %2 = shufflevector <8 x float> %c, <8 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
> +  %call.i = tail call intel_ocl_bicc <4 x float> @func_float4(<4 x float> %0, <4 x float> %1, <4 x float> %2) nounwind
> +  %3 = shufflevector <4 x float> %call.i, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
> +  %4 = shufflevector <8 x float> %a, <8 x float> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
> +  %5 = shufflevector <8 x float> %b, <8 x float> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
> +  %6 = shufflevector <8 x float> %c, <8 x float> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
> +  %call.i2 = tail call intel_ocl_bicc <4 x float> @func_float4(<4 x float> %4, <4 x float> %5, <4 x float> %6) nounwind
> +  %7 = shufflevector <4 x float> %call.i2, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
> +  %8 = shufflevector <8 x float> %3, <8 x float> %7, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
> +  ret <8 x float> %8
> +}
> +
> 

Hi Elena,
I would like to discuss your commit r175043 with you.

In X86VZeroUpper.cpp, how does clobbersAllYmmRegs() verify that the
YMM registers are preserved? As used in hasYmmReg(), the condition
!clobbersAllYmmRegs(MO) only checks that at least one YMM register is
not written/clobbered by the call, not that all of them are preserved.
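
For concreteness, here is a rough sketch of the check I would have
expected from the log message. This is purely illustrative, not part
of the commit, and preservesAllYmmRegs is a hypothetical name:

  // Hypothetical helper: true iff the call's register mask marks every
  // YMM register as preserved (i.e. none of them is clobbered).
  static bool preservesAllYmmRegs(const MachineOperand &MO) {
    for (unsigned reg = X86::YMM0; reg <= X86::YMM15; ++reg) {
      if (MO.clobbersPhysReg(reg))
        return false; // this YMM register is clobbered by the call
    }
    return true;      // every YMM register survives the call
  }

If something like that is the intended property, then the committed
condition MI->isCall() && MO.isRegMask() && !clobbersAllYmmRegs(MO)
is already satisfied by a call that preserves only a single YMM
register.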

In avx-intel-ocl.ll, why does X32 have a different check sequence
from X64 and WIN64? It seems that test_float4() is trying to create
undef values in the upper 128 bits of a YMM register with the
following IR instructions:

+  %0 = shufflevector <8 x float> %a, <8 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+  %1 = shufflevector <8 x float> %b, <8 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+  %2 = shufflevector <8 x float> %c, <8 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>

But the X86 backend generates AVX vmovups instructions in both the
X32 and X64 cases, and those clear the upper 128 bits instead of
leaving undef values there. So it seems that vzeroupper is not needed
in either case, right?

Thanks,
- Gao.
