[llvm] r259576 - Disable the vzeroupper insertion pass on PS4.

Hal Finkel via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 2 17:15:13 PST 2016


----- Original Message -----
> From: "Yunzhong Gao via llvm-commits" <llvm-commits at lists.llvm.org>
> To: llvm-commits at lists.llvm.org
> Sent: Tuesday, February 2, 2016 3:39:24 PM
> Subject: [llvm] r259576 - Disable the vzeroupper insertion pass on PS4.
> 
> Author: ygao
> Date: Tue Feb  2 15:39:23 2016
> New Revision: 259576
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=259576&view=rev
> Log:
> Disable the vzeroupper insertion pass on PS4.
> See comments in test/CodeGen/X86/avx-vzeroupper.ll for more
> explanation.

Please revert this change; this is not the right way to do it. Instead, please add an appropriate target feature in lib/Target/X86/X86.td (with an associated flag in X86Subtarget), set that feature on the relevant ProcessorModel, and then add code in X86VZeroUpper.cpp to skip functions when that X86Subtarget flag is set. That is, given:

bool VZeroUpperInserter::runOnMachineFunction(MachineFunction &MF) {
  const X86Subtarget &ST = MF.getSubtarget<X86Subtarget>();
  if (!ST.hasAVX() || ST.hasAVX512())
    return false;

extend this condition so that it also checks the new subtarget flag.
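
A minimal, self-contained sketch of what the extended condition could look like. It uses a stand-in struct rather than the real X86Subtarget so that it compiles on its own; the flag name NoVZeroUpperBenefit, its accessor, and the helper shouldInsertVZeroUpper are hypothetical names chosen for illustration and are not existing LLVM APIs.

```cpp
// Stand-in for X86Subtarget (assumption: the real class would gain a
// similar flag, driven by a new target feature defined in X86.td).
struct MockSubtarget {
  bool HasAVX = false;
  bool HasAVX512 = false;
  // Hypothetical flag: set by the processor model for cores (such as the
  // PS4's Jaguar) that are said not to benefit from vzeroupper.
  bool NoVZeroUpperBenefit = false;

  bool hasAVX() const { return HasAVX; }
  bool hasAVX512() const { return HasAVX512; }
  bool noVZeroUpperBenefit() const { return NoVZeroUpperBenefit; }
};

// Mirrors the early-exit test at the top of runOnMachineFunction, with the
// extra subtarget check added. Returns true when the pass should do work.
bool shouldInsertVZeroUpper(const MockSubtarget &ST) {
  if (!ST.hasAVX() || ST.hasAVX512() || ST.noVZeroUpperBenefit())
    return false;
  return true;
}
```

With this shape the decision stays in the subtarget and is made per function, so no triple check is needed in X86TargetMachine.cpp.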

Thanks in advance,
Hal

> 
> Original patch by: Sean Silva
> 
> 
> Modified:
>     llvm/trunk/lib/Target/X86/X86TargetMachine.cpp
>     llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll
> 
> Modified: llvm/trunk/lib/Target/X86/X86TargetMachine.cpp
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetMachine.cpp?rev=259576&r1=259575&r2=259576&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86TargetMachine.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86TargetMachine.cpp Tue Feb  2 15:39:23 2016
> @@ -270,6 +270,9 @@ void X86PassConfig::addPreEmitPass() {
>    if (getOptLevel() != CodeGenOpt::None)
>      addPass(createExecutionDependencyFixPass(&X86::VR128RegClass));
>  
> +  if (TM->getTargetTriple().isPS4CPU())
> +    UseVZeroUpper = false;
> +
>    if (UseVZeroUpper)
>      addPass(createX86IssueVZeroUpperPass());
>  
> 
> Modified: llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll?rev=259576&r1=259575&r2=259576&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll Tue Feb  2 15:39:23 2016
> @@ -1,4 +1,13 @@
>  ; RUN: llc < %s -x86-use-vzeroupper -mtriple=x86_64-apple-darwin -mattr=+avx | FileCheck %s
> +; RUN: llc < %s -mtriple=x86_64-scei-ps4 -mattr=+avx | FileCheck --check-prefix=PS4 %s
> +
> +; The Jaguar (AMD Family 16h) cores in the PS4 don't benefit from vzeroupper.
> +; At most, the benefit is "garbage collecting" def'd upper parts of the ymm
> +; registers, but the core has so many FP phys regs that this benefit of freeing
> +; up the upper parts is for now not worth it. Unlike Intel, there is no
> +; performance hazard to def'ing the lower parts of a ymm without clearing the
> +; upper part.
> +; PS4-NOT: vzeroupper
>  
>  declare i32 @foo()
>  declare <4 x float> @do_sse(<4 x float>)
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

