[llvm] r259576 - Disable the vzeroupper insertion pass on PS4.

Yunzhong Gao via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 2 13:39:24 PST 2016


Author: ygao
Date: Tue Feb  2 15:39:23 2016
New Revision: 259576

URL: http://llvm.org/viewvc/llvm-project?rev=259576&view=rev
Log:
Disable the vzeroupper insertion pass on PS4.
See comments in test/CodeGen/X86/avx-vzeroupper.ll for more explanation.

Original patch by: Sean Silva


Modified:
    llvm/trunk/lib/Target/X86/X86TargetMachine.cpp
    llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll

Modified: llvm/trunk/lib/Target/X86/X86TargetMachine.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetMachine.cpp?rev=259576&r1=259575&r2=259576&view=diff
==============================================================================
--- llvm/trunk/lib/Target/X86/X86TargetMachine.cpp (original)
+++ llvm/trunk/lib/Target/X86/X86TargetMachine.cpp Tue Feb  2 15:39:23 2016
@@ -270,6 +270,9 @@ void X86PassConfig::addPreEmitPass() {
   if (getOptLevel() != CodeGenOpt::None)
     addPass(createExecutionDependencyFixPass(&X86::VR128RegClass));
 
+  if (TM->getTargetTriple().isPS4CPU())
+    UseVZeroUpper = false;
+
   if (UseVZeroUpper)
     addPass(createX86IssueVZeroUpperPass());
 

Modified: llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll?rev=259576&r1=259575&r2=259576&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll (original)
+++ llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll Tue Feb  2 15:39:23 2016
@@ -1,4 +1,13 @@
 ; RUN: llc < %s -x86-use-vzeroupper -mtriple=x86_64-apple-darwin -mattr=+avx | FileCheck %s
+; RUN: llc < %s -mtriple=x86_64-scei-ps4 -mattr=+avx | FileCheck --check-prefix=PS4 %s
+
+; The Jaguar (AMD Family 16h) cores in the PS4 don't benefit from vzeroupper.
+; At most, the benefit is "garbage collecting" def'd upper parts of the ymm
+; registers, but the core has so many FP phys regs that this benefit of freeing
+; up the upper parts is for now not worth it. Unlike Intel, there is no
+; performance hazard to def'ing the lower parts of a ymm without clearing the
+; upper part.
+; PS4-NOT: vzeroupper
 
 declare i32 @foo()
 declare <4 x float> @do_sse(<4 x float>)
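
For readers without the LLVM tree at hand, the gating added to X86PassConfig::addPreEmitPass() above can be sketched as a standalone model. This is only an illustration under stated assumptions: TargetInfo, isPS4() and shouldInsertVZeroUpper() are hypothetical names, not LLVM APIs, and the crude string match stands in for the real Triple::isPS4CPU() check. The actual patch clears the UseVZeroUpper option in place rather than returning a value.

```cpp
#include <string>

// Hypothetical stand-in for the target triple; NOT llvm::Triple.
struct TargetInfo {
  std::string triple;

  // Assumption: treat any triple ending in "-ps4" as a PS4 target.
  // The real code calls TM->getTargetTriple().isPS4CPU().
  bool isPS4() const {
    const std::string suffix = "-ps4";
    return triple.size() >= suffix.size() &&
           triple.compare(triple.size() - suffix.size(), suffix.size(),
                          suffix) == 0;
  }
};

// Models the decision made after r259576: on PS4 the vzeroupper
// insertion pass is never scheduled, regardless of the option value;
// on every other target the option decides.
bool shouldInsertVZeroUpper(const TargetInfo &target, bool useVZeroUpper) {
  if (target.isPS4())
    return false;
  return useVZeroUpper;
}
```

Under this model, shouldInsertVZeroUpper({"x86_64-scei-ps4"}, true) is false, while the same call with "x86_64-apple-darwin" follows the option, which mirrors the two RUN lines in the test.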



