[llvm] r259576 - Disable the vzeroupper insertion pass on PS4.
Yunzhong Gao via llvm-commits
llvm-commits at lists.llvm.org
Tue Feb 2 13:39:24 PST 2016
Author: ygao
Date: Tue Feb 2 15:39:23 2016
New Revision: 259576
URL: http://llvm.org/viewvc/llvm-project?rev=259576&view=rev
Log:
Disable the vzeroupper insertion pass on PS4.
See comments in test/CodeGen/X86/avx-vzeroupper.ll for more explanation.
Original patch by: Sean Silva
Modified:
llvm/trunk/lib/Target/X86/X86TargetMachine.cpp
llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll
Modified: llvm/trunk/lib/Target/X86/X86TargetMachine.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetMachine.cpp?rev=259576&r1=259575&r2=259576&view=diff
==============================================================================
--- llvm/trunk/lib/Target/X86/X86TargetMachine.cpp (original)
+++ llvm/trunk/lib/Target/X86/X86TargetMachine.cpp Tue Feb 2 15:39:23 2016
@@ -270,6 +270,9 @@ void X86PassConfig::addPreEmitPass() {
   if (getOptLevel() != CodeGenOpt::None)
     addPass(createExecutionDependencyFixPass(&X86::VR128RegClass));
+  if (TM->getTargetTriple().isPS4CPU())
+    UseVZeroUpper = false;
+
   if (UseVZeroUpper)
     addPass(createX86IssueVZeroUpperPass());
Modified: llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll?rev=259576&r1=259575&r2=259576&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll (original)
+++ llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll Tue Feb 2 15:39:23 2016
@@ -1,4 +1,13 @@
; RUN: llc < %s -x86-use-vzeroupper -mtriple=x86_64-apple-darwin -mattr=+avx | FileCheck %s
+; RUN: llc < %s -mtriple=x86_64-scei-ps4 -mattr=+avx | FileCheck --check-prefix=PS4 %s
+
+; The Jaguar (AMD Family 16h) cores in the PS4 don't benefit from vzeroupper.
+; At most, the benefit is "garbage collecting" def'd upper parts of the ymm
+; registers, but the core has so many FP phys regs that this benefit of freeing
+; up the upper parts is for now not worth it. Unlike Intel, there is no
+; performance hazard to def'ing the lower parts of a ymm without clearing the
+; upper part.
+; PS4-NOT: vzeroupper
declare i32 @foo()
declare <4 x float> @do_sse(<4 x float>)
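
For readers without an LLVM checkout, the gating in addPreEmitPass() can be modeled with a small standalone sketch. This is not LLVM code: PassOptions, isPS4Triple, and preEmitPasses are hypothetical names, the triple check is a plain substring match standing in for Triple::isPS4CPU(), and the pass names are illustrative strings. It only shows the control flow of the patch: the PS4 check overrides the flag before the pass is scheduled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the -x86-use-vzeroupper option (assumed on by default).
struct PassOptions {
  bool UseVZeroUpper = true;
};

// Hypothetical helper: a substring match standing in for Triple::isPS4CPU().
bool isPS4Triple(const std::string &Triple) {
  return Triple.find("-ps4") != std::string::npos;
}

// Returns the (illustrative) names of the pre-emit passes that would be
// scheduled, mirroring the order of checks in the patched function.
std::vector<std::string> preEmitPasses(const std::string &Triple,
                                       PassOptions Opts, bool Optimizing) {
  std::vector<std::string> Passes;
  if (Optimizing)
    Passes.push_back("execution-dependency-fix");

  // The patch: force the flag off on PS4, whatever the option said.
  if (isPS4Triple(Triple))
    Opts.UseVZeroUpper = false;

  if (Opts.UseVZeroUpper)
    Passes.push_back("vzeroupper");
  return Passes;
}
```

Under this model, "x86_64-scei-ps4" never schedules the vzeroupper entry, while "x86_64-apple-darwin" still does. Because the override is unconditional, the sketch also suggests that setting the flag explicitly would not re-enable the pass on PS4.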