<div dir="ltr">We also discussed that, when upstreaming, we should probably do this for any btver2 target instead of restricting it to PS4.<div><br></div><div>-- Sean Silva</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Feb 2, 2016 at 1:39 PM, Yunzhong Gao via llvm-commits <span dir="ltr"><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Author: ygao<br>
Date: Tue Feb  2 15:39:23 2016<br>
New Revision: 259576<br>
<br>
URL: <a href="http://llvm.org/viewvc/llvm-project?rev=259576&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?rev=259576&view=rev</a><br>
Log:<br>
Disable the vzeroupper insertion pass on PS4.<br>
See comments in test/CodeGen/X86/avx-vzeroupper.ll for more explanation.<br>
<br>
Original patch by: Sean Silva<br>
<br>
<br>
Modified:<br>
    llvm/trunk/lib/Target/X86/X86TargetMachine.cpp<br>
    llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll<br>
<br>
Modified: llvm/trunk/lib/Target/X86/X86TargetMachine.cpp<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetMachine.cpp?rev=259576&r1=259575&r2=259576&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetMachine.cpp?rev=259576&r1=259575&r2=259576&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/lib/Target/X86/X86TargetMachine.cpp (original)<br>
+++ llvm/trunk/lib/Target/X86/X86TargetMachine.cpp Tue Feb  2 15:39:23 2016<br>
@@ -270,6 +270,9 @@ void X86PassConfig::addPreEmitPass() {<br>
   if (getOptLevel() != CodeGenOpt::None)<br>
     addPass(createExecutionDependencyFixPass(&X86::VR128RegClass));<br>
<br>
+  if (TM->getTargetTriple().isPS4CPU())<br>
+    UseVZeroUpper = false;<br>
+<br>
   if (UseVZeroUpper)<br>
     addPass(createX86IssueVZeroUpperPass());<br>
<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll?rev=259576&r1=259575&r2=259576&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll?rev=259576&r1=259575&r2=259576&view=diff</a><br>
==============================================================================<br>
--- llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll (original)<br>
+++ llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll Tue Feb  2 15:39:23 2016<br>
@@ -1,4 +1,13 @@<br>
 ; RUN: llc < %s -x86-use-vzeroupper -mtriple=x86_64-apple-darwin -mattr=+avx | FileCheck %s<br>
+; RUN: llc < %s -mtriple=x86_64-scei-ps4 -mattr=+avx | FileCheck --check-prefix=PS4 %s<br>
+<br>
+; The Jaguar (AMD Family 16h) cores in the PS4 don't benefit from vzeroupper.<br>
+; At most, the benefit is "garbage collecting" def'd upper parts of the ymm<br>
+; registers, but the core has so many FP phys regs that this benefit of freeing<br>
+; up the upper parts is for now not worth it. Unlike Intel, there is no<br>
+; performance hazard to def'ing the lower parts of a ymm without clearing the<br>
+; upper part.<br>
+; PS4-NOT: vzeroupper<br>
<br>
 declare i32 @foo()<br>
 declare <4 x float> @do_sse(<4 x float>)<br>
<br>
<br>
</blockquote></div><br></div>