[llvm] r259576 - Disable the vzeroupper insertion pass on PS4.
Hal Finkel via llvm-commits
llvm-commits at lists.llvm.org
Tue Feb 2 17:15:13 PST 2016
----- Original Message -----
> From: "Yunzhong Gao via llvm-commits" <llvm-commits at lists.llvm.org>
> To: llvm-commits at lists.llvm.org
> Sent: Tuesday, February 2, 2016 3:39:24 PM
> Subject: [llvm] r259576 - Disable the vzeroupper insertion pass on PS4.
>
> Author: ygao
> Date: Tue Feb 2 15:39:23 2016
> New Revision: 259576
>
> URL: http://llvm.org/viewvc/llvm-project?rev=259576&view=rev
> Log:
> Disable the vzeroupper insertion pass on PS4.
> See comments in test/CodeGen/X86/avx-vzeroupper.ll for more
> explanation.
Please revert this change. This is not the right way to do this. Please add an appropriate target feature in lib/Target/X86/X86.td (and associated flag in X86Subtarget), set that feature on the associated ProcessorModel, and then add code in X86VZeroUpper.cpp to skip functions when the associated X86Subtarget flag is set. That is:

  bool VZeroUpperInserter::runOnMachineFunction(MachineFunction &MF) {
    const X86Subtarget &ST = MF.getSubtarget<X86Subtarget>();
    if (!ST.hasAVX() || ST.hasAVX512())
      return false;

make this condition more complicated.

Thanks in advance,
Hal
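
For readers following the thread, the suggestion above could be sketched roughly as follows. This is a standalone illustration, not the change that was eventually committed: it does not include LLVM headers, SubtargetSketch merely stands in for X86Subtarget, and the flag name NoVZeroUpper and its accessor insertsNoVZeroUpper() are hypothetical names chosen for this example. Only hasAVX() and hasAVX512() come from the snippet quoted above.

```cpp
// Standalone sketch; it does not include or link against LLVM.
// SubtargetSketch is a stand-in for X86Subtarget.
struct SubtargetSketch {
  bool HasAVX;
  bool HasAVX512;
  // Hypothetical flag: in the real backend this would be driven by a
  // target feature defined in X86.td and listed on the ProcessorModel.
  bool NoVZeroUpper;

  constexpr SubtargetSketch(bool AVX, bool AVX512, bool NoVZU)
      : HasAVX(AVX), HasAVX512(AVX512), NoVZeroUpper(NoVZU) {}

  constexpr bool hasAVX() const { return HasAVX; }
  constexpr bool hasAVX512() const { return HasAVX512; }
  // Hypothetical accessor for the new flag.
  constexpr bool insertsNoVZeroUpper() const { return NoVZeroUpper; }
};

// Mirrors the early-exit test at the top of runOnMachineFunction:
// the original condition skips the pass when AVX is absent or AVX-512
// is present; the extra term also skips it when the subtarget flag is set.
constexpr bool shouldInsertVZeroUpper(const SubtargetSketch &ST) {
  return !(!ST.hasAVX() || ST.hasAVX512() || ST.insertsNoVZeroUpper());
}
```

Keying the decision on a subtarget feature rather than on the target triple means the pass is skipped per function for any CPU model that sets the feature, instead of being removed from the pipeline for one OS.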
>
> Original patch by: Sean Silva
>
>
> Modified:
> llvm/trunk/lib/Target/X86/X86TargetMachine.cpp
> llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll
>
> Modified: llvm/trunk/lib/Target/X86/X86TargetMachine.cpp
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetMachine.cpp?rev=259576&r1=259575&r2=259576&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86TargetMachine.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86TargetMachine.cpp Tue Feb 2
> 15:39:23 2016
> @@ -270,6 +270,9 @@ void X86PassConfig::addPreEmitPass() {
> if (getOptLevel() != CodeGenOpt::None)
> addPass(createExecutionDependencyFixPass(&X86::VR128RegClass));
>
> + if (TM->getTargetTriple().isPS4CPU())
> + UseVZeroUpper = false;
> +
> if (UseVZeroUpper)
> addPass(createX86IssueVZeroUpperPass());
>
>
> Modified: llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll?rev=259576&r1=259575&r2=259576&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/avx-vzeroupper.ll Tue Feb 2 15:39:23
> 2016
> @@ -1,4 +1,13 @@
> ; RUN: llc < %s -x86-use-vzeroupper -mtriple=x86_64-apple-darwin
> -mattr=+avx | FileCheck %s
> +; RUN: llc < %s -mtriple=x86_64-scei-ps4 -mattr=+avx | FileCheck
> --check-prefix=PS4 %s
> +
> +; The Jaguar (AMD Family 16h) cores in the PS4 don't benefit from
> vzeroupper.
> +; At most, the benefit is "garbage collecting" def'd upper parts of
> the ymm
> +; registers, but the core has so many FP phys regs that this benefit
> of freeing
> +; up the upper parts is for now not worth it. Unlike Intel, there is
> no
> +; performance hazard to def'ing the lower parts of a ymm without
> clearing the
> +; upper part.
> +; PS4-NOT: vzeroupper
>
> declare i32 @foo()
> declare <4 x float> @do_sse(<4 x float>)
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory