[llvm] r304540 - [X86] Don't fold memory operands into insertps in the generated folding tables.
Benjamin Kramer via llvm-commits
llvm-commits at lists.llvm.org
Fri Jun 2 03:50:22 PDT 2017
Author: d0k
Date: Fri Jun 2 05:50:22 2017
New Revision: 304540
URL: http://llvm.org/viewvc/llvm-project?rev=304540&view=rev
Log:
[X86] Don't fold memory operands into insertps in the generated folding tables.
The register and memory forms of insertps behave differently: the register
form selects an element from the input register based on the immediate
operand, while the memory form just loads a single float from the given
address. We have custom code that changes the immediate in cases where
that's legal, so remove insertps from the generated tables completely.
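
To illustrate why a table-driven fold is wrong here, the following is a
minimal standalone model of the two instruction forms. It is a sketch, not the
code in X86InstrInfo.cpp: the names Vec4, InsertPSImm, insertpsRR, insertpsRM
and foldedInsertps are hypothetical, and the adjustment shown (offset the load
address by the source index and clear the source-select bits) is an assumption
about how such a fold can be made legal, based on the imm8 layout of insertps.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<float, 4>;

// Fields of the insertps imm8 (hypothetical helper type).
struct InsertPSImm {
  unsigned SrcIdx; // bits [7:6]: source element, used by the register form only
  unsigned DstIdx; // bits [5:4]: destination element
  unsigned ZMask;  // bits [3:0]: result elements to zero after the insert
};

InsertPSImm decode(uint8_t Imm) {
  return {(Imm >> 6) & 3u, (Imm >> 4) & 3u, Imm & 15u};
}

// Shared tail of both forms: insert one element, then apply the zero mask.
Vec4 insertElt(Vec4 Dst, float Elt, const InsertPSImm &F) {
  Dst[F.DstIdx] = Elt;
  for (unsigned I = 0; I < 4; ++I)
    if (F.ZMask & (1u << I))
      Dst[I] = 0.0f;
  return Dst;
}

// Register form: the inserted element is Src[SrcIdx].
Vec4 insertpsRR(const Vec4 &Dst, const Vec4 &Src, uint8_t Imm) {
  InsertPSImm F = decode(Imm);
  return insertElt(Dst, Src[F.SrcIdx], F);
}

// Memory form: a single float is loaded from Mem; the SrcIdx bits are ignored.
Vec4 insertpsRM(const Vec4 &Dst, const float *Mem, uint8_t Imm) {
  return insertElt(Dst, *Mem, decode(Imm));
}

// Folding a spilled source vector (assumed adjustment): point the load at
// element SrcIdx of the spill slot and clear the source-select bits.
Vec4 foldedInsertps(const Vec4 &Dst, const Vec4 &Spilled, uint8_t Imm) {
  InsertPSImm F = decode(Imm);
  uint8_t NewImm = static_cast<uint8_t>((F.DstIdx << 4) | F.ZMask);
  return insertpsRM(Dst, Spilled.data() + F.SrcIdx, NewImm);
}

// Checks the immediate used in the new test: 179 = 0b10'11'0011,
// i.e. SrcIdx = 2, DstIdx = 3, ZMask = 0b0011.
void selfCheck() {
  Vec4 A{1.0f, 2.0f, 3.0f, 4.0f}, B{10.0f, 20.0f, 30.0f, 40.0f};
  // A naive fold (same immediate, base address) inserts B[0], not B[2].
  assert(insertpsRM(A, B.data(), 179) != insertpsRR(A, B, 179));
  // The adjusted fold matches the register form.
  assert(foldedInsertps(A, B, 179) == insertpsRR(A, B, 179));
}
```

With the immediate 179 from the test below, the register form inserts element
2 of the source, whereas a fold that reuses the immediate unchanged would
insert element 0 of the spilled vector. That is the miscompile a generic
rr-to-rm table entry would introduce.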
Modified:
llvm/trunk/test/CodeGen/X86/stack-folding-fp-avx1.ll
llvm/trunk/utils/TableGen/X86FoldTablesEmitter.cpp
Modified: llvm/trunk/test/CodeGen/X86/stack-folding-fp-avx1.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/stack-folding-fp-avx1.ll?rev=304540&r1=304539&r2=304540&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/stack-folding-fp-avx1.ll (original)
+++ llvm/trunk/test/CodeGen/X86/stack-folding-fp-avx1.ll Fri Jun 2 05:50:22 2017
@@ -1926,5 +1926,19 @@ define <8 x float> @stack_fold_xorps_ymm
ret <8 x float> %6
}
+define <4 x float> @stack_nofold_insertps(<8 x float> %a0, <8 x float> %a1) {
+; Cannot fold this without changing the immediate.
+; CHECK-LABEL: stack_nofold_insertps
+; CHECK: 32-byte Spill
+; CHECK: nop
+; CHECK: 32-byte Reload
+; CHECK: vinsertps $179, {{%xmm., %xmm., %xmm.}}
+ %1 = tail call <2 x i64> asm sideeffect "nop", "=x,~{xmm2},~{xmm3},~{xmm4},~{xmm5},~{xmm6},~{xmm7},~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{flags}"()
+ %v0 = shufflevector <8 x float> %a0, <8 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+ %v1 = shufflevector <8 x float> %a1, <8 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
+ %res = call <4 x float> @llvm.x86.sse41.insertps(<4 x float> %v0, <4 x float> %v1, i8 179)
+ ret <4 x float> %res
+}
+
attributes #0 = { "unsafe-fp-math"="false" }
attributes #1 = { "unsafe-fp-math"="true" }
Modified: llvm/trunk/utils/TableGen/X86FoldTablesEmitter.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/utils/TableGen/X86FoldTablesEmitter.cpp?rev=304540&r1=304539&r2=304540&view=diff
==============================================================================
--- llvm/trunk/utils/TableGen/X86FoldTablesEmitter.cpp (original)
+++ llvm/trunk/utils/TableGen/X86FoldTablesEmitter.cpp Fri Jun 2 05:50:22 2017
@@ -101,6 +101,11 @@ const char *const NoFoldSet[] = {
"BTS16rr", "BTS32rr", "BTS64rr",
"BTS16mr", "BTS32mr", "BTS64mr",
+ // insertps cannot be folded without adjusting the immediate. There's custom
+ // code to handle it in X86InstrInfo.cpp, ignore it here.
+ "INSERTPSrr", "INSERTPSrm",
+ "VINSERTPSrr", "VINSERTPSrm", "VINSERTPSZrr", "VINSERTPSZrm",
+
// Memory folding is enabled only when optimizing for size by DAG
// patterns only. (issue detailed in D28744 review)
"VCVTSS2SDrm", "VCVTSS2SDrr",