[llvm] r343452 - [X86][BtVer2] Teach how to identify zero-idiom VPERM2F128rr instructions.
Andrea Di Biagio via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 1 03:35:13 PDT 2018
Author: adibiagio
Date: Mon Oct 1 03:35:13 2018
New Revision: 343452
URL: http://llvm.org/viewvc/llvm-project?rev=343452&view=rev
Log:
[X86][BtVer2] Teach how to identify zero-idiom VPERM2F128rr instructions.
This patch adds another variant class to identify zero-idiom VPERM2F128rr
instructions.
On Jaguar, a VPERM with bits 3 and 7 of the mask set is a zero-idiom.
Differential Revision: https://reviews.llvm.org/D52663
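
Illustration (not part of the commit): a minimal Python sketch of why a mask of 0x88 zeroes the whole result, and of what the new predicate appears to test. It assumes the documented VPERM2F128 immediate encoding, where bits 1:0 and 5:4 select the source lane for the low and high halves and bits 3 and 7 zero them. The function names are hypothetical. Note that, as written in the patch, CheckImmOperand<3, 0x88> compares the immediate for equality with 0x88, which is narrower than "bits 3 and 7 set".

```python
ZERO_LOW = 1 << 3   # imm8 bit 3: zero the low 128-bit half of the result
ZERO_HIGH = 1 << 7  # imm8 bit 7: zero the high 128-bit half of the result


def vperm2f128(src1, src2, imm8):
    """Toy model of VPERM2F128; each source is a (low, high) pair of lanes."""
    lanes = (src1[0], src1[1], src2[0], src2[1])
    low = 0 if imm8 & ZERO_LOW else lanes[imm8 & 0x3]
    high = 0 if imm8 & ZERO_HIGH else lanes[(imm8 >> 4) & 0x3]
    return (low, high)


def matches_zero_idiom_vperm_predicate(reg1, reg2, imm8):
    """Hypothetical mirror of ZeroIdiomVPERMPredicate from the patch:
    both register sources identical (CheckSameRegOperand<1, 2>) and the
    immediate equal to 0x88 (CheckImmOperand<3, 0x88>)."""
    return reg1 == reg2 and imm8 == 0x88


# The test case in this commit uses $136, which is 0x88.
print(vperm2f128((1, 2), (3, 4), 136))
print(matches_zero_idiom_vperm_predicate("ymm0", "ymm0", 136))
```

With both zeroing bits set, the result does not depend on either source, which is why the instruction can be treated as dependency-breaking.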
Modified:
llvm/trunk/lib/Target/X86/X86SchedPredicates.td
llvm/trunk/lib/Target/X86/X86ScheduleBtVer2.td
llvm/trunk/test/tools/llvm-mca/X86/BtVer2/zero-idioms-avx-256.s
Modified: llvm/trunk/lib/Target/X86/X86SchedPredicates.td
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86SchedPredicates.td?rev=343452&r1=343451&r2=343452&view=diff
==============================================================================
--- llvm/trunk/lib/Target/X86/X86SchedPredicates.td (original)
+++ llvm/trunk/lib/Target/X86/X86SchedPredicates.td Mon Oct 1 03:35:13 2018
@@ -19,6 +19,13 @@
// different zero-idioms.
def ZeroIdiomPredicate : CheckSameRegOperand<1, 2>;
+// A predicate used to identify VPERM that have bits 3 and 7 of their mask set.
+// On some processors, these VPERM instructions are zero-idioms.
+def ZeroIdiomVPERMPredicate : CheckAll<[
+ ZeroIdiomPredicate,
+ CheckImmOperand<3, 0x88>
+]>;
+
// A predicate used to check if a LEA instruction uses all three source
// operands: base, index, and offset.
def IsThreeOperandsLEAPredicate: CheckAll<[
Modified: llvm/trunk/lib/Target/X86/X86ScheduleBtVer2.td
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ScheduleBtVer2.td?rev=343452&r1=343451&r2=343452&view=diff
==============================================================================
--- llvm/trunk/lib/Target/X86/X86ScheduleBtVer2.td (original)
+++ llvm/trunk/lib/Target/X86/X86ScheduleBtVer2.td Mon Oct 1 03:35:13 2018
@@ -688,6 +688,12 @@ def : InstRW<[JWriteVZeroIdiomALUX], (in
PCMPGTQrr, VPCMPGTQrr,
PCMPGTWrr, VPCMPGTWrr)>;
+def JWriteVPERM2F128 : SchedWriteVariant<[
+ SchedVar<MCSchedPredicate<ZeroIdiomVPERMPredicate>, [JWriteZeroIdiomYmm]>,
+ SchedVar<NoSchedPred, [WriteFShuffle256]>
+]>;
+def : InstRW<[JWriteVPERM2F128], (instrs VPERM2F128rr)>;
+
// This write is used for slow LEA instructions.
def JWrite3OpsLEA : SchedWriteRes<[JALU1, JSAGU]> {
let Latency = 2;
@@ -762,7 +768,9 @@ def : IsZeroIdiomFunction<[
// ymm variants.
VXORPSYrr, VXORPDYrr, VANDNPSYrr, VANDNPDYrr
- ], ZeroIdiomPredicate>
+ ], ZeroIdiomPredicate>,
+
+ DepBreakingClass<[ VPERM2F128rr ], ZeroIdiomVPERMPredicate>
]>;
def : IsDepBreakingFunction<[
Modified: llvm/trunk/test/tools/llvm-mca/X86/BtVer2/zero-idioms-avx-256.s
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/tools/llvm-mca/X86/BtVer2/zero-idioms-avx-256.s?rev=343452&r1=343451&r2=343452&view=diff
==============================================================================
--- llvm/trunk/test/tools/llvm-mca/X86/BtVer2/zero-idioms-avx-256.s (original)
+++ llvm/trunk/test/tools/llvm-mca/X86/BtVer2/zero-idioms-avx-256.s Mon Oct 1 03:35:13 2018
@@ -330,12 +330,12 @@ vaddps %ymm1, %ymm1, %ymm0
# CHECK: Iterations: 100
# CHECK-NEXT: Instructions: 200
-# CHECK-NEXT: Total Cycles: 403
+# CHECK-NEXT: Total Cycles: 205
# CHECK-NEXT: Total uOps: 400
# CHECK: Dispatch Width: 2
-# CHECK-NEXT: uOps Per Cycle: 0.99
-# CHECK-NEXT: IPC: 0.50
+# CHECK-NEXT: uOps Per Cycle: 1.95
+# CHECK-NEXT: IPC: 0.98
# CHECK-NEXT: Block RThroughput: 2.0
# CHECK: Instruction Info:
@@ -347,7 +347,7 @@ vaddps %ymm1, %ymm1, %ymm0
# CHECK-NEXT: [6]: HasSideEffects (U)
# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
-# CHECK-NEXT: 2 1 1.00 vperm2f128 $136, %ymm0, %ymm0, %ymm1
+# CHECK-NEXT: 2 1 0.50 vperm2f128 $136, %ymm0, %ymm0, %ymm1
# CHECK-NEXT: 2 3 2.00 vaddps %ymm1, %ymm1, %ymm0
# CHECK: Resources:
@@ -368,23 +368,23 @@ vaddps %ymm1, %ymm1, %ymm0
# CHECK: Resource pressure per iteration:
# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13]
-# CHECK-NEXT: - - - 2.00 2.00 2.00 2.00 - - - - - - -
+# CHECK-NEXT: - - - 2.00 1.00 2.00 1.00 - - - - - - -
# CHECK: Resource pressure by instruction:
# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] Instructions:
-# CHECK-NEXT: - - - - 2.00 - 2.00 - - - - - - - vperm2f128 $136, %ymm0, %ymm0, %ymm1
+# CHECK-NEXT: - - - - 1.00 - 1.00 - - - - - - - vperm2f128 $136, %ymm0, %ymm0, %ymm1
# CHECK-NEXT: - - - 2.00 - 2.00 - - - - - - - - vaddps %ymm1, %ymm1, %ymm0
# CHECK: Timeline view:
-# CHECK-NEXT: 01234
+# CHECK-NEXT: 0
# CHECK-NEXT: Index 0123456789
-# CHECK: [0,0] DeER . . . vperm2f128 $136, %ymm0, %ymm0, %ymm1
-# CHECK-NEXT: [0,1] .DeeeER . . vaddps %ymm1, %ymm1, %ymm0
-# CHECK-NEXT: [1,0] . D==eER . . vperm2f128 $136, %ymm0, %ymm0, %ymm1
-# CHECK-NEXT: [1,1] . D==eeeER . vaddps %ymm1, %ymm1, %ymm0
-# CHECK-NEXT: [2,0] . D====eER . vperm2f128 $136, %ymm0, %ymm0, %ymm1
-# CHECK-NEXT: [2,1] . D====eeeER vaddps %ymm1, %ymm1, %ymm0
+# CHECK: [0,0] DeER . . vperm2f128 $136, %ymm0, %ymm0, %ymm1
+# CHECK-NEXT: [0,1] .DeeeER . vaddps %ymm1, %ymm1, %ymm0
+# CHECK-NEXT: [1,0] . DeE-R . vperm2f128 $136, %ymm0, %ymm0, %ymm1
+# CHECK-NEXT: [1,1] . DeeeER . vaddps %ymm1, %ymm1, %ymm0
+# CHECK-NEXT: [2,0] . DeE-R . vperm2f128 $136, %ymm0, %ymm0, %ymm1
+# CHECK-NEXT: [2,1] . DeeeER vaddps %ymm1, %ymm1, %ymm0
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -393,5 +393,5 @@ vaddps %ymm1, %ymm1, %ymm0
# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
# CHECK: [0] [1] [2] [3]
-# CHECK-NEXT: 0. 3 3.0 0.3 0.0 vperm2f128 $136, %ymm0, %ymm0, %ymm1
-# CHECK-NEXT: 1. 3 3.0 0.0 0.0 vaddps %ymm1, %ymm1, %ymm0
+# CHECK-NEXT: 0. 3 1.0 1.0 0.7 vperm2f128 $136, %ymm0, %ymm0, %ymm1
+# CHECK-NEXT: 1. 3 1.0 0.0 0.0 vaddps %ymm1, %ymm1, %ymm0
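
Illustration (not part of the commit): the summary figures in the updated test can be cross-checked from the instruction, uOp, and cycle counts. This sketch assumes llvm-mca reports uOps Per Cycle and IPC as plain ratios rounded to two decimals. The helper name is hypothetical.

```python
def mca_summary(instructions, uops, cycles):
    """Return (uOps per cycle, IPC), each rounded to two decimals."""
    return round(uops / cycles, 2), round(instructions / cycles, 2)


# Counts taken from the CHECK lines above: 200 instructions, 400 uOps.
before = mca_summary(200, 400, 403)  # before the patch: 403 total cycles
after = mca_summary(200, 400, 205)   # after the patch: 205 total cycles

print(before, after)
print(round(403 / 205, 2))  # ratio of total cycles, before vs. after
```

The cycle count roughly halves because the vperm2f128 no longer waits on the vaddps result from the previous iteration.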