[llvm] aa0dcb3 - [X86] AMD Zen 3: same-reg SSE XMM XORPS is a 1-cycle(!) dep-breaking one-idiom
Roman Lebedev via llvm-commits
llvm-commits at lists.llvm.org
Thu May 13 14:03:57 PDT 2021
Author: Roman Lebedev
Date: 2021-05-14T00:03:36+03:00
New Revision: aa0dcb3ba4b93e4499208def080ced98f3a89ad5
URL: https://github.com/llvm/llvm-project/commit/aa0dcb3ba4b93e4499208def080ced98f3a89ad5
DIFF: https://github.com/llvm/llvm-project/commit/aa0dcb3ba4b93e4499208def080ced98f3a89ad5.diff
LOG: [X86] AMD Zen 3: same-reg SSE XMM XORPS is a 1-cycle(!) dep-breaking one-idiom
While both the SOG and Agner insist that it is zero-cycle,
I cannot confirm that claim. While it clearly breaks the dependency,
I cannot come up with a snippet, or a measurement approach,
that ends up with an IPC higher than 4, which, to me, means that it actually
consumes the execution resources of an FP unit for a cycle.
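
As a rough sanity check of the reasoning above, the IPC figures can be recomputed from the llvm-mca numbers in the test diff below. This is only an illustrative sketch: the constants are copied from the CHECK lines of the test, the helper name `ipc` is hypothetical, and the per-cycle ceiling is derived from the 0.25 reciprocal throughput that the test reports for xorps, not from a hardware measurement.

```python
# Figures copied from the llvm-mca CHECK lines in the test diff.
INSTRUCTIONS = 20000      # 10000 iterations x 2 xorps instructions
CYCLES_BEFORE = 20003     # "Total Cycles" before this change
CYCLES_AFTER = 5004       # "Total Cycles" after this change
XORPS_RTHROUGHPUT = 0.25  # reciprocal throughput reported for xorps


def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle (hypothetical helper, for illustration only)."""
    return instructions / cycles


ipc_before = ipc(INSTRUCTIONS, CYCLES_BEFORE)
ipc_after = ipc(INSTRUCTIONS, CYCLES_AFTER)

# If every xorps occupies an FP unit for one cycle, the model cannot
# sustain more than 1 / rthroughput of them per cycle.
ipc_ceiling = 1 / XORPS_RTHROUGHPUT

print(f"IPC before: {ipc_before:.2f}")
print(f"IPC after:  {ipc_after:.2f}")
print(f"Ceiling:    {ipc_ceiling:.2f}")
```

The "after" value rounds to 4.00 and stays just under the ceiling, which is what one would expect if the same-register xorps breaks the dependency but still takes an FP execution slot.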
Added:
Modified:
llvm/lib/Target/X86/X86ScheduleZnver3.td
llvm/test/tools/llvm-mca/X86/Znver3/zero-idioms-sse-xmm.s
Removed:
################################################################################
diff --git a/llvm/lib/Target/X86/X86ScheduleZnver3.td b/llvm/lib/Target/X86/X86ScheduleZnver3.td
index 571aedf15d4c8..82233b6dda97a 100644
--- a/llvm/lib/Target/X86/X86ScheduleZnver3.td
+++ b/llvm/lib/Target/X86/X86ScheduleZnver3.td
@@ -1536,6 +1536,9 @@ def : IsZeroIdiomFunction<[
XOR64rr, XOR64rr_REV,
SUB32rr, SUB32rr_REV,
SUB64rr, SUB64rr_REV ], ZeroIdiomPredicate>,
+
+ // SSE XMM Zero-idioms.
+ DepBreakingClass<[ XORPSrr ], ZeroIdiomPredicate>,
]>;
def : IsDepBreakingFunction<[
diff --git a/llvm/test/tools/llvm-mca/X86/Znver3/zero-idioms-sse-xmm.s b/llvm/test/tools/llvm-mca/X86/Znver3/zero-idioms-sse-xmm.s
index a7b848bd3a921..3eae26fdcab7e 100644
--- a/llvm/test/tools/llvm-mca/X86/Znver3/zero-idioms-sse-xmm.s
+++ b/llvm/test/tools/llvm-mca/X86/Znver3/zero-idioms-sse-xmm.s
@@ -10,12 +10,12 @@ xorps %xmm0, %xmm1
# CHECK: Iterations: 10000
# CHECK-NEXT: Instructions: 20000
-# CHECK-NEXT: Total Cycles: 20003
+# CHECK-NEXT: Total Cycles: 5004
# CHECK-NEXT: Total uOps: 20000
# CHECK: Dispatch Width: 6
-# CHECK-NEXT: uOps Per Cycle: 1.00
-# CHECK-NEXT: IPC: 1.00
+# CHECK-NEXT: uOps Per Cycle: 4.00
+# CHECK-NEXT: IPC: 4.00
# CHECK-NEXT: Block RThroughput: 0.5
# CHECK: Instruction Info:
@@ -31,13 +31,13 @@ xorps %xmm0, %xmm1
# CHECK-NEXT: 1 1 0.25 xorps %xmm0, %xmm1
# CHECK: Register File statistics:
-# CHECK-NEXT: Total number of mappings created: 20000
-# CHECK-NEXT: Max number of mappings used: 66
+# CHECK-NEXT: Total number of mappings created: 10000
+# CHECK-NEXT: Max number of mappings used: 37
# CHECK: * Register File #1 -- Zn3FpPRF:
# CHECK-NEXT: Number of physical registers: 160
-# CHECK-NEXT: Total number of mappings created: 20000
-# CHECK-NEXT: Max number of mappings used: 66
+# CHECK-NEXT: Total number of mappings created: 10000
+# CHECK-NEXT: Max number of mappings used: 37
# CHECK: * Register File #2 -- Zn3IntegerPRF:
# CHECK-NEXT: Number of physical registers: 192
@@ -75,16 +75,16 @@ xorps %xmm0, %xmm1
# CHECK: Resource pressure by instruction:
# CHECK-NEXT: [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12.0] [12.1] [13] [14.0] [14.1] [14.2] [15.0] [15.1] [15.2] [16.0] [16.1] Instructions:
-# CHECK-NEXT: - - - - - - - - - 0.50 - 0.50 - - - - - - - - - - - xorps %xmm1, %xmm1
-# CHECK-NEXT: - - - - - - - - 0.50 - 0.50 - - - - - - - - - - - - xorps %xmm0, %xmm1
+# CHECK-NEXT: - - - - - - - - - 0.50 0.25 0.25 - - - - - - - - - - - xorps %xmm1, %xmm1
+# CHECK-NEXT: - - - - - - - - 0.50 - 0.25 0.25 - - - - - - - - - - - xorps %xmm0, %xmm1
# CHECK: Timeline view:
-# CHECK-NEXT: Index 0123456
+# CHECK-NEXT: Index 01234
-# CHECK: [0,0] DeER .. xorps %xmm1, %xmm1
-# CHECK-NEXT: [0,1] D=eER.. xorps %xmm0, %xmm1
-# CHECK-NEXT: [1,0] D==eER. xorps %xmm1, %xmm1
-# CHECK-NEXT: [1,1] D===eER xorps %xmm0, %xmm1
+# CHECK: [0,0] DeER. xorps %xmm1, %xmm1
+# CHECK-NEXT: [0,1] D=eER xorps %xmm0, %xmm1
+# CHECK-NEXT: [1,0] DeE-R xorps %xmm1, %xmm1
+# CHECK-NEXT: [1,1] D=eER xorps %xmm0, %xmm1
# CHECK: Average Wait times (based on the timeline view):
# CHECK-NEXT: [0]: Executions
@@ -93,6 +93,6 @@ xorps %xmm0, %xmm1
# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
# CHECK: [0] [1] [2] [3]
-# CHECK-NEXT: 0. 2 2.0 0.5 0.0 xorps %xmm1, %xmm1
-# CHECK-NEXT: 1. 2 3.0 0.0 0.0 xorps %xmm0, %xmm1
-# CHECK-NEXT: 2 2.5 0.3 0.0 <total>
+# CHECK-NEXT: 0. 2 1.0 1.0 0.5 xorps %xmm1, %xmm1
+# CHECK-NEXT: 1. 2 2.0 0.0 0.0 xorps %xmm0, %xmm1
+# CHECK-NEXT: 2 1.5 0.5 0.3 <total>