[llvm] [AArch64] Add Ampere1B scheduling/pipeline model (PR #81341)
via llvm-commits
llvm-commits at lists.llvm.org
Fri Feb 9 16:20:01 PST 2024
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-backend-aarch64
Author: Philipp Tomsich (ptomsich)
<details>
<summary>Changes</summary>
The Ampere1B core is enabled with a new scheduling/pipeline model, as it provides significant updates over the Ampere1 core; it reduces latencies on many instructions, has some micro-ops reassigned between the XY and X units, and provides modelling for the instructions added since Ampere1 and Ampere1A.
This is identical to the reverted #<!-- -->81338, but has the missing ',' added and was retested after a 'ninja clean'.
---
Patch is 44.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/81341.diff
2 Files Affected:
- (modified) llvm/lib/Target/AArch64/AArch64.td (+2-1)
- (added) llvm/lib/Target/AArch64/AArch64SchedAmpere1B.td (+1063)
``````````diff
diff --git a/llvm/lib/Target/AArch64/AArch64.td b/llvm/lib/Target/AArch64/AArch64.td
index e76204f5522515..156c48e02abf9c 100644
--- a/llvm/lib/Target/AArch64/AArch64.td
+++ b/llvm/lib/Target/AArch64/AArch64.td
@@ -837,6 +837,7 @@ include "AArch64SchedA64FX.td"
include "AArch64SchedThunderX3T110.td"
include "AArch64SchedTSV110.td"
include "AArch64SchedAmpere1.td"
+include "AArch64SchedAmpere1B.td"
include "AArch64SchedNeoverseN1.td"
include "AArch64SchedNeoverseN2.td"
include "AArch64SchedNeoverseV1.td"
@@ -1722,7 +1723,7 @@ def : ProcessorModel<"ampere1", Ampere1Model, ProcessorFeatures.Ampere1,
def : ProcessorModel<"ampere1a", Ampere1Model, ProcessorFeatures.Ampere1A,
[TuneAmpere1A]>;
-def : ProcessorModel<"ampere1b", Ampere1Model, ProcessorFeatures.Ampere1B,
+def : ProcessorModel<"ampere1b", Ampere1BModel, ProcessorFeatures.Ampere1B,
[TuneAmpere1B]>;
//===----------------------------------------------------------------------===//
diff --git a/llvm/lib/Target/AArch64/AArch64SchedAmpere1B.td b/llvm/lib/Target/AArch64/AArch64SchedAmpere1B.td
new file mode 100644
index 00000000000000..28c93c2b06aa11
--- /dev/null
+++ b/llvm/lib/Target/AArch64/AArch64SchedAmpere1B.td
@@ -0,0 +1,1063 @@
+//=- AArch64SchedAmpere1B.td - Ampere-1B scheduling def -----*- tablegen -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines the machine model for the Ampere Computing Ampere-1B to
+// support instruction scheduling and other instruction cost heuristics.
+//
+//===----------------------------------------------------------------------===//
+
+// The Ampere-1 core is an out-of-order micro-architecture. The front
+// end has branch prediction, with a 10-cycle recovery time from a
+// mispredicted branch. Instructions coming out of the front end are
+// decoded into internal micro-ops (uops).
+
+def Ampere1BModel : SchedMachineModel {
+ let IssueWidth = 4; // 4-way decode and dispatch
+ let MicroOpBufferSize = 192; // micro-op re-order buffer size
+ let LoadLatency = 3; // Optimistic load latency
+ let MispredictPenalty = 10; // Branch mispredict penalty
+ let LoopMicroOpBufferSize = 32; // Instruction queue size
+ let CompleteModel = 0;
+
+ list<Predicate> UnsupportedFeatures = !listconcat(SVEUnsupported.F,
+ SMEUnsupported.F,
+ PAUnsupported.F);
+}
+
+let SchedModel = Ampere1BModel in {
+
+//===----------------------------------------------------------------------===//
+// Define each kind of processor resource and number available on Ampere-1.
+// Ampere-1 has 12 pipelines that 8 independent scheduler (4 integer, 2 FP,
+// and 2 memory) issue into. The integer and FP schedulers can each issue
+// one uop per cycle, while the memory schedulers can each issue one load
+// and one store address calculation per cycle.
+
+def Ampere1BUnitA : ProcResource<2>; // integer single-cycle, branch, and flags r/w
+def Ampere1BUnitB : ProcResource<2>; // integer single-cycle, and complex shifts
+def Ampere1BUnitBS : ProcResource<1>; // integer multi-cycle
+def Ampere1BUnitL : ProcResource<2>; // load
+def Ampere1BUnitS : ProcResource<2>; // store address calculation
+def Ampere1BUnitX : ProcResource<1>; // FP and vector operations, and flag write
+def Ampere1BUnitY : ProcResource<1>; // FP and vector operations, and crypto
+def Ampere1BUnitZ : ProcResource<1>; // FP store data and FP-to-integer moves
+
+def Ampere1BUnitAB : ProcResGroup<[Ampere1BUnitA, Ampere1BUnitB]>;
+def Ampere1BUnitXY : ProcResGroup<[Ampere1BUnitX, Ampere1BUnitY]>;
+
+//===----------------------------------------------------------------------===//
+// Define customized scheduler read/write types specific to the Ampere-1.
+
+def Ampere1BWrite_1cyc_1A : SchedWriteRes<[Ampere1BUnitA]> {
+ let Latency = 1;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_1cyc_2A : SchedWriteRes<[Ampere1BUnitA, Ampere1BUnitA]> {
+ let Latency = 1;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_1cyc_1B : SchedWriteRes<[Ampere1BUnitB]> {
+ let Latency = 1;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_1cyc_1BS : SchedWriteRes<[Ampere1BUnitBS]> {
+ let Latency = 1;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_1cyc_1BS_1B : SchedWriteRes<[Ampere1BUnitBS, Ampere1BUnitB]> {
+ let Latency = 1;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_1cyc_1AB : SchedWriteRes<[Ampere1BUnitAB]> {
+ let Latency = 1;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_1cyc_1AB_1A : SchedWriteRes<[Ampere1BUnitAB, Ampere1BUnitA]> {
+ let Latency = 1;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_1cyc_1L : SchedWriteRes<[Ampere1BUnitL]> {
+ let Latency = 1;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_1cyc_1S : SchedWriteRes<[Ampere1BUnitS]> {
+ let Latency = 1;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_1cyc_2S : SchedWriteRes<[Ampere1BUnitS, Ampere1BUnitS]> {
+ let Latency = 1;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_2cyc_1Y : SchedWriteRes<[Ampere1BUnitY]> {
+ let Latency = 2;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_2cyc_2AB : SchedWriteRes<[Ampere1BUnitAB, Ampere1BUnitAB]> {
+ let Latency = 2;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_2cyc_1B_1AB : SchedWriteRes<[Ampere1BUnitB, Ampere1BUnitAB]> {
+ let Latency = 2;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_2cyc_1B_1S : SchedWriteRes<[Ampere1BUnitB, Ampere1BUnitS]> {
+ let Latency = 2;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_2cyc_1B_1S_1AB : SchedWriteRes<[Ampere1BUnitB,
+ Ampere1BUnitS,
+ Ampere1BUnitAB]> {
+ let Latency = 2;
+ let NumMicroOps = 3;
+}
+
+def Ampere1BWrite_2cyc_1XY : SchedWriteRes<[Ampere1BUnitXY]> {
+ let Latency = 2;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_2cyc_1S_1Z : SchedWriteRes<[Ampere1BUnitS, Ampere1BUnitZ]> {
+ let Latency = 2;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_3cyc_1BS : SchedWriteRes<[Ampere1BUnitBS]> {
+ let Latency = 3;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_3cyc_1L : SchedWriteRes<[Ampere1BUnitL]> {
+ let Latency = 3;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_3cyc_1X : SchedWriteRes<[Ampere1BUnitX]> {
+ let Latency = 3;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_3cyc_1XY : SchedWriteRes<[Ampere1BUnitXY]> {
+ let Latency = 3;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_3cyc_1Z : SchedWriteRes<[Ampere1BUnitZ]> {
+ let Latency = 3;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_3cyc_1S_1Z : SchedWriteRes<[Ampere1BUnitS,
+ Ampere1BUnitZ]> {
+ let Latency = 3;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_3cyc_1S_2Z : SchedWriteRes<[Ampere1BUnitS,
+ Ampere1BUnitZ, Ampere1BUnitZ]> {
+ let Latency = 3;
+ let NumMicroOps = 3;
+}
+
+def Ampere1BWrite_3cyc_2S_2Z : SchedWriteRes<[Ampere1BUnitS, Ampere1BUnitS,
+ Ampere1BUnitZ, Ampere1BUnitZ]> {
+ let Latency = 3;
+ let NumMicroOps = 4;
+}
+
+def Ampere1BWrite_4cyc_1BS_1AB : SchedWriteRes<[Ampere1BUnitBS, Ampere1BUnitAB]> {
+ let Latency = 4;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_4cyc_1L : SchedWriteRes<[Ampere1BUnitL]> {
+ let Latency = 4;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_4cyc_1L_1B : SchedWriteRes<[Ampere1BUnitL, Ampere1BUnitB]> {
+ let Latency = 4;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_4cyc_1X : SchedWriteRes<[Ampere1BUnitX]> {
+ let Latency = 4;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_4cyc_1XY : SchedWriteRes<[Ampere1BUnitXY]> {
+ let Latency = 4;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_4cyc_2XY : SchedWriteRes<[Ampere1BUnitXY, Ampere1BUnitXY]> {
+ let Latency = 4;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_5cyc_1BS : SchedWriteRes<[Ampere1BUnitBS]> {
+ let Latency = 5;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_4cyc_1XY_1S_1Z : SchedWriteRes<[Ampere1BUnitXY,
+ Ampere1BUnitS,
+ Ampere1BUnitZ]> {
+ let Latency = 4;
+ let NumMicroOps = 3;
+}
+
+def Ampere1BWrite_5cyc_1BS : SchedWriteRes<[Ampere1BUnitBS]> {
+ let Latency = 5;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_5cyc_4S_4Z : SchedWriteRes<[Ampere1BUnitS, Ampere1BUnitS,
+ Ampere1BUnitS, Ampere1BUnitS,
+ Ampere1BUnitZ, Ampere1BUnitZ,
+ Ampere1BUnitZ, Ampere1BUnitZ]> {
+ let Latency = 5;
+ let NumMicroOps = 8;
+}
+
+def Ampere1BWrite_5cyc_1L_1BS : SchedWriteRes<[Ampere1BUnitL,
+ Ampere1BUnitBS]> {
+ let Latency = 5;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_5cyc_3L : SchedWriteRes<[Ampere1BUnitL,
+ Ampere1BUnitL,
+ Ampere1BUnitL]> {
+ let Latency = 5;
+ let NumMicroOps = 3;
+}
+
+def Ampere1BWrite_5cyc_4L : SchedWriteRes<[Ampere1BUnitL,
+ Ampere1BUnitL,
+ Ampere1BUnitL,
+ Ampere1BUnitL]> {
+ let Latency = 5;
+ let NumMicroOps = 4;
+}
+
+def Ampere1BWrite_5cyc_1X : SchedWriteRes<[Ampere1BUnitX]> {
+ let Latency = 5;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_5cyc_2XY_2S_2Z : SchedWriteRes<[Ampere1BUnitXY, Ampere1BUnitXY,
+ Ampere1BUnitS, Ampere1BUnitS,
+ Ampere1BUnitZ, Ampere1BUnitZ]> {
+ let Latency = 5;
+ let NumMicroOps = 6;
+}
+
+def Ampere1BWrite_6cyc_1BS_1A : SchedWriteRes<[Ampere1BUnitBS, Ampere1BUnitA]> {
+ let Latency = 6;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_6cyc_1BS_2A : SchedWriteRes<[Ampere1BUnitBS, Ampere1BUnitA,
+ Ampere1BUnitA]> {
+ let Latency = 6;
+ let NumMicroOps = 3;
+}
+
+def Ampere1BWrite_6cyc_1X : SchedWriteRes<[Ampere1BUnitX]> {
+ let Latency = 6;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_6cyc_2XY : SchedWriteRes<[Ampere1BUnitXY, Ampere1BUnitXY]> {
+ let Latency = 6;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_6cyc_3XY : SchedWriteRes<[Ampere1BUnitXY, Ampere1BUnitXY,
+ Ampere1BUnitXY]> {
+ let Latency = 6;
+ let NumMicroOps = 3;
+}
+
+def Ampere1BWrite_6cyc_2XY_2S_2Z : SchedWriteRes<[Ampere1BUnitXY, Ampere1BUnitXY,
+ Ampere1BUnitS, Ampere1BUnitS,
+ Ampere1BUnitZ, Ampere1BUnitZ]> {
+ let Latency = 6;
+ let NumMicroOps = 6;
+}
+
+def Ampere1BWrite_6cyc_3XY_3S_3Z : SchedWriteRes<[Ampere1BUnitXY, Ampere1BUnitXY, Ampere1BUnitXY,
+ Ampere1BUnitS, Ampere1BUnitS, Ampere1BUnitS,
+ Ampere1BUnitZ, Ampere1BUnitZ, Ampere1BUnitZ]> {
+ let Latency = 6;
+ let NumMicroOps = 9;
+}
+
+def Ampere1BWrite_7cyc_1BS_1XY : SchedWriteRes<[Ampere1BUnitBS, Ampere1BUnitXY]> {
+ let Latency = 7;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_7cyc_1XY_1Z : SchedWriteRes<[Ampere1BUnitXY, Ampere1BUnitZ]> {
+ let Latency = 7;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_7cyc_4XY_4S_4Z : SchedWriteRes<[Ampere1BUnitXY, Ampere1BUnitXY,
+ Ampere1BUnitXY, Ampere1BUnitXY,
+ Ampere1BUnitS, Ampere1BUnitS,
+ Ampere1BUnitS, Ampere1BUnitS,
+ Ampere1BUnitZ, Ampere1BUnitZ,
+ Ampere1BUnitZ, Ampere1BUnitZ]> {
+ let Latency = 7;
+ let NumMicroOps = 12;
+}
+
+def Ampere1BWrite_8cyc_1BS_1L : SchedWriteRes<[Ampere1BUnitBS, Ampere1BUnitL]> {
+ let Latency = 8;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_8cyc_1BS_1XY : SchedWriteRes<[Ampere1BUnitBS, Ampere1BUnitXY]> {
+ let Latency = 8;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_8cyc_2XY : SchedWriteRes<[Ampere1BUnitXY, Ampere1BUnitXY]> {
+ let Latency = 8;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_8cyc_4XY : SchedWriteRes<[Ampere1BUnitXY, Ampere1BUnitXY,
+ Ampere1BUnitXY, Ampere1BUnitXY]> {
+ let Latency = 8;
+ let NumMicroOps = 4;
+}
+
+def Ampere1BWrite_9cyc_6XY_4S_4Z : SchedWriteRes<[Ampere1BUnitXY, Ampere1BUnitXY,
+ Ampere1BUnitXY, Ampere1BUnitXY,
+ Ampere1BUnitXY, Ampere1BUnitXY,
+ Ampere1BUnitS, Ampere1BUnitS,
+ Ampere1BUnitS, Ampere1BUnitS,
+ Ampere1BUnitZ, Ampere1BUnitZ,
+ Ampere1BUnitZ, Ampere1BUnitZ]> {
+ let Latency = 9;
+ let NumMicroOps = 14;
+}
+
+def Ampere1BWrite_9cyc_1A_1BS_1X : SchedWriteRes<[Ampere1BUnitA, Ampere1BUnitBS, Ampere1BUnitX]> {
+ let Latency = 9;
+ let NumMicroOps = 3;
+}
+
+def Ampere1BWrite_9cyc_1A_1BS_1XY : SchedWriteRes<[Ampere1BUnitA, Ampere1BUnitBS, Ampere1BUnitXY]> {
+ let Latency = 9;
+ let NumMicroOps = 3;
+}
+
+def Ampere1BWrite_9cyc_1X : SchedWriteRes<[Ampere1BUnitX]> {
+ let Latency = 9;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_9cyc_3XY : SchedWriteRes<[Ampere1BUnitXY, Ampere1BUnitXY, Ampere1BUnitXY]> {
+ let Latency = 9;
+ let NumMicroOps = 3;
+}
+
+def Ampere1BWrite_11cyc_1BS_2XY : SchedWriteRes<[Ampere1BUnitBS, Ampere1BUnitXY, Ampere1BUnitXY]> {
+ let Latency = 11;
+ let NumMicroOps = 3;
+}
+
+def Ampere1BWrite_12cyc_1X : SchedWriteRes<[Ampere1BUnitX]> {
+ let Latency = 12;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_13cyc_1BS_1X : SchedWriteRes<[Ampere1BUnitBS, Ampere1BUnitX]> {
+ let Latency = 13;
+ let NumMicroOps = 2;
+}
+
+def Ampere1BWrite_17cyc_1X : SchedWriteRes<[Ampere1BUnitX]> {
+ let Latency = 17;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_19cyc_2BS_1X : SchedWriteRes<[Ampere1BUnitBS,
+ Ampere1BUnitBS,
+ Ampere1BUnitX]> {
+ let Latency = 13;
+ let NumMicroOps = 3;
+}
+
+def Ampere1BWrite_19cyc_1X : SchedWriteRes<[Ampere1BUnitX]> {
+ let Latency = 19;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_21cyc_1X : SchedWriteRes<[Ampere1BUnitX]> {
+ let Latency = 21;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_33cyc_1X : SchedWriteRes<[Ampere1BUnitX]> {
+ let Latency = 33;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_39cyc_1X : SchedWriteRes<[Ampere1BUnitX]> {
+ let Latency = 39;
+ let NumMicroOps = 1;
+}
+
+def Ampere1BWrite_63cyc_1X : SchedWriteRes<[Ampere1BUnitX]> {
+ let Latency = 63;
+ let NumMicroOps = 1;
+}
+
+// For basic arithmetic, we have more flexibility for short shifts (LSL shift <= 4),
+// which are a single uop, and for extended registers, which have full flexibility
+// across Unit A or B for both uops.
+def Ampere1BWrite_Arith : SchedWriteVariant<[
+ SchedVar<RegExtendedPred, [Ampere1Write_2cyc_2AB]>,
+ SchedVar<IsCheapLSL, [Ampere1Write_1cyc_1AB]>,
+ SchedVar<NoSchedPred, [Ampere1Write_2cyc_1B_1AB]>]>;
+
+def Ampere1BWrite_ArithFlagsetting : SchedWriteVariant<[
+ SchedVar<RegExtendedPred, [Ampere1Write_2cyc_2AB]>,
+ SchedVar<IsCheapLSL, [Ampere1Write_1cyc_1AB]>,
+ SchedVar<NoSchedPred, [Ampere1Write_2cyc_1B_1AB]>]>;
+
+//===----------------------------------------------------------------------===//
+// Map the target-defined scheduler read/write resources and latencies for Ampere-1.
+// This provides a coarse model, which is then specialised below.
+
+def : WriteRes<WriteImm, [Ampere1BUnitAB]>; // MOVN, MOVZ
+def : WriteRes<WriteI, [Ampere1BUnitAB]>; // ALU
+def : WriteRes<WriteISReg, [Ampere1BUnitB, Ampere1BUnitAB]> {
+ let Latency = 2;
+ let NumMicroOps = 2;
+} // ALU of Shifted-Reg
+def : WriteRes<WriteIEReg, [Ampere1BUnitAB, Ampere1BUnitAB]> {
+ let Latency = 2;
+ let NumMicroOps = 2;
+} // ALU of Extended-Reg
+def : WriteRes<WriteExtr, [Ampere1BUnitB]>; // EXTR shifts a reg pair
+def : WriteRes<WriteIS, [Ampere1BUnitB]>; // Shift/Scale
+def : WriteRes<WriteID32, [Ampere1BUnitBS, Ampere1BUnitX]> {
+ let Latency = 13;
+} // 32-bit Divide
+def : WriteRes<WriteID64, [Ampere1BUnitBS, Ampere1BUnitX]> {
+ let Latency = 19;
+} // 64-bit Divide
+def : WriteRes<WriteIM32, [Ampere1BUnitBS]> {
+ let Latency = 3;
+} // 32-bit Multiply
+def : WriteRes<WriteIM64, [Ampere1BUnitBS, Ampere1UnitAB]> {
+ let Latency = 3;
+} // 64-bit Multiply
+def : WriteRes<WriteBr, [Ampere1BUnitA]>;
+def : WriteRes<WriteBrReg, [Ampere1BUnitA, Ampere1UnitA]>;
+def : WriteRes<WriteLD, [Ampere1BUnitL]> {
+ let Latency = 3;
+} // Load from base addr plus immediate offset
+def : WriteRes<WriteST, [Ampere1BUnitS]> {
+ let Latency = 1;
+} // Store to base addr plus immediate offset
+def : WriteRes<WriteSTP, [Ampere1BUnitS, Ampere1BUnitS]> {
+ let Latency = 1;
+ let NumMicroOps = 1;
+} // Store a register pair.
+def : WriteRes<WriteAdr, [Ampere1BUnitAB]>;
+def : WriteRes<WriteLDIdx, [Ampere1BUnitAB, Ampere1BUnitS]> {
+ let Latency = 3;
+ let NumMicroOps = 1;
+} // Load from a register index (maybe scaled).
+def : WriteRes<WriteSTIdx, [Ampere1BUnitS, Ampere1BUnitS]> {
+ let Latency = 1;
+ let NumMicroOps = 2;
+} // Store to a register index (maybe scaled).
+def : WriteRes<WriteF, [Ampere1BUnitXY]> {
+ let Latency = 2;
+} // General floating-point ops.
+def : WriteRes<WriteFCmp, [Ampere1BUnitX]> {
+ let Latency = 3;
+} // Floating-point compare.
+def : WriteRes<WriteFCvt, [Ampere1BUnitXY]> {
+ let Latency = 3;
+} // Float conversion.
+def : WriteRes<WriteFCopy, [Ampere1BUnitXY]> {
+} // Float-int register copy.
+def : WriteRes<WriteFImm, [Ampere1BUnitXY]> {
+ let Latency = 2;
+} // Float-int register copy.
+def : WriteRes<WriteFMul, [Ampere1BUnitXY]> {
+ let Latency = 4;
+} // Floating-point multiply.
+def : WriteRes<WriteFDiv, [Ampere1BUnitXY]> {
+ let Latency = 19;
+} // Floating-point division.
+def : WriteRes<WriteVd, [Ampere1BUnitXY]> {
+ let Latency = 3;
+} // 64bit Vector D ops.
+def : WriteRes<WriteVq, [Ampere1BUnitXY]> {
+ let Latency = 3;
+} // 128bit Vector Q ops.
+def : WriteRes<WriteVLD, [Ampere1BUnitL, Ampere1BUnitL]> {
+ let Latency = 4;
+} // Vector loads.
+def : WriteRes<WriteVST, [Ampere1BUnitS, Ampere1BUnitZ]> {
+ let Latency = 2;
+} // Vector stores.
+
+def : WriteRes<WriteAtomic, []> { let Unsupported = 1; }
+
+def : WriteRes<WriteSys, []> { let Latency = 1; }
+def : WriteRes<WriteBarrier, []> { let Latency = 1; }
+def : WriteRes<WriteHint, []> { let Latency = 1; }
+
+def : WriteRes<WriteLDHi, []> {
+ let Latency = 3;
+} // The second register of a load-pair: LDP,LDPSW,LDNP,LDXP,LDAXP
+
+// Forwarding logic.
+def : ReadAdvance<ReadI, 0>;
+def : ReadAdvance<ReadISReg, 0>;
+def : ReadAdvance<ReadIEReg, 0>;
+def : ReadAdvance<ReadIM, 0>;
+def : ReadAdvance<ReadIMA, 1, [WriteIM32, WriteIM64]>;
+def : ReadAdvance<ReadID, 0>;
+def : ReadAdvance<ReadExtrHi, 0>;
+def : ReadAdvance<ReadST, 0>;
+def : ReadAdvance<ReadAdrBase, 0>;
+def : ReadAdvance<ReadVLD, 0>;
+
+//===----------------------------------------------------------------------===//
+// Specialising the scheduling model further for Ampere-1B.
+
+def : InstRW<[Ampere1BWrite_1cyc_1AB], (instrs COPY)>;
+
+// Branch instructions
+def : InstRW<[Ampere1BWrite_1cyc_1A], (instrs Bcc, BL, RET)>;
+def : InstRW<[Ampere1BWrite_1cyc_1A],
+ (instrs CBZW, CBZX, CBNZW, CBNZX, TBZW...
[truncated]
``````````
</details>
https://github.com/llvm/llvm-project/pull/81341
More information about the llvm-commits
mailing list