[PATCH] D151894: [AArch64] Neoverse V2 scheduling model
Ricardo Jesus via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jun 5 07:35:04 PDT 2023
rjj marked 2 inline comments as done and an inline comment as not done.
rjj added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64InstrFormats.td:10865
- def i16_indexed : BaseSIMDIndexedTied<1, U, 1, 0b01, opc,
- FPR16Op, FPR16Op, V128_lo,
- VectorIndexH, asm, ".h", "", "", ".h",
- []> {
+ def v1i16_indexed : BaseSIMDIndexedTied<1, U, 1, 0b01, opc,
+ FPR16Op, FPR16Op, V128_lo,
----------------
dmgreen wrote:
> Can you do this in a separate patch, in case it causes problems?
Yep of course, done (D152161).
================
Comment at: llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td:202
+ let NumMicroOps = 2;
+ let ResourceCycles = [1, 3]; // LDPSW
+}
----------------
dmgreen wrote:
> Should this use the load unit for 3 ResourceCycles, as opposed to being pipelined?
You are right, changed to `SchedWriteRes<[V2UnitI, V2UnitL, V2UnitL, V2UnitL]>`.
================
Comment at: llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td:896
+
+def V2Write_LdrHQ : SchedWriteVariant<[
+ SchedVar<NeoverseHQForm, [V2Write_7cyc_1I_1L]>,
----------------
dmgreen wrote:
> Can you explain where the differences between h/q and the other sizes come from?
They come from the Neoverse V2 Software Optimization Guide (SOG), https://developer.arm.com/documentation/PJDOC-466751330-593177/r0p2 p. 24.
================
Comment at: llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td:985
+// ALU, basic, flagset
+def : SchedAlias<WriteI, V2Write_1cyc_1I>;
+
----------------
huntergr wrote:
> The flag setting variants use the 'F' pipelines rather than 'I'. The others do use 'I' though, so perhaps a predicate would work here.
Thanks, I've updated the model to use the 'F' pipelines in the cases you pointed out. I do have a question, though: according to the SOG, the throughput of these instructions is 3 per cycle rather than 4, even though four pipelines are available. Do you know why, or how we could model this accurately?
================
Comment at: llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td:1026-1027
+// SDIV, UDIV
+def : SchedAlias<WriteID32, V2Write_12cyc_1M0>;
+def : SchedAlias<WriteID64, V2Write_20cyc_1M0>;
+
----------------
dmgreen wrote:
> 12 and 20 are worst-case times. Would a value more in the middle of the range be better?
Sure, maybe 8 and 12 cycles respectively, unless you have a better suggestion? Should the throughput then be 1/8 and 1/12?
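As a rough sketch of what the mid-range option could look like: this assumes the 8- and 12-cycle figures floated above and a single-unit `V2UnitM0` resource, as the `1M0` suffix suggests. The `_Sketch` write names are hypothetical and not part of the patch. Holding the unit for the full latency is one way to get the 1/8 and 1/12 throughputs:

```
// Hypothetical sketch only. The cycle counts are the mid-range guesses
// discussed above, not figures taken from the SOG.
def V2Write_8cyc_1M0_Sketch : SchedWriteRes<[V2UnitM0]> {
  let Latency = 8;
  let ResourceCycles = [8];   // M0 stays busy 8 cycles: one 32-bit divide per 8 cycles
}
def V2Write_12cyc_1M0_Sketch : SchedWriteRes<[V2UnitM0]> {
  let Latency = 12;
  let ResourceCycles = [12];  // M0 stays busy 12 cycles: one 64-bit divide per 12 cycles
}

// SDIV, UDIV
def : SchedAlias<WriteID32, V2Write_8cyc_1M0_Sketch>;
def : SchedAlias<WriteID64, V2Write_12cyc_1M0_Sketch>;
```

Whether the occupancy should track the average or the worst-case latency is still the open question here.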
================
Comment at: llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td:1035
+// Multiply long
+// NOTE: SOG p. 16, n. 2: How to specify late-forwarding between similar ops?
+def : InstRW<[V2Write_Mul], (instregex "^M(ADD|SUB)[WX]rrr$")>;
----------------
dmgreen wrote:
> It is usually done with read advances.
Thanks, I'll have a look. If you have any pointers to examples where read advances were used to model forwarding of instructions like `madd` and such, that would be greatly appreciated!
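For reference, a minimal sketch of how a read advance might express this, assuming the accumulator operand is the one that benefits from late forwarding. The `_Sketch` names, the `V2UnitM0` resource, and the latency and advance values are placeholders for illustration, not SOG figures or code from this patch:

```
// Hypothetical write for MADD/MSUB; the latency is a placeholder.
def V2Write_MAdd_Sketch : SchedWriteRes<[V2UnitM0]> { let Latency = 2; }

// A read that sees results produced by V2Write_MAdd_Sketch one cycle
// earlier than their nominal latency, i.e. the forwarding path.
def V2Rd_MAcc_Sketch : SchedReadAdvance<1, [V2Write_MAdd_Sketch]>;

// Operand order: one write for Rd, then reads for Rn, Rm, Ra.
// Only the accumulator (Ra) read gets the advance.
def : InstRW<[V2Write_MAdd_Sketch, ReadIM, ReadIM, V2Rd_MAcc_Sketch],
             (instregex "^M(ADD|SUB)[WX]rrr$")>;
```

With this shape, a chain of dependent `madd` instructions through the accumulator would be scheduled with an effective latency of one cycle less than the nominal value, while dependencies through the multiplicand operands keep the full latency.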
================
Comment at: llvm/test/tools/llvm-mca/AArch64/Neoverse/V2-neon-instructions.s:399
+fsub v0.2s, v0.2s, v0.2s
+ld1 { v0.16b }, [x0]
+ld1 { v0.2d, v1.2d, v2.2d }, [x0], #48
----------------
dmgreen wrote:
> Add more ldr tests perhaps.
I added a few more for H-form LDRs, but if you're referring to the FP loads they should be here already (you can grep for `ldr\s[hwxq]`).
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D151894/new/
https://reviews.llvm.org/D151894