[PATCH] D151894: [AArch64] Neoverse V2 scheduling model

Fri Jun 2 00:11:44 PDT 2023

dmgreen added a comment.

Thanks for working on this, it looks like a good patch. I looked through some of the details and had a few questions.

It might make sense to use this model for some other CPUs, such as the Cortex-X3, but they have slightly different pipelines that sit between the N2 and V2 in the number of units. For now we can keep them as they are.

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrFormats.td:10865

-  def i16_indexed : BaseSIMDIndexedTied<1, U, 1, 0b01, opc,
-                                        FPR16Op, FPR16Op, V128_lo,
-                                        VectorIndexH, asm, ".h", "", "", ".h",
-                                        []> {
+  def v1i16_indexed : BaseSIMDIndexedTied<1, U, 1, 0b01, opc,
+                                          FPR16Op, FPR16Op, V128_lo,
----------------
Can you do this in a separate patch, in case it causes problems?

================
Comment at: llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td:202
+  let NumMicroOps = 2;
+  let ResourceCycles = [1, 3];  // LDPSW
+}
----------------
Should this use the load unit for 3 ResourceCycles, as opposed to being pipelined?
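
For comparison, a fully pipelined version might look roughly like the sketch below. The def name, the unit names V2UnitI and V2UnitL, and the latency value are assumptions for illustration, not taken from the patch or the SOG, and the fragment is meant to sit inside the model's existing SchedModel block.

```tablegen
// Hypothetical pipelined variant: each resource is consumed for one cycle,
// and the result delay is expressed only through Latency.
def V2Write_Ldpsw_Pipelined : SchedWriteRes<[V2UnitI, V2UnitL]> {
  let Latency = 5;              // placeholder value, not an SOG figure
  let NumMicroOps = 2;
  let ResourceCycles = [1, 1];  // one cycle on each listed resource
}
```

As I read it, [1, 3] marks the load resource as consumed for three cycles, which throttles later loads, whereas [1, 1] leaves the throughput alone and only delays dependent instructions by the latency.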

================
Comment at: llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td:896
+
+def V2Write_LdrHQ : SchedWriteVariant<[
+                      SchedVar<NeoverseHQForm,  [V2Write_7cyc_1I_1L]>,
----------------
Can you explain where the differences between h/q and the other sizes come from?

================
Comment at: llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td:1026-1027
+// SDIV, UDIV
+def : SchedAlias<WriteID32,  V2Write_12cyc_1M0>;
+def : SchedAlias<WriteID64,  V2Write_20cyc_1M0>;
+
----------------
12 and 20 are worst-case times. Would a value more in the middle of the range be better?

================
Comment at: llvm/lib/Target/AArch64/AArch64SchedNeoverseV2.td:1035
+// Multiply long
+// NOTE: SOG p. 16, n. 2: How to specify late-forwarding between similar ops?
+def : InstRW<[V2Write_Mul], (instregex "^M(ADD|SUB)[WX]rrr$")>;
----------------
That is usually done with read advances (ReadAdvance or SchedReadAdvance).
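
A rough sketch of what that could look like here. The names V2Write_MulAcc, V2Read_MulAcc and V2UnitM0 are hypothetical, the latency and advance values are placeholders rather than SOG figures, and it assumes the multiply-accumulate operand list is [write, ReadIM, ReadIM, accumulator read]. The fragment is meant to sit inside the model's existing SchedModel block.

```tablegen
// Hypothetical write for the multiply-accumulate result.
def V2Write_MulAcc : SchedWriteRes<[V2UnitM0]> {
  let Latency = 2;  // placeholder value
}

// The accumulator operand is read one cycle later, but only when it is
// produced by one of the writes listed here.
def V2Read_MulAcc : SchedReadAdvance<1, [V2Write_MulAcc]>;

// Attach the write and the per-operand reads to MADD/MSUB.
def : InstRW<[V2Write_MulAcc, ReadIM, ReadIM, V2Read_MulAcc],
             (instregex "^M(ADD|SUB)[WX]rrr$")>;
```

Because the advance is restricted to the listed writes, a chain of accumulating multiplies sees the shorter effective latency, while an accumulator coming from any other producer still sees the full latency.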

================
Comment at: llvm/test/tools/llvm-mca/AArch64/Neoverse/V2-neon-instructions.s:399
+fsub v0.2s, v0.2s, v0.2s
+ld1 { v0.16b }, [x0]
+ld1 { v0.2d, v1.2d, v2.2d }, [x0], #48
----------------
Perhaps add some more ldr tests here.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D151894/new/

https://reviews.llvm.org/D151894