[PATCH] D152688: [Aarch64] Add Cortex-A510 specific scheduling

Mon Jun 12 05:30:29 PDT 2023

dmgreen added reviewers: rjj, evandro.
dmgreen added a comment.

Looks good. The CortexA510Write seems to work out well. I had a few comments/questions from looking through the software optimization guide.

Can you add a neon-instructions test file too? Could you also update the SVE test to include all the other instructions from the Neoverse tests, which cover more SVE2 instructions than the original A64FX test?
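
As a rough sketch of how such a test might start, assuming it follows the layout of the existing llvm-mca instruction-table tests: the file name and the two sample instructions below are placeholders and are not taken from the patch.

```
# Hypothetical file name: A510-neon-instructions.s
# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a510 -instruction-tables < %s | FileCheck %s

abs   d29, d24
add   v0.4s, v1.4s, v2.4s

# The CHECK lines would normally be regenerated with
# llvm/utils/update_mca_test_checks.py rather than written by hand.
```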

================
Comment at: llvm/lib/Target/AArch64/AArch64SchedA510.td:9-12
+// This file defines the machine model for the ARM Cortex-A510 processors. Note
+// that this schedule is currently used as the default for -mcpu=generic. As a
+// result, some of the modelling decision made do not precisely model the
+// Cortex-A510, instead aiming to be a good compromise between different cpus.
----------------
This comment can be updated.

================
Comment at: llvm/lib/Target/AArch64/AArch64SchedA510.td:80
+// MAC
+def : WriteRes<WriteIM32, [CortexA510UnitMAC]> { let Latency = 4; }   // 32-bit Multiply
+def : WriteRes<WriteIM64, [CortexA510UnitMAC]> { let Latency = 4; }   // 64-bit Multiply
----------------
Is 3 better here? Or is there a better value to use for 32-bit muls? I see 4 was used in the A55 model too.
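
To make the question concrete, one possible variant is sketched below. The 3 is only the value floated in this comment and the 4 is what the patch currently uses; neither is a figure quoted from the software optimization guide, so both would need to be checked against it and against benchmarks.

```
// Illustrative only: a shorter latency for 32-bit multiplies, keeping the
// patch's value for 64-bit multiplies.
def : WriteRes<WriteIM32, [CortexA510UnitMAC]> { let Latency = 3; }   // 32-bit Multiply
def : WriteRes<WriteIM64, [CortexA510UnitMAC]> { let Latency = 4; }   // 64-bit Multiply
```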

================
Comment at: llvm/lib/Target/AArch64/AArch64SchedA510.td:122
+// Load
+def : WriteRes<WriteLD, [CortexA510UnitLd]> { let Latency = 2; }
+def : WriteRes<WriteLDIdx, [CortexA510UnitLd]> { let Latency = 2; }
----------------
2 sounds quite low. Is that better than using 3?
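
For reference, the alternative being asked about would look roughly like this. It is a sketch that only swaps the latency value; whether 3 actually schedules better than 2 is the open question, not something established here.

```
// Illustrative only: same resources as in the patch, load latency raised to 3.
def : WriteRes<WriteLD, [CortexA510UnitLd]> { let Latency = 3; }
def : WriteRes<WriteLDIdx, [CortexA510UnitLd]> { let Latency = 3; }
```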

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D152688