[PATCH] D152688: [Aarch64] Add Cortex-A510 specific scheduling

Wed Jun 14 07:53:23 PDT 2023

dmgreen added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64SchedA510.td:59
+  def CortexA510UnitVMC    : ProcResource<1>;    // SIMD/FP/SVE multicycle instrs  (e.g Div, SQRT, cryptography)
+  }
+
----------------
Perhaps unindent this

================
Comment at: llvm/lib/Target/AArch64/AArch64SchedA510.td:67
+// (the software optimisation guide lists latencies taking into account
+// typical forwarding paths).
+def : WriteRes<WriteImm, [CortexA510UnitALU]> { let Latency = 1; }    // MOVN, MOVZ
----------------
harviniriawan wrote:
> evandro wrote:
> > I encourage you to model the forwarding paths as well, at least for those instructions whose throughput is 1 or less.
> I think it's best to let the hardware takes care of this, as it is internally able to do so
We haven't found the forwarding paths to be super useful in the past. They are difficult to model what is really going on in the core, and we have often found have caused more scheduling inaccuracies than they have helped with. The scheduler doesn't have a great way of modelling skewing.

================
Comment at: llvm/lib/Target/AArch64/AArch64SchedA510.td:663
+                        "^(SRH|SUQ|UQ|USQ|URH)ADD_ZPmZ_(UNDEF_)?[BHSD]$",
+                        "^(UQSUB|UQSUBR)_ZPmZ_(UNDEF_)?[BHSD]$")>;
+
----------------
rjj wrote:
> harviniriawan wrote:
> > rjj wrote:
> > > Same as above.
> > > 
> > > There are a few other places where the UNDEF patterns don't seem to exist, but before going through them, any reason why you've included these extra UNDEFs here?
> > I added them because I see from the debug output of machine-scheduler that sometimes it will generate pseudoinstructions with UNDEF* on some benchmarks, which then will pick up the wrong latency information without adding them into the regex. Maybe in the future we should make UNDEF and PSEUDO to always be at the end of the instruction for easier regex matching.
> Yep, I totally understand that- I was not suggesting their complete removal! It's just that in a few places the regex was matching "fictitious" UNDEFs, which personally I try to avoid.
> 
> I agree, in the future UNDEF and PSEUDO could go at the end of the instruction to facilitate their matching.
I don't have a strong opinion on the loose regexes, but it would be good to model the UNDEF nodes if we can, as those will be the ones seen during scheduling. SVE seemed to be a place where this scheduling model can have a decent benefit.

================
Comment at: llvm/lib/Target/AArch64/AArch64SchedA510.td:122
+// Load
+def : WriteRes<WriteLD, [CortexA510UnitLd]> { let Latency = 2; }
+def : WriteRes<WriteLDIdx, [CortexA510UnitLd]> { let Latency = 2; }
----------------
harviniriawan wrote:
> dmgreen wrote:
> > 2 sounds quite low. Is that better than using 3?
> SWOG indicates that integer loads have latency of 2
Oh yeah. I think I was looking at the wrong optimization guide at the time.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152688/new/

https://reviews.llvm.org/D152688