[PATCH] D94395: [X86] AMD Zen 3 Scheduler Model

Fri Apr 30 08:56:01 PDT 2021

GGanesh added a comment.

In D94395#2728674 <https://reviews.llvm.org/D94395#2728674>, @lebedev.ri wrote:

> @GGanesh I plan on landing this next Monday, May 3rd, in 3 days, unless some blocking feedback is provided.

I am mostly fine with this. I will give it another look and check the details.

================
Comment at: llvm/lib/Target/X86/X86ScheduleZnver3.td:29
+  // The maximum capacity of the op cache is 4K ops.
+  let MicroOpBufferSize = 4096;
+  // Agner, 22.5 µop cache
----------------
Isn't this based on the retire queue or reorder buffer? It is mapped to the retire control unit in the MCA hardware units, so I guess we should use the retire queue size here rather than the micro-tags/micro-op cache size. The op cache doesn't go through the regular decode path; it is a kind of replay unit, best suited for loops. That is, we should map MicroOpBufferSize to the RCU.

================
Comment at: llvm/lib/Target/X86/X86ScheduleZnver3.td:32
+  // The size of the µop cache is big enough for holding most critical loops.
+  let LoopMicroOpBufferSize = MicroOpBufferSize;
+  // AMD SOG 19h, 2.6.2 L1 Data Cache
----------------
I think we should define LoopMicroOpBufferSize as 4096, since the op cache is best suited for loops and not for insns in the legacy decode path.
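
For illustration, this suggestion and the one on line 29 could look roughly like the fragment below. It is only a sketch: the record name `Znver3Model` and the retire queue size of 256 are assumptions on my side and should be checked against the RCU section of the AMD SOG 19h. The 4096 is the op cache capacity already quoted in the patch.

```tablegen
// Sketch only; assumes llvm/Target/TargetSchedule.td is included,
// as it is for the real X86 scheduler models.
def Znver3Model : SchedMachineModel {
  // Assumption: retire queue (RCU) capacity rather than the op cache size.
  let MicroOpBufferSize = 256;
  // Op cache capacity (4K ops), which is what hot loops replay from.
  let LoopMicroOpBufferSize = 4096;
}
```

With this split, MicroOpBufferSize describes the out-of-order window that MCA maps to the retire control unit, while LoopMicroOpBufferSize alone carries the op cache capacity.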

================
Comment at: llvm/lib/Target/X86/X86ScheduleZnver3.td:46
+  // FIXME
+  let HighLatency = 25; // FIXME: any better choice?
+  // AMD SOG 19h, 2.8 Optimizing Branching
----------------
This high latency is mostly based on the div unit. Can we use the high latency of the arithmetic instructions instead? I had the same doubt when I wanted to enable this, but went ahead with the highest latency number that I could see. It gets used in the InstrInfo, so I think we can restrict it to only the DIV class of instructions. I am okay if you decide to go with 25 here.

================
Comment at: llvm/lib/Target/X86/X86ScheduleZnver3.td:162
+// each of its associated pipelines
+// FIXME: these are 4 separate schedulers, not a single big one.
+def Zn3Int : ProcResGroup<[Zn3ALU0, Zn3AGU0, Zn3BRU0, // scheduler 0
----------------
Yes, we need to model how these schedulers function. Let us move this forward with the FIXME for this iteration of the changes.

================
Comment at: llvm/lib/Target/X86/X86ScheduleZnver3.td:371
+// AMD SOG 19h, 2.12 Load-Store Unit
+// A maximum of two of the memory operations can be stores.
+let Super = Zn3LSU in
----------------
Is there a way we can model some of these dispatch restrictions using the execution units or other units? I initially had some plans to do it but was not really sure how.
We haven't really modeled such restrictions so far. With issue width alone, I think we aren't really enforcing these dispatch restrictions. I think modeling the scheduler functionality as mentioned in the earlier comment will do!

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94395/new/

https://reviews.llvm.org/D94395