[PATCH] D94395: [X86] AMD Zen 3 Scheduler Model

Fri Apr 30 11:37:53 PDT 2021

lebedev.ri added inline comments.

================
Comment at: llvm/lib/Target/X86/X86ScheduleZnver3.td:29
+  // The maximum capacity of the op cache is 4K ops.
+  let MicroOpBufferSize = 4096;
+  // Agner, 22.5 µop cache
----------------
GGanesh wrote:
> Isn't this based on the retire queue or reorder buffer? It's mapped to the retire control unit in the MCA hardware units. So I guess we should use the retire queue size rather than the micro-tags/micro-op cache size here. The op cache doesn't go through the regular decode path; it is a kind of replay unit best suited for loops. Or we should map MicroOpBufferSize to the RCU.
Err, right, I may have forgotten to clean this up.
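
A rough sketch of the cleaned-up field, assuming it is sized after the retire queue as suggested. `Zn3SketchModel` is a placeholder name, and 256 is an assumed retire queue capacity that needs checking against the SOG before it goes anywhere near the patch:

```tablegen
include "llvm/Target/Target.td"

// Placeholder model, for illustration only; not the def from this patch.
def Zn3SketchModel : SchedMachineModel {
  // Assumed retire queue (RCU) capacity, NOT the op cache capacity.
  let MicroOpBufferSize = 256;
}
```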

================
Comment at: llvm/lib/Target/X86/X86ScheduleZnver3.td:32
+  // The size of the µop cache is big enough for holding most critical loops.
+  let LoopMicroOpBufferSize = MicroOpBufferSize;
+  // AMD SOG 19h, 2.6.2 L1 Data Cache
----------------
GGanesh wrote:
> I think we should define LoopMicroOpBufferSize as 4096, since the op cache is best suited for loops and not for insns in the legacy decode path.
I'm not sure I follow. Are you asking to spell out `4096` directly?
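
If so, one possible reading is that the two fields should be decoupled, so that the loop buffer keeps the op cache capacity even if `MicroOpBufferSize` later stops meaning that. A minimal sketch of the difference (both model names are placeholders, not the defs from this patch):

```tablegen
include "llvm/Target/Target.td"

// As in the patch: the loop buffer silently follows MicroOpBufferSize.
def Zn3AliasedSketch : SchedMachineModel {
  let MicroOpBufferSize = 4096;
  let LoopMicroOpBufferSize = MicroOpBufferSize;
}

// As suggested: the op cache capacity is spelled out on its own.
def Zn3LiteralSketch : SchedMachineModel {
  // The maximum capacity of the op cache is 4K ops.
  let LoopMicroOpBufferSize = 4096;
}
```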

================
Comment at: llvm/lib/Target/X86/X86ScheduleZnver3.td:46
+  // FIXME
+  let HighLatency = 25; // FIXME: any better choice?
+  // AMD SOG 19h, 2.8 Optimizing Branching
----------------
GGanesh wrote:
> This high latency is mostly as per the div unit. Can we use the arithmetic instruction high latency instead? I had this doubt when I wanted to enable it, but went ahead with the highest latency number that I could see. It gets used in the InstrInfo, so I think we can restrict it to only the DIV class of instructions. I am okay if you decide to go with 25 here.
I'm indeed not really sure what value should be there,
and more importantly I'm not sure basing it purely off latency is the right choice.
I think it should ideally also consider all macrocoded (uops>2) instructions.
I think I want to leave this as-is for now.

================
Comment at: llvm/lib/Target/X86/X86ScheduleZnver3.td:162
+// each of its associated pipelines
+// FIXME: these are 4 separate schedulers, not a single big one.
+def Zn3Int : ProcResGroup<[Zn3ALU0, Zn3AGU0, Zn3BRU0, // scheduler 0
----------------
GGanesh wrote:
> Yes, we need to model how these schedulers function. Let us move this forward for this iteration of the changes with the FIXME.
> 
Note that this is trivial to do, but llvm-mca bugs need to be fixed first.
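
Roughly what that could look like: one `ProcResGroup` per scheduler, each with its own `BufferSize`, instead of the single `Zn3Int` group. Only scheduler 0 is shown, using the three units from the hunk above; the model and group names are placeholders, and the `BufferSize` value is an assumption rather than a number taken from the SOG:

```tablegen
include "llvm/Target/Target.td"

// Placeholder model, for illustration only.
def Zn3SchedSketch : SchedMachineModel;

let SchedModel = Zn3SchedSketch in {
// The three units visible in the quoted hunk, one pipe each.
def Zn3ALU0 : ProcResource<1>;
def Zn3AGU0 : ProcResource<1>;
def Zn3BRU0 : ProcResource<1>;

// Scheduler 0 as its own group, with its own queue.
def Zn3IntSch0 : ProcResGroup<[Zn3ALU0, Zn3AGU0, Zn3BRU0]> {
  // Assumed per-scheduler queue depth; verify before use.
  let BufferSize = 24;
}
}
```

The remaining schedulers would follow the same pattern with their own unit lists.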

================
Comment at: llvm/lib/Target/X86/X86ScheduleZnver3.td:371
+// AMD SOG 19h, 2.12 Load-Store Unit
+// A maximum of two of the memory operations can be stores.
+let Super = Zn3LSU in
----------------
GGanesh wrote:
> Is there a way we can model some of these dispatch restrictions using the execution units or other units? I initially had some plans to do it but was not really sure how.
> We haven't really modeled such restrictions. With issue width alone, I think we aren't really enforcing these dispatch restrictions. I think modeling the scheduler functionality as mentioned in the earlier comment will do!
Note that this is not a FIXME. There is nothing to be done about this:
I have already modelled this correctly in this patch,
it's pretty trivial actually.
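
For anyone following along, the mechanism is roughly this: loads and stores are sub-resources of the LSU, and the store resource simply has fewer units. `Zn3Load`, `Zn3Store`, the model name, and the unit count of 3 for the LSU and for loads are assumptions made for this sketch; only `Zn3LSU`, `Super`, and the two-store limit appear in the hunk above:

```tablegen
include "llvm/Target/Target.td"

// Placeholder model, for illustration only.
def Zn3LsuSketch : SchedMachineModel;

let SchedModel = Zn3LsuSketch in {
// Assumed: up to 3 memory operations per cycle in total.
def Zn3LSU : ProcResource<3>;

// Loads may use every LSU slot (assumed count).
let Super = Zn3LSU in
def Zn3Load : ProcResource<3>;

// A maximum of two of the memory operations can be stores.
let Super = Zn3LSU in
def Zn3Store : ProcResource<2>;
}
```

Using a `Zn3Store` unit implies using a unit of its `Super`, so stores compete with loads for the shared LSU slots while never exceeding two per cycle.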

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94395/new/

https://reviews.llvm.org/D94395