[PATCH] D47676: [X86][Znver1] Specify Register Files, RCU; FP scheduler capacity.

Tue Jun 19 04:06:00 PDT 2018

lebedev.ri added inline comments.

================
Comment at: lib/Target/X86/X86ScheduleZnver1.td:111
+// Reference: "Software Optimization Guide for AMD Family 17h Processors"
+def ZnRCU : RetireControlUnit<192, 8>;
+
----------------
andreadb wrote:
> lebedev.ri wrote:
> > andreadb wrote:
> > > lebedev.ri wrote:
> > > > andreadb wrote:
> > > > > lebedev.ri wrote:
> > > > > > GGanesh wrote:
> > > > > > > The retire unit is shared between integer and FP ops. In SMT mode it has 96 entries per thread, so I think we should use only 96 entries as a conservative value.
> > > > > > Aha, I was wondering how SMT was considered here.
> > > > > > But then what about `MicroOpBufferSize` in `SchedMachineModel`?
> > > > > > Is that supposed to be kept at `192`?
> > > > > llvm-mca doesn't make assumptions about whether the CPU is in SMT mode or not. The same goes for the scheduling model, which assumes an optimistic micro-op buffer. For now, it is better to specify the resources available to a single thread running on the CPU (i.e. with no other concurrent threads).
> > > > > Basically, I think you should not go for a conservative value here.
> > > > Hmm. Is that documented somewhere?
> > > > I would have naively expected that what @GGanesh wrote is the approach..
> > > This limitation was mentioned in the RFC.
> > > 
> > > You cannot possibly make any reasonable assumptions with SMT. The problem is not just the reorder buffer but any other resources which may or may not be competitively shared (or statically/dynamically partitioned).
> > > Even if we know that the CPU is multithreaded, there is no way to predict how the other thread will use/consume hardware resources.
> > Hmm.
> > https://support.amd.com/TechDocs/55723_SOG_Fam_17h_Processors_3.00.pdf
> > Page 35:
> > `The retire queue can hold up to 192 micro ops or 96 per thread in SMT mode.`
> You shouldn't be making assumptions on SMT, and you shouldn't use a conservative value for the retire queue. See my previous comment.
> Without a proper framework to emulate SMT in llvm-mca, it is better to keep this value at the theoretical maximum. At least you will get more accurate numbers when there is only a single thread active on the CPU.
> You shouldn't be making assumptions on SMT 
Ah, so no assumptions on SMT itself either, not //only// on what happens in SMT mode. I missed that remark initially.
Does the scheduler use the `RetireControlUnit<>` values, or are they only used in llvm-mca right now?
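
For reference, a minimal sketch of the two alternatives discussed in this thread, based on the `RetireControlUnit` definition quoted above. As I understand it, the first argument is the retire queue size and the second is the maximum number of micro-ops retired per cycle. The commented-out `96` variant only illustrates the conservative per-thread value; it is hypothetical and not part of the patch, and the enclosing scheduling-model scope of the file is omitted.

```
// Single-thread view (as in this patch): 192-entry retire queue,
// up to 8 micro-ops retired per cycle.
def ZnRCU : RetireControlUnit<192, 8>;

// Conservative SMT view (hypothetical, not adopted): 96 entries per thread.
// def ZnRCU : RetireControlUnit<96, 8>;
```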

Repository:
  rL LLVM

https://reviews.llvm.org/D47676