[PATCH] D47676: [X86][Znver1] Specify Register Files, RCU; FP scheduler capacity.

Tue Jun 19 05:12:02 PDT 2018

lebedev.ri added inline comments.

================
Comment at: lib/Target/X86/X86ScheduleZnver1.td:111
+// Reference: "Software Optimization Guide for AMD Family 17h Processors"
+def ZnRCU : RetireControlUnit<192, 8>;
+
----------------
andreadb wrote:
> lebedev.ri wrote:
> > andreadb wrote:
> > > lebedev.ri wrote:
> > > > andreadb wrote:
> > > > > lebedev.ri wrote:
> > > > > > andreadb wrote:
> > > > > > > lebedev.ri wrote:
> > > > > > > > GGanesh wrote:
> > > > > > > > > The retire unit is shared between integer and FP ops. In SMT mode it has 96 entries per thread. So I think we should use only 96 entries as a conservative value.
> > > > > > > > Aha, I was wondering how SMT was considered here.
> > > > > > > > But then what about `MicroOpBufferSize` in `SchedMachineModel`?
> > > > > > > > Is that supposed to be kept at `192`?
> > > > > > > llvm-mca doesn't make assumptions about whether the CPU is in SMT mode or not. The same goes for the scheduling model, which assumes an optimistic micro-op buffer. For now, it is better to specify the resources available to a single thread running on the CPU (i.e. with no other concurrent threads).
> > > > > > > Basically, I think you should not go for a conservative value here.
> > > > > > Hmm. Is that documented somewhere?
> > > > > > I would have naively expected @GGanesh's suggestion to be the approach...
> > > > > This limitation was mentioned in the RFC.
> > > > > 
> > > > > You cannot possibly make any reasonable assumptions with SMT. The problem is not just the reorder buffer but any other resources which may or may not be competitively shared (or statically/dynamically partitioned).
> > > > > Even if we know that the cpu is multi threaded, there is no way to predict how the other thread will use/consume hardware resources.
> > > > Hmm.
> > > > https://support.amd.com/TechDocs/55723_SOG_Fam_17h_Processors_3.00.pdf
> > > > Page 35:
> > > > `The retire queue can hold up to 192 micro ops or 96 per thread in SMT mode.`
> > > You shouldn't be making assumptions on SMT and use a conservative value for the retire queue. See my previous comment.
> > > Without a proper framework to emulate SMT in llvm-mca, it is better to keep this value at the theoretical maximum. At least you will get more accurate numbers when there is only a single thread active on the CPU.
> > > You shouldn't be making assumptions on SMT 
> > Ah, on the SMT itself too, not //only// on what happens in SMT mode. I missed that remark initially.
> > Does the scheduling use `RetireControlUnit<>` values, or is it only used in mca right now?
> My point is that in an SMT processor, resources (not just the ROB) are shared by multiple hardware threads.
> Some resources may be competitively shared among the threads. Not knowing what the other thread is doing makes it impossible to correctly predict how buffers are used.
> 
> By setting the RCU size to half of its capacity, you are assuming that the processor is conservatively running in SMT mode. However, you are mixing apples and oranges here. You are not saying how other resources should behave in SMT mode. What about the schedulers, the register files, and the dispatch logic in SMT mode?
> 
> My point is: there is no framework at the moment to deal with SMT in general. So, there is no advantage in using a conservative value for the RCU here (unless you think it better matches your perf results).
> 
> To answer your last question:
> for now, the RetireControlUnit information is only used by mca.
Thank you for this detailed explanation!

> To answer your last question:
> for now, the RetireControlUnit information is only used by mca.

Ok then!
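To make the trade-off above concrete, here is a toy retire-queue model. It is a hypothetical sketch, not llvm-mca's implementation. It assumes independent micro-ops, unlimited execution resources, and a fixed dispatch width, and it reads the two `RetireControlUnit<192, 8>` arguments as buffer size and maximum retirements per cycle. The function `simulate_rcu` and its parameters are illustrative names only.

```python
from collections import deque


def simulate_rcu(latencies, rob_size, retire_per_cycle, dispatch_width):
    """Toy model: cycles needed to retire every micro-op in `latencies`.

    Hypothetical sketch: micro-ops are independent, execution units are
    unlimited, and each micro-op completes `latency` cycles after dispatch.
    """
    if rob_size < 1 or retire_per_cycle < 1 or dispatch_width < 1:
        raise ValueError("sizes and widths must be positive")
    rob = deque()  # each entry: cycle at which that micro-op finishes executing
    n = len(latencies)
    next_uop = 0
    retired = 0
    cycle = 0
    while retired < n:
        # Retire in program order, up to retire_per_cycle completed micro-ops.
        r = 0
        while rob and r < retire_per_cycle and rob[0] <= cycle:
            rob.popleft()
            r += 1
        retired += r
        # Dispatch up to dispatch_width micro-ops, only into free RCU slots.
        d = 0
        while next_uop < n and d < dispatch_width and len(rob) < rob_size:
            rob.append(cycle + latencies[next_uop])
            next_uop += 1
            d += 1
        cycle += 1
    return cycle
```

With 384 independent 100-cycle micro-ops and an arbitrary dispatch width of 4, `simulate_rcu([100] * 384, 192, 8, 4)` gives 248 cycles while a 96-entry buffer gives 424. In this latency-bound toy case, halving the value makes single-thread estimates noticeably more pessimistic.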

Repository:
  rL LLVM

https://reviews.llvm.org/D47676