[PATCH] D47676: [X86][Znver1] Specify Register Files, RCU; FP scheduler capacity.
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jun 15 07:23:36 PDT 2018
lebedev.ri added inline comments.
================
Comment at: lib/Target/X86/X86ScheduleZnver1.td:96
+// Reference: "Software Optimization Guide for AMD Family 17h Processors"
+def ZnIntegerPRF : RegisterFile<168, [GR8, GR16, GR32, GR64, CCR]>;
+
----------------
andreadb wrote:
> GGanesh wrote:
> > I am not sure why we include CCR register.
> As a rule of thumb:
> a register class should only be added if you want to allow those registers to be renamed. In this case, CCR should be added if you want to count EFLAGS renames against the number of available physical registers in the PRF.
> If you don't think EFLAGS renames should be counted against the total, then it is correct to remove it. Otherwise, it is incorrect.
I'm not seeing anything in docs, so i'm going to be conservative, and add it back.
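For reference, a minimal sketch of the two variants being discussed. Only the active `def` is from the patch; the commented-out alternative is my reading of the suggestion to drop CCR, not something taken from the AMD guide.

```tablegen
// Variant kept in the patch: CCR is listed, so EFLAGS renames are counted
// against the 168 physical registers of the integer PRF.
def ZnIntegerPRF : RegisterFile<168, [GR8, GR16, GR32, GR64, CCR]>;

// Alternative raised in review (assumption: CCR is simply dropped), in which
// EFLAGS renames would not be counted against the total:
// def ZnIntegerPRF : RegisterFile<168, [GR8, GR16, GR32, GR64]>;
```

Both lines are meant to sit inside the existing Znver1 model in X86ScheduleZnver1.td; they are not standalone.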
================
Comment at: lib/Target/X86/X86ScheduleZnver1.td:111
+// Reference: "Software Optimization Guide for AMD Family 17h Processors"
+def ZnRCU : RetireControlUnit<192, 8>;
+
----------------
andreadb wrote:
> lebedev.ri wrote:
> > GGanesh wrote:
> > > The retire unit is shared between integer and FP ops. In SMT mode it has 96 entries per thread. So, I think we should use 96 entries as a conservative value.
> > Aha, i was wondering how SMT was considered here.
> > But then what about `MicroOpBufferSize` in `SchedMachineModel`?
> > Is that supposed to be kept at `192`?
> llvm-mca doesn't make assumptions on whether the CPU is in SMT mode or not. Same for the scheduling model, which assumes an optimistic micro-op buffer. For now, it is better to specify the resources available to a single thread running on the CPU (i.e. with no other concurrent threads).
> Basically, I think you should not go for a conservative value here.
Hmm. Is that documented somewhere?
I would have naively expected that what @GGanesh wrote is the approach..
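To make the two options concrete, a sketch of what each would look like. The active `def` is from the patch; the commented-out line is only the conservative per-thread value proposed above (192 / 2 = 96), written out for comparison.

```tablegen
// From the patch: the full 192-entry retire queue, i.e. what a single thread
// sees with no sibling thread running. The second operand (8) is the maximum
// number of micro-ops retired per cycle.
def ZnRCU : RetireControlUnit<192, 8>;

// Conservative SMT alternative suggested in review (96 entries per thread):
// def ZnRCU : RetireControlUnit<96, 8>;
```

If the single-thread view is the convention, the first form also stays consistent with a `MicroOpBufferSize` of `192` in `SchedMachineModel`.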
================
Comment at: lib/Target/X86/X86ScheduleZnver1.td:113
+
+// FIXME: there are 72 read buffers and 44 write buffers.
+
----------------
andreadb wrote:
> lebedev.ri wrote:
> > GGanesh wrote:
> > > I assume these are the load/store queue entries. The FPU has
> > > 1. 44 entry Load Queue
> > > 2. 72 Out of Order Loads
> > > 3. 44 entry Store Queue
> > > So, since we are concerned only with the queues, we can mark just 44 entries each for LD and ST.
> > >
> > I think i would prefer to keep this just as a fixme note for now, since
> > while llvm-mca does have options to specify the sizes of these queues,
> > it does not seem to read those from the sched model otherwise,
> > i guess because it can't currently be expressed here.
> In theory, the processor model could specify load/store queues as buffered processor resources that are consumed by instructions.
> In practice, llvm-mca has the concept of `LSUnit`. There is an open bugzilla about how to better describe load/store queues in llvm-mca and the scheduling model.
(That matches what i said)
Repository:
rL LLVM
https://reviews.llvm.org/D47676