[PATCH] D36663: [X86][Haswell] Updating HSW instruction scheduling information

Mon Aug 21 01:04:30 PDT 2017

gadi.haber added inline comments.

================
Comment at: lib/Target/X86/X86SchedHaswell.td:2722
+def HWWriteResGroup52 : SchedWriteRes<[HWPort1,HWPort23]> {
+  let Latency = 3;
+  let NumMicroOps = 2;
----------------
craig.topper wrote:
> gadi.haber wrote:
> > craig.topper wrote:
> > > gadi.haber wrote:
> > > > craig.topper wrote:
> > > > > Should this account for load latency?
> > > > yes, according to the SNB architects.
> > > If it should include load latency, shouldn't it have a latency of more than 3? ADDPDrr is in a group with latency 3. So shouldn't ADDPrm be more than 3?
> > The scheduling model assumes there are no memory latency effects, i.e., no cache misses and everything is in the first-level cache.
> > This is the model successfully used and constantly verified by the architects.
> > The performance measurements we ran support this model.
> > 
> I understand assuming everything is in L1.
> 
> But in the SandyBridge model you have ADDPDrr as 3 cycles and ADDPDrm as 9 cycles. So it seems you're accounting for the load as being 6 cycles. But in Haswell you have both as 3 cycles. So loads from the L1 are free on Haswell?
> 
What I understood from the architects who explained it to me is that a memory access on SNB requires additional cycles even when everything is in L1.
The exact number of additional cycles depends on the instruction's ucode.
As a result, some memory instructions require fewer additional cycles than others. For example, MOV(16|32|64)rr requires 1 cycle whereas MOV(16|32|64)rm requires 5 cycles.
Here the difference is 4 cycles (not the additional 6 as in ADDPD).
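
To make the arithmetic in this thread explicit, here is a small Python sketch that computes the extra cycles implied by the memory form (rm latency minus rr latency) for the instructions mentioned above. The numbers are only the ones quoted in this discussion; the table layout and the helper names (LATENCIES, implied_load_cycles, summarize) are made up for illustration and are not part of the patch or of any LLVM tool.

```python
# Latencies in cycles, as quoted in this review thread. All assume an L1 hit.
# The dictionary layout and function names are illustrative assumptions.
LATENCIES = {
    ("SNB", "ADDPD"): {"rr": 3, "rm": 9},
    ("SNB", "MOV(16|32|64)"): {"rr": 1, "rm": 5},
    # Haswell values as described for the patch under review.
    ("HSW", "ADDPD"): {"rr": 3, "rm": 3},
}


def implied_load_cycles(rr, rm):
    """Extra cycles the memory form adds over the register form."""
    return rm - rr


def summarize(table):
    """Map each (model, instruction) pair to its implied load cost."""
    return {
        key: implied_load_cycles(lat["rr"], lat["rm"])
        for key, lat in table.items()
    }


if __name__ == "__main__":
    for (model, instr), extra in summarize(LATENCIES).items():
        print(f"{model} {instr}: rm - rr = {extra} cycles")
```

With the values above, the SNB entries imply 6 and 4 extra cycles, while the Haswell ADDPD entry implies 0, which is the discrepancy being asked about.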

Repository:
  rL LLVM

https://reviews.llvm.org/D36663