[PATCH] D32219: [X86][SSE] Improve DIV/SQRT throughput estimates for SB/HW schedule models

Thu Apr 27 08:55:40 PDT 2017

RKSimon added inline comments.

================
Comment at: lib/Target/X86/X86SchedHaswell.td:140
+def : WriteRes<WriteFDiv, [HWPort0]> {
+  let Latency = 12; // 10-14 cycles.
+  let ResourceCycles = [12];
----------------
gadi.haber wrote:
> instruction latency of X87 FDIV in Haswell is actually higher and takes 20 cycles
Despite its name, this scheduling class is also used by SSE/AVX float/double division (just the xmm variants here, as the ymm variants are overridden). Given that we barely use x87 these days, aren't we better off using the value just for SSE/AVX?

================
Comment at: lib/Target/X86/X86SchedHaswell.td:145
+def : WriteRes<WriteFDivLd, [HWPort23, HWPort0]> {
+  let Latency = 16; // load + 10-14 cycles.
+  let ResourceCycles = [1, 12];
----------------
gadi.haber wrote:
> latency of FDIVLd in Haswell is 24
Then why is the load latency in HWWriteResPair just 4 cycles?
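
To make the arithmetic behind that question explicit, here is a minimal sketch. It assumes the folded-load latency is simply the register-form latency plus the load latency, which is a simplification for illustration and not necessarily how the schedule model derives it. The helper name `folded_load_latency` is hypothetical.

```python
# Assumption: folded-load latency = register-form latency + load latency.
# LOAD_LATENCY is the 4-cycle figure mentioned for HWWriteResPair in this thread.
LOAD_LATENCY = 4


def folded_load_latency(reg_latency, load_latency=LOAD_LATENCY):
    """Return the latency of the load-folded form under the additive assumption."""
    return reg_latency + load_latency


# Values from the patch: WriteFDiv Latency = 12, WriteFDivLd Latency = 16.
print(folded_load_latency(12))
# Values quoted in the review: FDIV = 20, FDIVLd = 24.
print(folded_load_latency(20))
```

Under this assumption both pairs of numbers differ by the same 4-cycle load, so the disagreement is about the division latency itself rather than the load.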

================
Comment at: lib/Target/X86/X86SchedHaswell.td:151
+def : WriteRes<WriteFSqrt, [HWPort0]> {
+  let Latency = 15;
+  let ResourceCycles = [15];
----------------
gadi.haber wrote:
> latency of FSqrt in Haswell is 23
Please can you cite the source of these numbers? I've been careful not to change the current latency values (as shown in the diffs in the tests below) and am just trying to add more realistic throughput values.

================
Comment at: lib/Target/X86/X86SchedHaswell.td:1929
   let NumMicroOps = 3;
-  let ResourceCycles = [2, 1];
+  let ResourceCycles = [2, 19];
 }
----------------
gadi.haber wrote:
> ResourceCycles should be [2, 1]
> 
> ResourceCycles lists the number of times where HW port was used in the instruction.
> In this case HWPort0 is used twice (by uOp1 and uOp2) and HWPort015 is used only once (by uOp3)
I don't think I agree. ResourceCycles is an analogue for throughput here - the number of cycles for which the op occupies this resource in that stage. It should be 12 (ish) cycles, to indicate that HWPort0 won't accept further instructions for 12 cycles while it completes the division.
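
A minimal sketch of that reading, assuming a toy model with a single port and independent back-to-back ops (this is not LLVM's scheduler, and the function names are hypothetical). Each op keeps the port busy for `resource_cycles`, so the next op cannot start until the port is free.

```python
def issue_cycles(num_ops, resource_cycles):
    """Return the start cycle of each op on one port.

    Toy model: ops are independent, and each one occupies the port
    for `resource_cycles` cycles before the next can start.
    """
    starts = []
    port_free_at = 0
    for _ in range(num_ops):
        starts.append(port_free_at)
        port_free_at += resource_cycles
    return starts


def reciprocal_throughput(num_ops, resource_cycles):
    """Average number of cycles between consecutive op starts."""
    starts = issue_cycles(num_ops, resource_cycles)
    return (starts[-1] - starts[0]) / (num_ops - 1)


# With ResourceCycles = 1 the port looks fully pipelined.
print(issue_cycles(4, 1), reciprocal_throughput(4, 1))
# With ResourceCycles = 12 a new division starts only every 12 cycles.
print(issue_cycles(4, 12), reciprocal_throughput(4, 12))
```

In this model the latency of a single op never enters the calculation; only the port occupancy decides how often a new division can start, which is the quantity the ResourceCycles change is meant to describe.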

Repository:
  rL LLVM

https://reviews.llvm.org/D32219