[PATCH] D39802: Sched model improving on btver2: JFPU01 resource, vtestp* for xmm.

Wed Nov 8 10:24:17 PST 2017

andreadb added inline comments.

================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:202
   let Latency = 2;
-  let ResourceCycles = [2];
+  let ResourceCycles = [4];
 }
----------------
andreadb wrote:
> `let NumMicroOps = 3;`
My understanding is that ResourceCycles has to be 4 because you want to be able to compute a reciprocal throughput of 2. According to the AMD documents, the pipes used by a float variable blend are "FPA|FPM".

I wonder whether we could have [JFPU0, JFPU1] instead of [JFPU01], and then change ResourceCycles to [2, 1].

A variable blend is 3 uOps, and internally, it is likely to be implemented as the sequence {VAND,VANDN,VOR}, where VAND and VANDN can execute in parallel.

It may be worth running some experiments to see which approach is better. That being said, your approach is not wrong, and I don't have a strong opinion on this.
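
As a rough sanity check, the two options discussed above can be compared with a small sketch. It assumes a simplified model in which each consumed resource contributes `cycles / units` and the slowest resource bounds the reciprocal throughput. The helper name `reciprocal_throughput` and the unit counts (JFPU01 as a group of 2 units, JFPU0, JFPU1 and JLAGU as 1 unit each) are assumptions for illustration, not something taken from the patch.

```python
# Simplified model (an assumption for illustration, not LLVM's actual code):
# each consumed resource contributes cycles / units, and the slowest
# resource bounds the reciprocal throughput.

def reciprocal_throughput(usage):
    """usage: list of (resource_cycles, num_units) pairs, one per resource."""
    return max(cycles / units for cycles, units in usage)

# Option in the patch: one group JFPU01 (assumed 2 units), ResourceCycles = [4].
grouped = reciprocal_throughput([(4, 2)])

# Alternative: JFPU0 and JFPU1 listed separately (assumed 1 unit each),
# ResourceCycles = [2, 1].
split = reciprocal_throughput([(2, 1), (1, 1)])

# Load variant: JLAGU (assumed 1 unit) plus JFPU01, ResourceCycles = [1, 4].
grouped_load = reciprocal_throughput([(1, 1), (4, 2)])

print(grouped, split, grouped_load)
```

Under this model both options give the same reciprocal throughput of 2, so any difference between them would only show up in how the individual pipes are occupied, which is what the experiments would need to measure.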

================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:202-206
+  let ResourceCycles = [4];
 }
 def : WriteRes<WriteFVarBlendLd, [JLAGU, JFPU01]> {
   let Latency = 7;
+  let ResourceCycles = [1, 4];
----------------
`let NumMicroOps = 3;`

================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:218-225
 def : WriteRes<WriteVarBlend, [JFPU01]> {
   let Latency = 2;
-  let ResourceCycles = [2];
+  let ResourceCycles = [4];
 }
 def : WriteRes<WriteVarBlendLd, [JLAGU, JFPU01]> {
   let Latency = 7;
+  let ResourceCycles = [1, 4];
----------------
Variable blend instructions are 3 macro ops.
You should add `let NumMicroOps = 3;`.

https://reviews.llvm.org/D39802