[PATCH] D39802: Sched model improving on btver2: JFPU01 resource, vtestp* for xmm.

Thu Nov 9 00:20:29 PST 2017

avt77 added inline comments.

================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:202
   let Latency = 2;
-  let ResourceCycles = [2];
+  let ResourceCycles = [4];
 }
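The change from [2] to [4] can be sanity-checked with a simplified throughput model. The sketch below assumes that JFPU01 is a two-unit group over JFPU0 and JFPU1, and that reciprocal throughput is limited by the most constrained resource, computed as cycles / units. The resource table and the `reciprocal_throughput` helper are hypothetical illustrations. They ignore any implicit group accounting TableGen may perform and are not LLVM's implementation.

```python
from fractions import Fraction

def reciprocal_throughput(usage):
    """Reciprocal throughput implied by per-resource usage.

    `usage` maps a resource name to (units, cycles). A resource with `units`
    identical units that is busy for `cycles` cycles per instruction allows
    at most one instruction every cycles / units cycles; the slowest
    resource determines the result. (Simplified model, an assumption.)
    """
    return max(Fraction(cycles, units) for units, cycles in usage.values())

# Old value: JFPU01 (assumed 2 units) busy for 2 cycles.
old = {"JFPU01": (2, 2)}
# This patch: JFPU01 (assumed 2 units) busy for 4 cycles.
patch = {"JFPU01": (2, 4)}
# Alternative raised in review: JFPU0 for 2 cycles, JFPU1 for 1 cycle.
alternative = {"JFPU0": (1, 2), "JFPU1": (1, 1)}

for name, usage in [("old", old), ("patch", patch), ("alternative", alternative)]:
    print(name, reciprocal_throughput(usage))
```

Under these assumptions both the patch and the [JFPU0, JFPU1] / [2, 1] alternative give a reciprocal throughput of 2, while the old [2] gives 1.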
----------------
andreadb wrote:
> andreadb wrote:
> > `let NumMicroOps = 3;`
> My understanding is that ResourceCycles has to be 4 because you want to be able to compute a reciprocal throughput of 2. According to the AMD documents, the pipes used by a float variable blend are "FPA|FPM".
> 
> I wonder whether we could have [JFPU0, JFPU1] instead of [JFPU01], and then change ResourceCycles to [2, 1].
> 
> A variable blend is 3 uOps, and internally, it is likely to be implemented as the sequence {VAND,VANDN,VOR}, where VAND and VANDN can execute in parallel.
> 
> It may be worth running some experiments to see which approach is better. That being said, your approach is not wrong, and I don't have a strong opinion on this.
I have an AMD laptop to run experiments on, but I don't have any perf test that uses a variable blend. Could anyone help me with such a test?
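The {VAND, VANDN, VOR} decomposition suggested above can be checked against the `NumMicroOps = 3` suggestion and the existing `Latency = 2`. The sketch below models the three uops as a small dependency graph in which VAND and VANDN are independent and VOR consumes both. The decomposition itself is only a conjecture from the review, and the one-cycle latency per uop and unlimited issue width are assumptions, not measured values.

```python
# Hypothetical uop graph for a variable blend, following the conjecture
# that it is split into VAND and VANDN (independent) feeding a VOR.
# Per-uop latency of 1 cycle is an assumption.
UOPS = {
    "VAND": {"deps": [], "latency": 1},
    "VANDN": {"deps": [], "latency": 1},
    "VOR": {"deps": ["VAND", "VANDN"], "latency": 1},
}

def finish_time(uop):
    """Cycle at which `uop` completes, assuming unlimited issue width."""
    info = UOPS[uop]
    start = max((finish_time(dep) for dep in info["deps"]), default=0)
    return start + info["latency"]

num_uops = len(UOPS)
latency = max(finish_time(u) for u in UOPS)
print(num_uops, latency)  # 3 2
```

If the conjecture holds, the critical path runs through one of the parallel AND/ANDN uops and then the OR, which is consistent with a 2-cycle latency for 3 uops.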

https://reviews.llvm.org/D39802