[PATCH] D39802: Sched model improving on btver2: JFPU01 resource, vtestp* for xmm.
Andrew V. Tischenko via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 9 00:20:29 PST 2017
avt77 added inline comments.
================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:202
let Latency = 2;
- let ResourceCycles = [2];
+ let ResourceCycles = [4];
}
----------------
andreadb wrote:
> andreadb wrote:
> > `let NumMicroOps = 3;`
> My understanding is that ResourceCycles has to be 4 because you want to be able to compute a reciprocal throughput of 2. According to the AMD documents, the pipes used by a float variable blend are "FPA|FPM".
>
> I wonder whether we could have [JFPU0, JFPU1] instead of [JFPU01], and then change ResourceCycles to [2, 1].
>
> A variable blend is 3 uOps, and internally, it is likely to be implemented as the sequence {VAND,VANDN,VOR}, where VAND and VANDN can execute in parallel.
>
> It may be worth running some experiments to see which approach is better. That being said, your approach is not wrong, and I don't have a strong opinion on this.
I have an AMD laptop to run experiments on, but I don't have any perf test that uses a variable blend. Could anyone help me with such a test?
https://reviews.llvm.org/D39802