[PATCH] D22456: [X86][SSE] Add cost model values for CTPOP of vectors
Simon Pilgrim via llvm-commits
llvm-commits at lists.llvm.org
Mon Jul 18 15:52:09 PDT 2016
RKSimon added a comment.
In https://reviews.llvm.org/D22456#487520, @silvas wrote:
> Is the plan to make these costs also dependent on host CPU? For example, IIRC the vector ctpop lowerings have serially dependent pshufb's which are 1 cycle latency on big intel cores but 4 cycle latency on Jaguar according to Agner.
Someday. I think the priority is to get down to one set of cost tables first. Sanjay knows this better than I do, but IIRC there are 3 or 4 separate sets of costs in the codebase: some based on approximate latency, others (like this one) on the throughput of recent big Intel cores. I don't think any of them use the scheduler models or anything overly target-specific.
> Also, on Jaguar scalar popcnt is "as cheap as an add" but on e.g. Skylake scalar popcnt has 4x less throughput than an add and 3x higher latency.
I haven't added scalar throughput costs here; the costs are for the vector implementations, which as you say are dominated by PSHUFB. I was put off dealing with the scalars by TargetTransformInfo::PopcntSupportKind, which seems to be trying to do something similar.
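For readers unfamiliar with why PSHUFB dominates the vector CTPOP cost, here is a rough sketch of the usual nibble lookup-table technique. It is a portable C++ emulation, not the actual X86 lowering in LLVM, and the instruction sequence LLVM emits may differ. The names Vec16, shuffleBytes and popcountBytes are made up for illustration; shuffleBytes stands in for a single PSHUFB.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 16-byte "vector" type, standing in for an XMM register.
using Vec16 = std::array<std::uint8_t, 16>;

// Emulates PSHUFB: each output byte is table[idx & 0x0F], or 0 if the
// high bit of the index byte is set.
Vec16 shuffleBytes(const Vec16 &table, const Vec16 &idx) {
  Vec16 out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = (idx[i] & 0x80) ? 0 : table[idx[i] & 0x0F];
  return out;
}

// Per-byte population count: split every byte into two nibbles, look up
// the bit count of each nibble in a 16-entry table, then add the results.
Vec16 popcountBytes(const Vec16 &v) {
  static const Vec16 kNibblePopcnt = {0, 1, 1, 2, 1, 2, 2, 3,
                                      1, 2, 2, 3, 2, 3, 3, 4};
  Vec16 lo{}, hi{};
  for (std::size_t i = 0; i < v.size(); ++i) {
    lo[i] = v[i] & 0x0F;        // low nibble (an AND on real hardware)
    hi[i] = (v[i] >> 4) & 0x0F; // high nibble (a shift plus an AND)
  }
  const Vec16 cntLo = shuffleBytes(kNibblePopcnt, lo); // first table lookup
  const Vec16 cntHi = shuffleBytes(kNibblePopcnt, hi); // second table lookup
  Vec16 out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = static_cast<std::uint8_t>(cntLo[i] + cntHi[i]);
  return out;
}
```

Wider element types (i16/i32/i64) would need extra steps to sum the per-byte counts within each element, so the byte case shown here is the cheapest. Because every variant goes through the table lookups, the cost should track PSHUFB performance on the target CPU.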
Repository:
rL LLVM
https://reviews.llvm.org/D22456