[PATCH] D22456: [X86][SSE] Add cost model values for CTPOP of vectors

Mon Jul 18 15:52:09 PDT 2016

RKSimon added a comment.

In https://reviews.llvm.org/D22456#487520, @silvas wrote:

> Is the plan to make these costs also dependent on host CPU? For example, IIRC the vector ctpop lowerings have serially dependent pshufb's which are 1 cycle latency on big intel cores but 4 cycle latency on Jaguar according to Agner.

Someday - I think the priority is to get down to one set of cost tables. Sanjay knows this better than I do but IIRC there are 3 or 4 separate sets of costs in the codebase - some based on approximate latency others (like this one) on throughput of recent big intel cores. I don't think any use the scheduler models or anything overly target specific.

> Also, on Jaguar scalar popcnt is "as cheap as an add" but on e.g. Skylake scalar popcnt has 4x less throughput than an add and 3x higher latency.

I haven't added scalar throughput costs here, the costs are for the vector implementations which as you say are dominated by PSHUFB - I was put off dealing with the scalars by TargetTransformInfo::PopcntSupportKind which seems to be trying to do something similar.

Repository:
  rL LLVM

https://reviews.llvm.org/D22456