[PATCH] D22456: [X86][SSE] Add cost model values for CTPOP of vectors

Mon Jul 18 14:46:59 PDT 2016

silvas added a comment.

Is the plan to make these costs also depend on the host CPU? For example, IIRC the vector ctpop lowerings have serially dependent pshufbs, which have 1-cycle latency on big Intel cores but 4-cycle latency on Jaguar, according to Agner Fog's instruction tables.
Also, on Jaguar scalar popcnt is "as cheap as an add", but on e.g. Skylake scalar popcnt has a quarter of the throughput of an add and 3x the latency.
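
For context, here is a minimal scalar sketch of the nibble-lookup technique that pshufb-based vector ctpop lowerings are generally built on. It is only an illustration of why pshufb cost matters here: the helper name `bytePopcount` and the table name `kNibblePopcnt` are made up for this example, and the exact instruction sequence LLVM emits may differ.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Population count of every 4-bit value. In a pshufb-based lowering this
// table would live in a vector register and be indexed by pshufb.
static constexpr std::array<std::uint8_t, 16> kNibblePopcnt = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4};

// Hypothetical helper: per-byte popcount of a 16-byte "vector", written as a
// scalar loop so it compiles without any SSE flags.
std::array<std::uint8_t, 16>
bytePopcount(const std::array<std::uint8_t, 16> &v) {
  std::array<std::uint8_t, 16> out{};
  for (std::size_t i = 0; i < v.size(); ++i) {
    std::uint8_t lo = v[i] & 0x0F; // low nibble (a pand with a 0x0F mask)
    std::uint8_t hi = v[i] >> 4;   // high nibble (a shift plus a mask)
    // Two table lookups (the pshufb steps) followed by a byte add.
    out[i] = static_cast<std::uint8_t>(kNibblePopcnt[lo] + kNibblePopcnt[hi]);
  }
  return out;
}
```

Wider element types need extra steps to sum the per-byte counts, so the total cost of the sequence tracks how expensive pshufb and the follow-up adds are on a given CPU.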

Repository:
  rL LLVM

https://reviews.llvm.org/D22456