[PATCH] D22456: [X86][SSE] Add cost model values for CTPOP of vectors

Sean Silva via llvm-commits llvm-commits at lists.llvm.org
Mon Jul 18 14:46:59 PDT 2016


silvas added a comment.

Is the plan to also make these costs dependent on the host CPU? For example, IIRC the vector ctpop lowerings have serially dependent pshufb's, which have 1-cycle latency on big Intel cores but 4-cycle latency on Jaguar, according to Agner Fog's instruction tables.
Also, on Jaguar scalar popcnt is "as cheap as an add", but on e.g. Skylake scalar popcnt has a quarter of the throughput of an add and 3x the latency.
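
To make the concern concrete, here is a minimal sketch of how a chain of serially dependent pshufb's scales with per-CPU latency. Everything in it is hypothetical: `CpuProfile`, `chainLatency`, and the chain length parameter are illustrative names, not LLVM APIs, and the latency figures are just the ones quoted above from memory.

```cpp
// Hypothetical per-CPU profile for illustration only; not an LLVM type.
struct CpuProfile {
  const char *Name;
  unsigned PshufbLatency; // cycles, as quoted in the comment above (IIRC)
};

// Latency figures taken from the comment above, not re-measured.
constexpr CpuProfile BigIntelCore{"big Intel core", 1};
constexpr CpuProfile Jaguar{"Jaguar", 4};

// Each pshufb in a serially dependent chain has to wait for the previous
// one, so the chain's latency is the per-instruction latency times the
// chain length. NumDependentPshufb is an assumed input, not a value read
// out of the actual lowering.
constexpr unsigned chainLatency(const CpuProfile &P,
                                unsigned NumDependentPshufb) {
  return P.PshufbLatency * NumDependentPshufb;
}

// For the same chain length, the Jaguar figure is 4x the big-core figure.
static_assert(chainLatency(Jaguar, 2) == 4 * chainLatency(BigIntelCore, 2),
              "pshufb chain latency scales with per-CPU latency");
```

A single fixed cost per vector type cannot capture that 4x spread, which is why a per-CPU table (or at least a per-CPU adjustment) might be worth considering.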


Repository:
  rL LLVM

https://reviews.llvm.org/D22456




