[PATCH] D40865: X86 AVX2: Prefer one VPERMV over ShuffleAsRepeatedMaskAndLanePermute

Wed Dec 6 15:44:23 PST 2017

spatel added a reviewer: ddibyend.
spatel added a comment.

Is it correct that AVX1-only targets are never affected?

The implicit assumption is that the load of the mask can be hoisted far enough ahead that load latency doesn't get in the way, right? But then we're also increasing register pressure. Ideally, we'd do this later, when we have some way to calculate the machine-specific trade-offs. E.g., Ryzen probably doesn't win with this transform because it needs 3 uops to do the vpermps. Do you have perf wins on benchmarks?
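For reference, here is a rough scalar model of the two lowerings being compared. It only shows that they compute the same v8f32 result, with the single-permute form needing an extra index vector (the mask that has to be loaded and kept in a register). The helper names, the example mask, and the order of the two-instruction sequence are illustrative assumptions, not taken from the patch, and the model says nothing about latency or uop counts.

```python
# Scalar model of 256-bit float shuffles (8 lanes of 32 bits).
# All helper names here are hypothetical and exist only for illustration.

def vpermps(src, idx):
    """Variable cross-lane permute: dst[i] = src[idx[i] & 7]."""
    return [src[i & 7] for i in idx]

def in_lane_shuffle(src, mask4):
    """Apply the same 4-element mask inside each 128-bit lane."""
    out = []
    for base in (0, 4):
        out.extend(src[base + m] for m in mask4)
    return out

def lane_permute(src, lanes):
    """Pick a whole 128-bit source lane for each destination lane."""
    out = []
    for lane in lanes:
        out.extend(src[4 * lane:4 * lane + 4])
    return out

src = [float(i) for i in range(8)]

# Two-instruction form: repeated in-lane mask, then swap the two lanes.
two_step = lane_permute(in_lane_shuffle(src, [1, 0, 3, 2]), [1, 0])

# One-instruction form: the same shuffle as a single variable permute.
# The index vector is the constant that must be loaded from memory.
index_vector = [5, 4, 7, 6, 1, 0, 3, 2]
one_step = vpermps(src, index_vector)

print(two_step == one_step)
```

So the question is purely about cost: one shuffle plus a constant load and a live register, versus two shuffles with immediates.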

https://reviews.llvm.org/D40865