[PATCH] D40865: X86 AVX2: Prefer one VPERMV over ShuffleAsRepeatedMaskAndLanePermute

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 6 15:44:23 PST 2017


spatel added a reviewer: ddibyend.
spatel added a comment.

Is it correct that AVX1-only targets are never affected?

The implicit assumption is that the load of the VPERMV shuffle mask can be hoisted far enough ahead that load latency doesn't get in the way, right? But then we're also increasing register pressure, because the mask has to stay live in a vector register. Ideally, we'd do this later, when we have some way to calculate the machine-specific trade-offs. E.g., Ryzen probably doesn't win with this transform because it needs 3 uops to do the vpermps. Do you have perf wins on benchmarks?
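
To make the trade-off concrete, here is a rough scalar model of the two lowerings for an 8 x 32-bit shuffle. It is only a sketch, not the code in the patch: the function names (repeatedLaneShuffle, permute64, variablePermute, foldToIndexVector) are made up for illustration, and it assumes the repeated in-lane shuffle runs first and the cross-lane permute of 64-bit chunks runs second. The two-instruction form takes its controls as immediates, while the single variable permute needs the folded index vector in a register, which is where the extra load and the register pressure come from.

```cpp
#include <array>
#include <cassert>

using Vec8 = std::array<int, 8>;   // models a 256-bit vector of 8 x 32-bit elements
using Ctl4 = std::array<int, 4>;   // models a 4-entry immediate control

// Models an in-lane shuffle (vpermilps/vpshufd style): the same 4-element
// mask is applied independently inside each 128-bit lane.
Vec8 repeatedLaneShuffle(const Vec8 &v, const Ctl4 &mask) {
  Vec8 out{};
  for (int lane = 0; lane < 2; ++lane)
    for (int i = 0; i < 4; ++i) {
      assert(mask[i] >= 0 && mask[i] < 4);
      out[4 * lane + i] = v[4 * lane + mask[i]];
    }
  return out;
}

// Models a cross-lane permute of 64-bit chunks (vpermq/vpermpd style):
// each 64-bit chunk of the result is one of the four chunks of the input.
Vec8 permute64(const Vec8 &v, const Ctl4 &sel) {
  Vec8 out{};
  for (int chunk = 0; chunk < 4; ++chunk)
    for (int half = 0; half < 2; ++half) {
      assert(sel[chunk] >= 0 && sel[chunk] < 4);
      out[2 * chunk + half] = v[2 * sel[chunk] + half];
    }
  return out;
}

// Models a variable permute (vpermps/vpermd style): every result element
// can come from any of the 8 input elements, selected by an index vector.
Vec8 variablePermute(const Vec8 &v, const Vec8 &idx) {
  Vec8 out{};
  for (int i = 0; i < 8; ++i)
    out[i] = v[idx[i] & 7];
  return out;
}

// Folds the two immediate-controlled steps into the single index vector
// that the variable permute would have to load from memory.
Vec8 foldToIndexVector(const Ctl4 &mask, const Ctl4 &sel) {
  const Vec8 identity = {0, 1, 2, 3, 4, 5, 6, 7};
  return permute64(repeatedLaneShuffle(identity, mask), sel);
}
```

For any input v, variablePermute(v, foldToIndexVector(mask, sel)) gives the same result as permute64(repeatedLaneShuffle(v, mask), sel), so the question is purely about cost: one permute plus a constant load and a live register versus two immediate-controlled shuffles.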


https://reviews.llvm.org/D40865




