[PATCH] D19228: [X86][AVX2] Prefer VPERMQ/VPERMPD over VINSERTI128/VINSERTF128 for unary shuffles
Simon Pilgrim via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 18 14:49:27 PDT 2016
RKSimon added inline comments.
================
Comment at: test/CodeGen/X86/avx-vperm2x128.ll:65-66
@@ +64,4 @@
+; AVX1: ## BB#0: ## %entry
+; AVX1-NEXT: vmovaps (%rdi), %ymm0
+; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
+; AVX1-NEXT: retq
----------------
spatel wrote:
> So this one could be 'vperm2f128' with a memop, couldn't it? Any idea why that didn't happen?
The insertf128 pattern is used instead for cases where we're inserting the lower half (so no extract is needed) and the other half is already in place; according to Agner's lists this is the better choice on pre-AVX2 targets, especially AMD targets, which are weak at 128-bit lane crossings.
Fixing this in the memory fold code would be tricky: the folding logic will see the input split in two and assume it can't be folded, so it will never reach foldMemoryOperandImpl.
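To make the lane semantics concrete, here is a minimal Python sketch (names are my own, purely illustrative) modeling what the `vinsertf128 $1, %xmm0, %ymm0, %ymm0` in the checked output computes: it keeps the low 128-bit lane of the ymm and copies the xmm (here, that same low lane) into the upper lane, which is exactly the unary <0,1,0,1> shuffle that vperm2f128 with a memop could also produce.

```python
# Model a 256-bit ymm register as a list of four 64-bit qwords.
def vinsertf128_imm1(ymm_dst, xmm_src):
    """Sketch of vinsertf128 $1: replace the upper 128-bit lane
    (qwords 2-3) of ymm_dst with the two qwords of xmm_src."""
    assert len(ymm_dst) == 4 and len(xmm_src) == 2
    return ymm_dst[:2] + list(xmm_src)

# Four qwords loaded from memory, as in the AVX1 test above.
ymm0 = ["a0", "a1", "a2", "a3"]

# Inserting ymm0's own low xmm into the upper lane duplicates
# the low 128-bit lane: the <0,1,0,1> unary shuffle.
result = vinsertf128_imm1(ymm0, ymm0[:2])
print(result)  # ['a0', 'a1', 'a0', 'a1']
```

The sketch shows why no extract is needed in this case: the low lane is already in place, and only the upper lane is rewritten.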
Repository:
rL LLVM
http://reviews.llvm.org/D19228