[PATCH] D19228: [X86][AVX2] Prefer VPERMQ/VPERMPD over VPERM2I128/VPERM2F128 for unary shuffles

Mon Apr 18 11:08:14 PDT 2016

spatel added a comment.

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:10584
@@ -10581,3 +10583,3 @@
 
-  // If either input operand is a zero vector, use VPERM2X128 because its mask
-  // allows us to replace the zero input with an implicit zero.
+  // With AVX2 we should use VPERMQ/VPERMPD to allow memory folding.
+  if (Subtarget.hasAVX2() && isSingleInputShuffleMask(Mask) && !IsV1Zero)
----------------
I may have missed it, but the only advantage the test changes show is that we get to use an instruction with a single input operand. Could you add a test that shows the load-folding win?
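
For reference, here is a minimal scalar sketch of why a unary 128-bit lane shuffle can be expressed with either immediate, assuming the documented ISA semantics of VPERMQ/VPERMPD and VPERM2I128/VPERM2F128. The helper names (`vpermq`, `vperm2x128`, `unaryLaneMaskToVPERMQImm`) are hypothetical and do not come from the patch or from LLVM. It only models the lane selection; it does not demonstrate load folding, which would need a codegen test.

```cpp
#include <array>
#include <cstdint>

// Four 64-bit elements standing in for one 256-bit register.
using V4 = std::array<std::uint64_t, 4>;

// Model of VPERMQ/VPERMPD: bits [2i+1:2i] of imm pick the source
// qword for destination element i. There is no zeroing control.
V4 vpermq(const V4 &src, std::uint8_t imm) {
  V4 dst{};
  for (int i = 0; i < 4; ++i)
    dst[i] = src[(imm >> (2 * i)) & 0x3];
  return dst;
}

// Model of VPERM2I128/VPERM2F128: each nibble of imm controls one
// 128-bit half of the result. Bit 3 of the nibble zeroes the half,
// bit 1 picks src1 or src2, and bit 0 picks the low or high half.
V4 vperm2x128(const V4 &src1, const V4 &src2, std::uint8_t imm) {
  V4 dst{};
  for (int half = 0; half < 2; ++half) {
    unsigned ctl = (imm >> (4 * half)) & 0xF;
    if (ctl & 0x8)
      continue; // zero bit set: this half stays zero
    const V4 &src = (ctl & 0x2) ? src2 : src1;
    unsigned srcHalf = ctl & 0x1;
    dst[2 * half] = src[2 * srcHalf];
    dst[2 * half + 1] = src[2 * srcHalf + 1];
  }
  return dst;
}

// Hypothetical helper: turn a unary half selection (lo, hi in {0, 1})
// into the equivalent VPERMQ immediate.
std::uint8_t unaryLaneMaskToVPERMQImm(unsigned lo, unsigned hi) {
  return static_cast<std::uint8_t>((2 * lo) | ((2 * lo + 1) << 2) |
                                   ((2 * hi) << 4) | ((2 * hi + 1) << 6));
}
```

In this model, swapping the two halves of one input is `vperm2x128(x, x, 0x01)` or `vpermq(x, 0x4E)`. Only the VPERM2X128 immediate has a zeroing bit, which matches the patch keeping the zero-input case on VPERM2X128 via `!IsV1Zero`.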


Repository:
  rL LLVM

http://reviews.llvm.org/D19228