[PATCH] D21148: [X86][SSE] Added support for combining target shuffles to (V)PSHUFD/VPERMILPD/VPERMILPS immediate permute

Wed Jun 22 15:12:05 PDT 2016

ab added a comment.

Code LGTM, but I'm not sure I have a clear picture of the intent; questions inline.


================
Comment at: lib/Target/X86/X86ISelLowering.cpp:24546-24547
@@ -24541,3 +24545,4 @@
                                     unsigned &Shuffle, MVT &ShuffleVT) {
-  bool FloatDomain = SrcVT.isFloatingPoint();
+  bool FloatDomain = SrcVT.isFloatingPoint() ||
+                     (!Subtarget.hasAVX2() && SrcVT.is256BitVector());
 
----------------
This looks like an independent improvement; maybe extract it into a separate patch?
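
To make the effect of the changed predicate concrete, here is a minimal standalone sketch. It assumes the motivation is that AVX1 has no 256-bit integer shuffles, so 256-bit integer vectors have to use the float-domain permutes until AVX2 is available. `ShuffleQuery` and `useFloatDomain` are hypothetical names for illustration only, not LLVM API.

```cpp
#include <cassert>

// Hypothetical stand-in for the queries used in the hunk above.
struct ShuffleQuery {
  bool IsFloatingPoint; // models SrcVT.isFloatingPoint()
  unsigned SizeInBits;  // 256 models SrcVT.is256BitVector()
  bool HasAVX2;         // models Subtarget.hasAVX2()
};

// Mirrors the new FloatDomain expression: floating-point types always use
// the float domain, and so do 256-bit vectors when AVX2 is not available.
bool useFloatDomain(const ShuffleQuery &Q) {
  assert((Q.SizeInBits == 128 || Q.SizeInBits == 256) &&
         "this sketch only models 128-bit and 256-bit vectors");
  return Q.IsFloatingPoint || (!Q.HasAVX2 && Q.SizeInBits == 256);
}
```

Under this sketch, a 256-bit integer vector on an AVX1-only target selects the float domain, while the same vector with AVX2 stays in the integer domain.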

================
Comment at: test/CodeGen/X86/vector-shuffle-128-v2.ll:162
@@ +161,3 @@
+; AVX:       # BB#0:
+; AVX-NEXT:    vpermilpd {{.*#+}} xmm0 = xmm0[1,1]
+; AVX-NEXT:    retq
----------------
I'm surprised by this and the other test changes; isn't the combine meant for shuffle chains? (It does look better for folding, though; I'm just trying to understand.)

================
Comment at: test/CodeGen/X86/vector-shuffle-128-v4.ll:230
@@ +229,3 @@
+; AVX:       # BB#0:
+; AVX-NEXT:    vpermilps {{.*#+}} xmm0 = xmm0[0,0,1,1]
+; AVX-NEXT:    retq
----------------
In particular, this looks slightly more expensive according to Agner Fog's instruction tables for Intel (for the folded variants).
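
For reference, the masks printed in the two test hunks above correspond to 8-bit immediates as sketched below. This is only an illustration of the instruction encodings (two bits per element for the 4-element PSHUFD/VPERMILPS form, one bit per element for the 128-bit VPERMILPD form); the helper names are hypothetical and are not the functions the patch uses.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// 4-element permute (PSHUFD / VPERMILPS form): destination element I takes
// source element Mask[I], encoded in immediate bits [2*I+1 : 2*I].
std::uint8_t encodeV4PermuteImm(const std::array<int, 4> &Mask) {
  std::uint8_t Imm = 0;
  for (int I = 0; I != 4; ++I) {
    assert(Mask[I] >= 0 && Mask[I] < 4 && "index must select one of 4 elements");
    Imm |= static_cast<std::uint8_t>(Mask[I] << (2 * I));
  }
  return Imm;
}

// 2-element permute (128-bit VPERMILPD form): destination element I takes
// source element Mask[I], encoded in immediate bit I.
std::uint8_t encodeV2PermuteImm(const std::array<int, 2> &Mask) {
  std::uint8_t Imm = 0;
  for (int I = 0; I != 2; ++I) {
    assert(Mask[I] >= 0 && Mask[I] < 2 && "index must select one of 2 elements");
    Imm |= static_cast<std::uint8_t>(Mask[I] << I);
  }
  return Imm;
}
```

With these helpers, the `xmm0[0,0,1,1]` mask encodes to `0x50` and the `xmm0[1,1]` mask encodes to `0x3`.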


Repository:
  rL LLVM

http://reviews.llvm.org/D21148