[PATCH] try to lowerVectorShuffleAsElementInsertion() for all 256-bit vector sub-types [X86, AVX]

Sanjay Patel spatel at rotateright.com
Mon Mar 30 10:22:50 PDT 2015


================
Comment at: test/CodeGen/X86/vector-shuffle-256-v4.ll:830
@@ -833,5 +829,3 @@
 ; AVX1:       # BB#0:
-; AVX1-NEXT:    vmovq {{.*#+}} xmm0 = mem[0],zero
-; AVX1-NEXT:    vxorpd %ymm1, %ymm1, %ymm1
-; AVX1-NEXT:    vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3]
+; AVX1-NEXT:    vmovsd {{.*#+}} xmm0 = mem[0],zero
 ; AVX1-NEXT:    retq
----------------
andreadb wrote:
> So, this is what you meant when you said that we don't get the correct fp/int domain.
> In X86InstrSSE.td we have patterns like this:
> ```
>   def : Pat<(v4i64 (X86vzmovl (insert_subvector undef,
>                      (v2i64 (scalar_to_vector (loadi64 addr:$src))),
>                      (iPTR 0)))),
>             (SUBREG_TO_REG (i32 0), (VMOVSDrm addr:$src), sub_xmm)>;
> ```
> Do you plan to send a follow-up patch to fix the tablegen patterns so that VMOVQI2PQIrm is used instead of VMOVSDrm for the integer domain? If so, it makes sense to commit this patch first and fix the fp/int domain issue in a separate patch.
Hi Andrea -

That's correct. I saw a couple of places where we didn't have the right tablegen patterns, and I had a patch for that somewhere...but I'm not finding it now. It was just simple replacements to substitute the right instruction for the type, like the one you noted here.
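
For reference, a follow-up fix along those lines would presumably swap the fp-domain VMOVSDrm for the integer-domain VMOVQI2PQIrm in the v4i64 pattern. A sketch only; the exact form in X86InstrSSE.td may differ:

```
// Sketch: integer-domain variant of the pattern quoted above, selecting
// VMOVQI2PQIrm (movq, integer domain) instead of VMOVSDrm (movsd, fp domain).
def : Pat<(v4i64 (X86vzmovl (insert_subvector undef,
                   (v2i64 (scalar_to_vector (loadi64 addr:$src))),
                   (iPTR 0)))),
          (SUBREG_TO_REG (i32 0), (VMOVQI2PQIrm addr:$src), sub_xmm)>;
```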

================
Comment at: test/CodeGen/X86/vector-shuffle-256-v8.ll:134-137
@@ -133,6 +133,6 @@
 ; AVX2:       # BB#0:
 ; AVX2-NEXT:    movl $7, %eax
 ; AVX2-NEXT:    vmovd %eax, %xmm1
-; AVX2-NEXT:    vpxor %ymm2, %ymm2, %ymm2
-; AVX2-NEXT:    vpblendd {{.*#+}} ymm1 = ymm1[0],ymm2[1,2,3,4,5,6,7]
+; AVX2-NEXT:    vxorps %ymm2, %ymm2, %ymm2
+; AVX2-NEXT:    vblendps {{.*#+}} ymm1 = ymm1[0],ymm2[1,2,3,4,5,6,7]
 ; AVX2-NEXT:    vpermps %ymm0, %ymm1, %ymm0
----------------
andreadb wrote:
> This has nothing to do with your patch; however, I am surprised that we get this long sequence of instructions on AVX2 instead of just a single 'vmovaps' plus 'vpermd'.
> Here, %ymm1 is used to store the 'vpermd' permute mask. That mask is known at compile time (it is the vector <7,0,0,0,0,0,0,0>), so we could just load it from the constant pool instead of computing it at runtime. I think we could replace this entire sequence with a load from the constant pool followed by a 'vpermd'.
Interesting - it's not entirely unrelated, because the permute mask itself could be viewed as a zero-extended vector, right? I've filed this as:
https://llvm.org/bugs/show_bug.cgi?id=23073
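
For what it's worth, the codegen Andrea describes would look roughly like this (a sketch using the vpermps form from the test above; the constant-pool label is hypothetical):

```
# Hypothetical improved AVX2 sequence: load the known permute mask
# <7,0,0,0,0,0,0,0> from the constant pool, then permute in one step.
vmovaps .LCPI0_0(%rip), %ymm1   # ymm1 = [7,0,0,0,0,0,0,0]
vpermps %ymm0, %ymm1, %ymm0
```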

http://reviews.llvm.org/D8341

More information about the llvm-commits mailing list