[PATCH] try to lowerVectorShuffleAsElementInsertion() for all 256-bit vector sub-types [X86, AVX]

Mon Mar 30 10:41:27 PDT 2015

================
Comment at: test/CodeGen/X86/vector-shuffle-256-v8.ll:134-137
@@ -133,6 +133,6 @@
 ; AVX2:       # BB#0:
 ; AVX2-NEXT:    movl $7, %eax
 ; AVX2-NEXT:    vmovd %eax, %xmm1
-; AVX2-NEXT:    vpxor %ymm2, %ymm2, %ymm2
-; AVX2-NEXT:    vpblendd {{.*#+}} ymm1 = ymm1[0],ymm2[1,2,3,4,5,6,7]
+; AVX2-NEXT:    vxorps %ymm2, %ymm2, %ymm2
+; AVX2-NEXT:    vblendps {{.*#+}} ymm1 = ymm1[0],ymm2[1,2,3,4,5,6,7]
 ; AVX2-NEXT:    vpermps %ymm0, %ymm1, %ymm0
----------------
spatel wrote:
> andreadb wrote:
> > This has nothing to do with your patch; however, I am surprised that we get this long sequence of instructions on AVX2 instead of just a single 'vmovaps' plus 'vpermd'.
> > Here, %ymm1 holds the 'vpermd' permute mask. That mask is known at compile time (it is the vector <7,0,0,0,0,0,0,0>), so there is no need to compute it at runtime. I think we could replace this entire sequence with a load from the constant pool followed by a 'vpermd'.
> Interesting - it's not entirely unrelated because the permute mask itself could be viewed as a zero-extended vector, right? I've filed this as:
> https://llvm.org/bugs/show_bug.cgi?id=23073
Right,
```
  movl $7, %eax
  vmovd %eax, %xmm1
  vxorps %ymm2, %ymm2, %ymm2
  vblendps {{.*#+}} ymm1 = ymm1[0],ymm2[1,2,3,4,5,6,7]
```
is basically equivalent to:
```
  movl $7, %eax
  vmovd %eax, %xmm1
```

The VEX-encoded vmovd implicitly zeroes bits [VLMAX-1:32] of the destination register, so the explicit vxorps + vblendps pair is redundant.
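
To make the equivalence concrete, here is a minimal scalar sketch in C++. It is only a model, not LLVM code: the `Ymm` type and every helper name below are hypothetical, and it assumes the VEX zeroing behaviour described above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a 256-bit register as eight 32-bit lanes.
using Ymm = std::array<uint32_t, 8>;

// Models VEX-encoded `vmovd %eax, %xmm1`: lane 0 receives the value and
// all remaining lanes of the full register are zeroed.
Ymm vmovd(uint32_t value) {
  Ymm r{}; // value-initialized, so every lane starts at zero
  r[0] = value;
  return r;
}

// Models `vxorps %ymm2, %ymm2, %ymm2`: an all-zero register.
Ymm zeroVector() { return Ymm{}; }

// Models the blend from the test: lane 0 from a, lanes 1-7 from b.
Ymm blendLane0(const Ymm &a, const Ymm &b) {
  Ymm r = b;
  r[0] = a[0];
  return r;
}

// Models `vpermps`/`vpermd`: result lane i is src[idx[i] & 7]
// (only the low three bits of each index are used).
Ymm permute(const Ymm &src, const Ymm &idx) {
  Ymm r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = src[idx[i] & 7];
  return r;
}

// True when the four-instruction sequence and the two-instruction
// sequence leave the same value in the mask register.
bool blendIsRedundant(uint32_t value) {
  Ymm shortSeq = vmovd(value);
  Ymm longSeq = blendLane0(vmovd(value), zeroVector());
  return shortSeq == longSeq;
}
```

In this model `blendIsRedundant(7)` holds, and `permute(src, vmovd(7))` places lane 7 of `src` in lane 0 and lane 0 of `src` everywhere else, which matches the <7,0,0,0,0,0,0,0> mask.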

http://reviews.llvm.org/D8341
