[PATCH] D41794: [X86] Improve AVX1 shuffle lowering for v8f32 shuffles where the low half comes from V1 and the high half comes from V2 and the halves do the same operation

Craig Topper via Phabricator via llvm-commits <llvm-commits at lists.llvm.org>
Tue Jul 31 11:55:03 PDT 2018


craig.topper added inline comments.


================
Comment at: test/CodeGen/X86/prefer-avx256-mask-shuffle.ll:210
 ; AVX256VLBW-NEXT:    kmovd %eax, %k1
-; AVX256VLBW-NEXT:    vpshufb {{.*#+}} ymm0 {%k1} = ymm1[u,u,6,u,u,u,u,u,u,u,u,u,u,5,u,u,19,22,u,28,19,23,23,16,19,22,17,29,19,u,23,16]
+; AVX256VLBW-NEXT:    vmovdqu8 %ymm2, %ymm0 {%k1}
 ; AVX256VLBW-NEXT:    vpmovb2m %ymm0, %k0
----------------
craig.topper wrote:
> RKSimon wrote:
> > This is the only notable regression - any idea why it breaks so badly?
> It looks like we go through lowerVectorShuffleAsLanePermuteAndBlend, which makes the unary shuffle non-unary. Then we go through lowerVectorShuffleByMerging128BitLanes, which creates a repeated mask. But we still aren't able to handle that repeated mask cleanly, so we end up both shuffling and blending.
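For anyone following along: a "repeated mask" here means both 128-bit lanes of the v8i32 shuffle mask apply the same 4-element pattern, each lane reading only from its own lane of the inputs. A simplified sketch of that check (standalone illustrative C++, not the actual LLVM helper):

#include <array>
#include <cstdio>

// Indices 0-7 select V1, 8-15 select V2, -1 is undef.
bool is128BitLaneRepeated(const std::array<int, 8> &Mask,
                          std::array<int, 4> &RepeatedMask) {
  RepeatedMask.fill(-1);
  for (int i = 0; i < 8; ++i) {
    int M = Mask[i];
    if (M < 0)
      continue; // undef matches anything
    // Each element must read from its own 128-bit lane of V1 or V2.
    if ((M % 8) / 4 != i / 4)
      return false;
    // Fold back to a lane-0 index, remembering which input it reads.
    int Local = M % 4 + (M < 8 ? 0 : 4);
    if (RepeatedMask[i % 4] < 0)
      RepeatedMask[i % 4] = Local;
    else if (RepeatedMask[i % 4] != Local)
      return false; // the two 128-bit lanes disagree
  }
  return true;
}

int main() {
  std::array<int, 4> Rep;
  // Lane 0 = {8,9,2,3} takes V2's first two elements then V1's next two;
  // lane 1 = {12,13,6,7} repeats the same pattern one lane up.
  std::array<int, 8> Mask = {8, 9, 2, 3, 12, 13, 6, 7};
  printf("repeated: %d\n", is128BitLaneRepeated(Mask, Rep));
}
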
Why can't shuffle combining merge the two vblendds with the vpermq to create two new vpermqs? Is it because the vpermq is used twice, or because vblendd is v8i32 while vpermq is v4i64? Or something else?
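To make the type-mismatch question concrete, here is a rough sketch of the mask arithmetic behind such a fold, under the assumption that the pattern is vblendd(x, vpermq(x)) with both operands tracing back to the same vector (standalone C++, not LLVM code): the blend is only foldable into a single vpermq when the v8i32 blend mask is pairwise-uniform, i.e. expressible at v4i64 granularity.

#include <array>
#include <cstdio>
#include <optional>

// BlendMask[i] == 0 picks x, == 1 picks vpermq(x), per 32-bit element.
std::optional<std::array<int, 4>>
foldBlendOfPermute(const std::array<int, 4> &PermMask,
                   const std::array<int, 8> &BlendMask) {
  std::array<int, 4> Folded;
  for (int i = 0; i < 4; ++i) {
    // vblendd selects 32-bit elements but vpermq moves 64-bit ones, so the
    // blend is only expressible at vpermq granularity if both halves of
    // each 64-bit element pick the same operand.
    if (BlendMask[2 * i] != BlendMask[2 * i + 1])
      return std::nullopt;
    Folded[i] = BlendMask[2 * i] ? PermMask[i] : i; // permuted or identity
  }
  return Folded;
}

int main() {
  std::array<int, 4> PermMask = {2, 3, 0, 1}; // swap 128-bit halves
  std::array<int, 8> BlendMask = {0, 0, 1, 1, 0, 0, 1, 1};
  if (auto M = foldBlendOfPermute(PermMask, BlendMask))
    printf("folds to a single vpermq mask: %d,%d,%d,%d\n", (*M)[0], (*M)[1],
           (*M)[2], (*M)[3]);
  else
    printf("not foldable at 64-bit granularity\n");
}
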


Repository:
  rL LLVM

https://reviews.llvm.org/D41794




