[PATCH] D41794: [X86] Improve AVX1 shuffle lowering for v8f32 shuffles where the low half comes from V1 and the high half comes from V2 and the halves do the same operation
Craig Topper via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jul 31 11:49:15 PDT 2018
craig.topper added inline comments.
================
Comment at: test/CodeGen/X86/prefer-avx256-mask-shuffle.ll:210
; AVX256VLBW-NEXT: kmovd %eax, %k1
-; AVX256VLBW-NEXT: vpshufb {{.*#+}} ymm0 {%k1} = ymm1[u,u,6,u,u,u,u,u,u,u,u,u,u,5,u,u,19,22,u,28,19,23,23,16,19,22,17,29,19,u,23,16]
+; AVX256VLBW-NEXT: vmovdqu8 %ymm2, %ymm0 {%k1}
; AVX256VLBW-NEXT: vpmovb2m %ymm0, %k0
----------------
RKSimon wrote:
> This is the only notable regression - any idea why it breaks so badly?
It looks like we go through lowerVectorShuffleAsLanePermuteAndBlend, which turns the unary shuffle into a two-input shuffle. Then we go through lowerVectorShuffleByMerging128BitLanes, which creates a repeated mask. But we still can't handle that repeated mask cleanly, so we end up shuffling and blending.
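For reference, a rough sketch (not LLVM code; the function name and mask conventions here are illustrative only) of the v8f32 pattern the patch title describes: the mask's low 128-bit half reads only V1, the high half reads only V2, and both halves apply the same lane-relative permutation, so one in-lane shuffle plus a cross-input blend suffices on AVX1:

```python
# Illustrative sketch: classify an 8-element v8f32 shuffle mask.
# Convention assumed here: indices 0-7 select from V1, 8-15 from V2,
# and -1 marks an undef element.

def is_cross_input_repeated_mask(mask):
    """True if the low half reads V1, the high half reads V2, and both
    halves use the same lane-relative (mod-4) element order."""
    lo, hi = mask[:4], mask[4:]
    # Low half must draw only from V1, high half only from V2.
    if any(m >= 0 and not (0 <= m < 8) for m in lo):
        return False
    if any(m >= 0 and not (8 <= m < 16) for m in hi):
        return False
    # Both halves must perform the same operation within their lane.
    for a, b in zip(lo, hi):
        if a >= 0 and b >= 0 and a % 4 != b % 4:
            return False
    return True

print(is_cross_input_repeated_mask([0, 1, 2, 3, 12, 13, 14, 15]))  # True
print(is_cross_input_repeated_mask([3, 2, 1, 0, 15, 14, 13, 12]))  # True
print(is_cross_input_repeated_mask([0, 1, 2, 3, 13, 12, 15, 14]))  # False
```

Masks that fail this check (like the third one) are the cases where the repeated-mask path still can't finish the job cleanly, which matches the regression above.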
Repository:
rL LLVM
https://reviews.llvm.org/D41794