[PATCH] D41794: [X86] Improve AVX1 shuffle lowering for v8f32 shuffles where the low half comes from V1 and the high half comes from V2 and the halves do the same operation

Simon Pilgrim via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jul 31 13:35:23 PDT 2018


RKSimon added inline comments.


================
Comment at: test/CodeGen/X86/prefer-avx256-mask-shuffle.ll:210
 ; AVX256VLBW-NEXT:    kmovd %eax, %k1
-; AVX256VLBW-NEXT:    vpshufb {{.*#+}} ymm0 {%k1} = ymm1[u,u,6,u,u,u,u,u,u,u,u,u,u,5,u,u,19,22,u,28,19,23,23,16,19,22,17,29,19,u,23,16]
+; AVX256VLBW-NEXT:    vmovdqu8 %ymm2, %ymm0 {%k1}
 ; AVX256VLBW-NEXT:    vpmovb2m %ymm0, %k0
----------------
craig.topper wrote:
> craig.topper wrote:
> > RKSimon wrote:
> > > This is the only notable regression - any idea why it breaks so badly?
> > It looks like we go through lowerVectorShuffleAsLanePermuteAndBlend, which turns the unary shuffle into a non-unary one. Then we go through lowerVectorShuffleByMerging128BitLanes, which creates a repeated mask. But we still weren't able to handle this repeated mask cleanly, so we end up shuffling and blending.
> Why can't shuffle combining merge the two vblendds with the vpermq to create two new vpermqs? Is it because the vpermq is used twice, or because vblendd is v8i32 while vpermq is v4i64? Or something else?
Target shuffle combining mostly only combines to a single op. There might be scope to improve this if lower1BitVectorShuffle avoided extending the mask vector to 512 bits when prefer-256 is enabled.
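
For context, the "repeated mask" mentioned above means that every 128-bit lane applies the same in-lane permutation. A rough sketch of such a check (the name and details here are illustrative, not the exact helper in X86ISelLowering.cpp):

  #include <cassert>
  #include <vector>

  // Illustrative sketch only: returns true if Mask keeps every element
  // inside its own 128-bit lane and all lanes perform the same in-lane
  // permutation, writing that per-lane pattern to RepeatedMask.
  // For v8f32, LaneSize == 4 and Mask.size() == 8.
  static bool isRepeated128BitLaneMask(int LaneSize,
                                       const std::vector<int> &Mask,
                                       std::vector<int> &RepeatedMask) {
    const int Size = static_cast<int>(Mask.size());
    assert(Size % LaneSize == 0 && "mask must cover whole lanes");
    RepeatedMask.assign(LaneSize, -1); // -1 == undef, matches anything
    for (int i = 0; i != Size; ++i) {
      if (Mask[i] < 0)
        continue; // undef element
      // Elements must be drawn from the lane they land in.
      if ((Mask[i] % Size) / LaneSize != i / LaneSize)
        return false;
      // Fold to an in-lane index, keeping the "second operand" bit
      // (indices >= Size select from the second source vector).
      int Local = (Mask[i] % LaneSize) + (Mask[i] >= Size ? LaneSize : 0);
      int &Slot = RepeatedMask[i % LaneSize];
      if (Slot < 0)
        Slot = Local;
      else if (Slot != Local)
        return false; // lanes disagree; not a repeated mask
    }
    return true;
  }

For example, the v8f32 mask <0,8,1,9,4,12,5,13> yields RepeatedMask <0,4,1,5> (an unpcklps pattern), the kind of clean single-op match that merging 128-bit lanes is after; in the case above no such single op exists for the repeated mask, hence the extra shuffle+blend.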


Repository:
  rL LLVM

https://reviews.llvm.org/D41794




