[PATCH] D41794: [X86] Improve AVX1 shuffle lowering for v8f32 shuffles where the low half comes from V1 and the high half comes from V2 and the halves do the same operation
Craig Topper via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jul 31 11:49:15 PDT 2018
craig.topper added inline comments.
================
Comment at: test/CodeGen/X86/prefer-avx256-mask-shuffle.ll:210
; AVX256VLBW-NEXT: kmovd %eax, %k1
-; AVX256VLBW-NEXT: vpshufb {{.*#+}} ymm0 {%k1} = ymm1[u,u,6,u,u,u,u,u,u,u,u,u,u,5,u,u,19,22,u,28,19,23,23,16,19,22,17,29,19,u,23,16]
+; AVX256VLBW-NEXT: vmovdqu8 %ymm2, %ymm0 {%k1}
; AVX256VLBW-NEXT: vpmovb2m %ymm0, %k0
----------------
RKSimon wrote:
> This is the only notable regression - any idea why it breaks so badly?
It looks like we go through lowerVectorShuffleAsLanePermuteAndBlend, which turns the unary shuffle into a two-input shuffle. Then we go through lowerVectorShuffleByMerging128BitLanes, which creates a repeated mask. But we still can't handle that repeated mask cleanly, so we end up shuffling and blending.
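For reference, a rough sketch (not LLVM code; the function name and mask conventions here are illustrative only) of the v8f32 pattern the patch title describes: the mask's low 128-bit half reads only V1, the high half reads only V2, and both halves apply the same lane-relative permutation, so one in-lane shuffle plus a cross-input blend suffices on AVX1:

```python
# Illustrative sketch: classify an 8-element v8f32 shuffle mask.
# Convention assumed here: indices 0-7 select from V1, 8-15 from V2,
# and -1 marks an undef element.

def is_cross_input_repeated_mask(mask):
    """True if the low half reads V1, the high half reads V2, and both
    halves use the same lane-relative (mod-4) element order."""
    lo, hi = mask[:4], mask[4:]
    # Low half must draw only from V1, high half only from V2.
    if any(m >= 0 and not (0 <= m < 8) for m in lo):
        return False
    if any(m >= 0 and not (8 <= m < 16) for m in hi):
        return False
    # Both halves must perform the same operation within their lane.
    for a, b in zip(lo, hi):
        if a >= 0 and b >= 0 and a % 4 != b % 4:
            return False
    return True

print(is_cross_input_repeated_mask([0, 1, 2, 3, 12, 13, 14, 15]))  # True
print(is_cross_input_repeated_mask([3, 2, 1, 0, 15, 14, 13, 12]))  # True
print(is_cross_input_repeated_mask([0, 1, 2, 3, 13, 12, 15, 14]))  # False
```

Masks that fail this check (like the third one) are the cases where the repeated-mask path still can't finish the job cleanly, which matches the regression above.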
Repository:
rL LLVM
https://reviews.llvm.org/D41794