[llvm] [X86] Fold BLEND(PERMUTE(X), PERMUTE(Y)) -> PERMUTE(BLEND(X, Y)) (PR #90219)
Simon Pilgrim via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 29 05:56:09 PDT 2024
================
@@ -679,9 +679,8 @@ define <4 x i32> @sequential_sum_v4i32_v4i32(<4 x i32> %0, <4 x i32> %1, <4 x i3
; AVX1-SLOW-NEXT: vphaddd %xmm1, %xmm0, %xmm4
; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm4 = xmm4[0,2,2,3]
; AVX1-SLOW-NEXT: vpunpckhdq {{.*#+}} xmm5 = xmm0[2],xmm1[2],xmm0[3],xmm1[3]
-; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]
-; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[3,3,3,3]
-; AVX1-SLOW-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3],xmm0[4,5,6,7]
+; AVX1-SLOW-NEXT: vpunpckhqdq {{.*#+}} xmm0 = xmm0[1],xmm1[1]
+; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,3,2,3]
----------------
RKSimon wrote:
The demanded elts for the original BLENDI nodes were only the lower 2 elements, so the new permute mask ends up as <1,3,u,u>, and getV4X86ShuffleImm fills the undef lanes with their original indices 2,3, giving the [1,3,2,3] PSHUFD mask in the new output.
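To check that the new sequence is still correct, here is a minimal Python sketch that models the old and new AVX1-SLOW sequences from the diff, treating each xmm register as four 32-bit lanes. The helper names are hypothetical, and the word-level vpblendw is simplified to dword granularity, which is exact for this particular mask (words 2-3 are dword 1). Only the lower two lanes are expected to agree, because those are the only demanded elements.

```python
# Each xmm register is modelled as a list of four 32-bit lanes (dwords).

def pshufd(src, mask):
    """vpshufd: destination dword i takes src[mask[i]]."""
    return [src[m] for m in mask]

def punpckhqdq(a, b):
    """vpunpckhqdq: high qword of a, then high qword of b."""
    return [a[2], a[3], b[2], b[3]]

def blend_dwords(a, b, take_b):
    """Dword-granular view of vpblendw: lane i comes from b if i is in take_b."""
    return [b[i] if i in take_b else a[i] for i in range(4)]

def old_sequence(x0, x1):
    x1p = pshufd(x1, [2, 3, 2, 3])
    x0p = pshufd(x0, [3, 3, 3, 3])
    # xmm0[0,1],xmm1[2,3],xmm0[4,5,6,7]: words 2-3 == dword 1 taken from xmm1
    return blend_dwords(x0p, x1p, {1})

def new_sequence(x0, x1):
    t = punpckhqdq(x0, x1)
    return pshufd(t, [1, 3, 2, 3])

x0 = [10, 11, 12, 13]
x1 = [20, 21, 22, 23]
old = old_sequence(x0, x1)
new = new_sequence(x0, x1)
print(old, new)  # [13, 23, 13, 13] [13, 23, 22, 23]
assert old[:2] == new[:2]  # the demanded lanes match
```

The upper two lanes differ ([13, 13] vs [22, 23]), which is harmless since nothing reads them.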
I have wondered whether getV4X86ShuffleImm should fold PSHUFD <X,Y,u,u> masks to <X,Y,X,Y> instead of <X,Y,2,3>, but in general this doesn't help us :(
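As an illustration of the two undef-lane fill strategies, here is a small Python sketch that encodes a 4-lane PSHUFD mask into its 8-bit immediate (2 bits per lane). It models only the undef-lane fill described in the comment, not the full LLVM getV4X86ShuffleImm; `pshufd_imm`, `decode` and the `fill` parameter are hypothetical names, and `"repeat"` stands for the <X,Y,X,Y> idea floated above.

```python
def pshufd_imm(mask, fill="identity"):
    """Encode a 4-lane PSHUFD mask (None = undef lane) as an 8-bit immediate.

    fill="identity": undef lane i becomes i, i.e. <X,Y,u,u> -> <X,Y,2,3>.
    fill="repeat": hypothetical alternative, <X,Y,u,u> -> <X,Y,X,Y>.
    """
    lanes = list(mask)
    if fill == "repeat" and lanes[2] is None and lanes[3] is None:
        lanes[2], lanes[3] = lanes[0], lanes[1]
    imm = 0
    for i, m in enumerate(lanes):
        # Any lane still undef falls back to its own index.
        imm |= (i if m is None else m) << (2 * i)
    return imm

def decode(imm):
    """Recover the per-lane source indices from a PSHUFD immediate."""
    return [(imm >> (2 * i)) & 3 for i in range(4)]

mask = [1, 3, None, None]  # <1,3,u,u> from the fold above
current = pshufd_imm(mask)
proposed = pshufd_imm(mask, fill="repeat")
print(hex(current), decode(current))    # 0xed [1, 3, 2, 3]
print(hex(proposed), decode(proposed))  # 0xdd [1, 3, 1, 3]
```

Both immediates produce the same two demanded lanes; the fill choice only changes what lands in the don't-care upper lanes.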
https://github.com/llvm/llvm-project/pull/90219