[llvm] [X86] LowerShiftByScalarImmediate - vXi8 shl(X,2) - prefer PADDB+PADDB pair over PSLLW+PAND (PR #186095)
Simon Pilgrim via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 2 01:54:54 PDT 2026
================
@@ -1225,25 +1225,32 @@ define <32 x i8> @splatconstant_fshr_v32i8(<32 x i8> %a, <32 x i8> %b) nounwind
; GFNISSE: # %bb.0:
; GFNISSE-NEXT: movdqa {{.*#+}} xmm4 = [0,0,0,0,0,0,128,64,0,0,0,0,0,0,128,64]
; GFNISSE-NEXT: gf2p8affineqb $0, %xmm4, %xmm2
-; GFNISSE-NEXT: movdqa {{.*#+}} xmm5 = [32,16,8,4,2,1,0,0,32,16,8,4,2,1,0,0]
-; GFNISSE-NEXT: gf2p8affineqb $0, %xmm5, %xmm0
+; GFNISSE-NEXT: paddb %xmm0, %xmm0
+; GFNISSE-NEXT: paddb %xmm0, %xmm0
; GFNISSE-NEXT: por %xmm2, %xmm0
; GFNISSE-NEXT: gf2p8affineqb $0, %xmm4, %xmm3
-; GFNISSE-NEXT: gf2p8affineqb $0, %xmm5, %xmm1
+; GFNISSE-NEXT: paddb %xmm1, %xmm1
+; GFNISSE-NEXT: paddb %xmm1, %xmm1
; GFNISSE-NEXT: por %xmm3, %xmm1
; GFNISSE-NEXT: retq
;
; GFNIAVX1-LABEL: splatconstant_fshr_v32i8:
; GFNIAVX1: # %bb.0:
+; GFNIAVX1-NEXT: vpaddb %xmm0, %xmm0, %xmm2
+; GFNIAVX1-NEXT: vpaddb %xmm2, %xmm2, %xmm2
+; GFNIAVX1-NEXT: vextractf128 $1, %ymm0, %xmm0
+; GFNIAVX1-NEXT: vpaddb %xmm0, %xmm0, %xmm0
+; GFNIAVX1-NEXT: vpaddb %xmm0, %xmm0, %xmm0
+; GFNIAVX1-NEXT: vinsertf128 $1, %xmm0, %ymm2, %ymm0
----------------
RKSimon wrote:
This demonstrates a limit on our GFNI expansion of v32i8 shifts: AVX1 targets will already have split the shift into 2 x v16i8, and we rely on concat folds to recreate the ymm GFNI op. I can provide a workaround if you think it's useful, but IIRC there isn't an AVX1+GFNI target.
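
For readers following the PADDB+PADDB vs PSLLW+PAND choice in the title, here is a small scalar sketch of why the two sequences agree for a vXi8 shift left by 2. It is only an illustrative emulation, not code from the patch; the helper names are hypothetical, and the 0xFC mask is assumed to be the splat constant that PAND would use to clear the bits PSLLW carries across byte boundaries.

```python
def paddb(lanes):
    """Emulate PADDB x, x: add each 8-bit lane to itself, wrapping mod 256."""
    return [(b + b) & 0xFF for b in lanes]


def shl2_paddb_pair(lanes):
    """Shift every byte left by 2 using two self-additions (x+x, then again)."""
    return paddb(paddb(lanes))


def shl2_psllw_pand(lanes):
    """Emulate PSLLW $2 on little-endian 16-bit lanes, then PAND with 0xFC.

    The word shift leaks the top two bits of each low byte into the
    high byte; the mask (assumed splat of 0xFC) clears them again.
    """
    out = []
    for i in range(0, len(lanes), 2):
        word = lanes[i] | (lanes[i + 1] << 8)
        word = (word << 2) & 0xFFFF
        out.append(word & 0xFC)          # low byte, masked
        out.append((word >> 8) & 0xFC)   # high byte, masked
    return out


# Exhaustive check over all byte values (256 lanes, an even count).
all_bytes = list(range(256))
expected = [(b << 2) & 0xFF for b in all_bytes]
assert shl2_paddb_pair(all_bytes) == expected
assert shl2_psllw_pand(all_bytes) == expected
```

The PADDB pair needs no constant-pool load for the mask, which is presumably part of the appeal; whether it wins on a given microarchitecture is a separate question this sketch does not address.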
https://github.com/llvm/llvm-project/pull/186095