[llvm] [X86] LowerShiftByScalarImmediate - vXi8 shl(X,2) - prefer PADDB+PADDB pair over PSLLW+PAND (PR #186095)
Simon Pilgrim via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 2 01:57:17 PDT 2026
================
@@ -447,11 +447,13 @@ define <32 x i8> @var_funnnel_v32i8(<32 x i8> %x, <32 x i8> %amt) nounwind {
; AVX512F-NEXT: vpternlogd {{.*#+}} zmm3 = zmm3 ^ (m32bcst & (zmm3 ^ zmm2))
; AVX512F-NEXT: vpsllw $5, %ymm1, %ymm1
; AVX512F-NEXT: vpblendvb %ymm1, %ymm3, %ymm0, %ymm0
-; AVX512F-NEXT: vpsllw $2, %ymm0, %ymm2
-; AVX512F-NEXT: vpsrlw $6, %ymm0, %ymm3
-; AVX512F-NEXT: vpternlogd {{.*#+}} zmm3 = zmm3 ^ (m32bcst & (zmm3 ^ zmm2))
+; AVX512F-NEXT: vpsrlw $6, %ymm0, %ymm2
+; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm2, %ymm2
+; AVX512F-NEXT: vpaddb %ymm0, %ymm0, %ymm3
+; AVX512F-NEXT: vpaddb %ymm3, %ymm3, %ymm3
+; AVX512F-NEXT: vpor %ymm2, %ymm3, %ymm2
----------------
RKSimon wrote:
This is the repeated mask issue I mentioned on #189986 - I can add custom rotate folding for non-AVX512VL targets to get the VPTERNLOG back if you think it's necessary, but it's pretty niche.
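
For readers following the hunk above, here is a rough scalar model of the two sequences for the final rotate-left-by-2 step, applied to one 16-bit lane holding two bytes. It is only a sketch: the constants 0xFC and 0x03 per byte are assumptions (the broadcast mask and the LCPI constant are not shown in the diff), and the helper names are hypothetical, not LLVM code.

```c
#include <stdint.h>

/* Reference result: rotate a single byte left by 2. */
static uint8_t rotl8_by2(uint8_t x) {
    return (uint8_t)((x << 2) | (x >> 6));
}

/* Model of PADDB on a lane of two bytes: each byte wraps independently. */
static uint16_t paddb(uint16_t a, uint16_t b) {
    uint8_t lo = (uint8_t)((a & 0xFF) + (b & 0xFF));
    uint8_t hi = (uint8_t)((a >> 8) + (b >> 8));
    return (uint16_t)((hi << 8) | lo);
}

/* Old sequence: vpsllw $2, vpsrlw $6, then a vpternlogd bit select.
   The select mask 0xFC per byte is an assumption. */
static uint16_t old_seq(uint16_t w) {
    uint16_t shl = (uint16_t)(w << 2);
    uint16_t shr = (uint16_t)(w >> 6);
    uint16_t mask = 0xFCFC;
    /* Same form as the check line: shr ^ (mask & (shr ^ shl)). */
    return (uint16_t)(shr ^ (mask & (shr ^ shl)));
}

/* New sequence: vpsrlw $6, vpand, vpaddb, vpaddb, vpor.
   The vpand constant 0x03 per byte is an assumption. */
static uint16_t new_seq(uint16_t w) {
    uint16_t shr = (uint16_t)((w >> 6) & 0x0303);
    uint16_t dbl = paddb(w, w);      /* x * 2 per byte */
    uint16_t quad = paddb(dbl, dbl); /* x * 4 per byte, i.e. shl by 2 */
    return (uint16_t)(quad | shr);
}

/* Count lanes where either sequence disagrees with the per-byte rotate. */
static unsigned count_mismatches(void) {
    unsigned bad = 0;
    for (uint32_t w = 0; w <= 0xFFFF; ++w) {
        uint16_t want = (uint16_t)((rotl8_by2((uint8_t)(w >> 8)) << 8) |
                                   rotl8_by2((uint8_t)w));
        if (old_seq((uint16_t)w) != want || new_seq((uint16_t)w) != want)
            ++bad;
    }
    return bad;
}
```

Under those assumptions both sequences agree with the byte rotate for every lane value, so count_mismatches() returns 0. The new form needs no mask on the left-shift half because PADDB never carries across byte boundaries, which leaves a separate VPAND + VPOR instead of a single VPTERNLOG select.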
https://github.com/llvm/llvm-project/pull/186095