[llvm] [X86] `combinePMULH` - combine `mulhu` + `srl` (PR #132548)

Mon Jul 14 01:31:15 PDT 2025

================
@@ -2166,3 +2166,260 @@ define <8 x i16> @sse2_pmulhu_w_const(<8 x i16> %a0, <8 x i16> %a1) {
 }
 declare <8 x i16> @llvm.x86.sse2.pmulhu.w(<8 x i16>, <8 x i16>)
 
+define <8 x i16> @zext_mul_and_shift17(<8 x i16> %a, <8 x i16> %b) {
+; SSE-LABEL: zext_mul_and_shift17:
+; SSE:       # %bb.0:
+; SSE-NEXT:    pmulhuw %xmm1, %xmm0
+; SSE-NEXT:    psrlw $1, %xmm0
+; SSE-NEXT:    retq
+;
+; AVX-LABEL: zext_mul_and_shift17:
+; AVX:       # %bb.0:
+; AVX-NEXT:    vpmulhuw %xmm1, %xmm0, %xmm0
+; AVX-NEXT:    vpsrlw $1, %xmm0, %xmm0
+; AVX-NEXT:    retq
+  %a.ext = zext <8 x i16> %a to <8 x i32>
+  %b.ext = zext <8 x i16> %b to <8 x i32>
+  %mul = mul <8 x i32> %a.ext, %b.ext
+  %shift = lshr <8 x i32> %mul, splat(i32 17)
+  %trunc = trunc <8 x i32> %shift to <8 x i16>
+  ret <8 x i16> %trunc
+}
----------------
RKSimon wrote:

It'd be better if we tested a range of shift values as well as the shift by 17 from the original bug report - shifts by 24 and 31 will be interesting for instance.

https://github.com/llvm/llvm-project/pull/132548