[llvm] [AMDGPU][CodeGenPrepare] Narrow 64 bit math to 32 bit if profitable (PR #130577)

via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 12 09:58:16 PDT 2025


================
@@ -65,8 +65,12 @@ define <4 x i16> @and_mulhuw_v4i16(<4 x i64> %a, <4 x i64> %b) {
 ;
 ; AVX512-LABEL: and_mulhuw_v4i16:
 ; AVX512:       # %bb.0:
-; AVX512-NEXT:    vpmulhuw %ymm1, %ymm0, %ymm0
-; AVX512-NEXT:    vpmovqw %zmm0, %xmm0
+; AVX512-NEXT:    # kill: def $ymm1 killed $ymm1 def $zmm1
+; AVX512-NEXT:    # kill: def $ymm0 killed $ymm0 def $zmm0
+; AVX512-NEXT:    vpmovqd %zmm0, %ymm0
+; AVX512-NEXT:    vpmovqd %zmm1, %ymm1
+; AVX512-NEXT:    vpmulhuw %xmm1, %xmm0, %xmm0
+; AVX512-NEXT:    vpshufb {{.*#+}} xmm0 = xmm0[0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
----------------
Shoreshen wrote:

Hi @nikic, I looked into this a bit more; it seems the x86 backend optimizes the mul based on the instructions that consume its result.

For example:
```
define <8 x i64> @zext_mulhuw_v8i16_lshr_i64(<8 x i16> %a, <8 x i16> %b) #0 {
  %a1 = zext <8 x i16> %a to <8 x i64>
  %b1 = zext <8 x i16> %b to <8 x i64>
  %c = mul <8 x i64> %a1, %b1
  %d = lshr <8 x i64> %c, splat (i64 16)
  ret <8 x i64> %d
}
```
will select the following instructions:
```
Function Live Ins: $xmm0 in %0, $xmm1 in %1

bb.0 (%ir-block.0):
  liveins: $xmm0, $xmm1
  %1:vr128 = COPY $xmm1
  %0:vr128 = COPY $xmm0
  %2:vr128 = VPMULHUWrr %0:vr128, %1:vr128
  %3:vr512 = VPMOVZXWQZrr killed %2:vr128
  $zmm0 = COPY %3:vr512
  RET 0, $zmm0
```

It seems the backend optimizes the result of a vector multiply based on the instructions that use it, combining the different kinds of mul and mov instructions into a single pattern.
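For comparison, here is a hand-written sketch (an illustration only, not the exact output of this patch) of what a narrowed form of the example above could look like: doing the multiply in i32 and widening back afterwards hides the zext(i16)-mul-lshr pattern that ISel currently folds into `VPMULHUW`.
```
; Hypothetical narrowed form (illustration): the product of two zext'd i16
; values fits in 32 bits, so the mul and lshr can legally be done in i32 and
; the result widened back to i64 afterwards. ISel no longer sees the
; zext(i16) * zext(i16) >> 16 pattern on i64, so the single VPMULHUW match
; is lost.
define <8 x i64> @zext_mulhuw_v8i16_lshr_i64_narrowed(<8 x i16> %a, <8 x i16> %b) {
  %a1 = zext <8 x i16> %a to <8 x i32>
  %b1 = zext <8 x i16> %b to <8 x i32>
  %c = mul <8 x i32> %a1, %b1
  %d = lshr <8 x i32> %c, splat (i32 16)
  %e = zext <8 x i32> %d to <8 x i64>
  ret <8 x i64> %e
}
```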

Should I suppress this narrowing for the x86 backend when the mul operates on vectors?

https://github.com/llvm/llvm-project/pull/130577

