[llvm] [AMDGPU] Form V_MAD_U64_U32 from mul24 (PR #72393)

Fri Dec 8 04:17:56 PST 2023

================
@@ -676,6 +676,16 @@ multiclass IMAD32_Pats <VOP3_Pseudo inst> {
         (ThreeOpFragSDAG<mul, add> i32:$src0, i32:$src1, (i32 imm:$src2)),
         (EXTRACT_SUBREG (inst $src0, $src1, (i64 (as_i64imm $src2)), 0 /* clamp */), sub0)
         >;
+
+  // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a normal mul.
+  def : GCNPat <
+      (i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
+      (inst $src0, $src1, $src2, 0 /* clamp */)
+      >;
+  def : GCNPat <
+      (i64 (add (i64 (zext (i32 (AMDGPUmul_u24 i32:$src0, i32:$src1)))), i64:$src2)),
----------------
Pierre-vh wrote:

Initially, I was trying to get V_MAD out of:
```
          t328: i32 = MUL_U24 # D:1 t430, Constant:i32<22853>
          t103: i32 = llvm.amdgcn.mulhi.u24 # D:1 TargetConstant:i64<2489>, t430, Constant:i32<22853>
        t419: v2i32 = BUILD_VECTOR # D:1 t328, t103
      t420: i64 = bitcast # D:1 t419
          t428: i32 = truncate # D:1 t255
        t426: v2i32 = BUILD_VECTOR # D:1 t428, Constant:i32<0>
      t427: i64 = bitcast # D:1 t426
    t108: i64 = add # D:1 t420, t427
```

With the CodeGenPrepare (CGP) changes, the `BUILD_VECTOR` no longer appears.
I don't think the i32 mul24 pattern you're asking about is needed; I added it only so that the i32 case would work as well. I'll retest without it to see whether it makes any difference.
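
To illustrate why that DAG is a candidate for `V_MAD_U64_U32`, here is a minimal Python sketch of the arithmetic involved. It is only a model: the function names are hypothetical, the mul24 semantics are written as I understand them (both operands masked to 24 bits, with the `mulhi` node returning bits [63:32] of the product), and the equivalence is assumed to hold only when both multiplicands already fit in 24 bits, which is the precondition for forming mul24 in the first place.

```python
MASK24 = (1 << 24) - 1
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mul_u24(a, b):
    """Model of MUL_U24: low 32 bits of the product of the low 24 bits."""
    return ((a & MASK24) * (b & MASK24)) & MASK32


def mulhi_u24(a, b):
    """Model of mulhi.u24: bits [63:32] of the same 48-bit product."""
    return ((a & MASK24) * (b & MASK24)) >> 32


def dag_form(a, b, c):
    """Mirror the DAG: two v2i32 BUILD_VECTORs bitcast to i64, then an i64 add."""
    # BUILD_VECTOR(lo, hi) + bitcast: element 0 is the low half.
    product = (mulhi_u24(a, b) << 32) | mul_u24(a, b)
    # truncate + BUILD_VECTOR(x, 0) + bitcast is a zero-extension of the low 32 bits.
    addend = c & MASK32
    return (product + addend) & MASK64


def mad_u64_u32(a, b, c):
    """Model of a 32x32 -> 64-bit unsigned multiply plus a 64-bit addend."""
    return ((a & MASK32) * (b & MASK32) + (c & MASK64)) & MASK64


if __name__ == "__main__":
    a, b, c = 0xABCDEF, 22853, 0x123456789
    # Both multiplicands fit in 24 bits, so the two forms agree.
    print(dag_form(a, b, c) == mad_u64_u32(a, b, c & MASK32))
```

If either multiplicand had bits set above bit 23, `dag_form` would discard them while `mad_u64_u32` would not, so the rewrite is only valid under the 24-bit assumption stated above.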

https://github.com/llvm/llvm-project/pull/72393