[llvm] [AMDGPU] Form V_MAD_U64_U32 from mul24 (PR #72393)

Fri Dec 8 03:55:29 PST 2023

================
@@ -676,6 +676,16 @@ multiclass IMAD32_Pats <VOP3_Pseudo inst> {
         (ThreeOpFragSDAG<mul, add> i32:$src0, i32:$src1, (i32 imm:$src2)),
         (EXTRACT_SUBREG (inst $src0, $src1, (i64 (as_i64imm $src2)), 0 /* clamp */), sub0)
         >;
+
+  // Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a normal mul.
+  def : GCNPat <
+      (i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
+      (inst $src0, $src1, $src2, 0 /* clamp */)
+      >;
+  def : GCNPat <
+      (i64 (add (i64 (zext (i32 (AMDGPUmul_u24 i32:$src0, i32:$src1)))), i64:$src2)),
----------------
jayfoad wrote:

This doesn't look right. The source pattern here does a 24 x 24-bit multiply, takes the low 32 bits of the 48-bit result, and then zero-extends that to 64 bits. That is different from what the mad instruction does: it does not truncate the product to 32 bits.

https://github.com/llvm/llvm-project/pull/72393