[llvm] [AMDGPU] Form V_MAD_U64_U32 from mul24 (PR #72393)

Thu Dec 7 01:28:56 PST 2023

================
@@ -678,9 +678,26 @@ multiclass IMAD32_Pats <VOP3_Pseudo inst> {
         >;
 }
 
+// Handle cases where amdgpu-codegenprepare-mul24 made a mul24 instead of a normal mul.
+// We need to separate this because otherwise OtherPredicates would be overriden.
+multiclass IMAD32_Mul24_Pats <VOP3_Pseudo inst> {
+  def : GCNPat <
+      (i64 (add (i64 (AMDGPUmul_u24 i32:$src0, i32:$src1)), i64:$src2)),
+      (inst $src0, $src1, $src2, 0 /* clamp */)
+      >;
+  def : GCNPat <
+      (i64 (add (i64 (zext (i32 (AMDGPUmul_u24 i32:$src0, i32:$src1)))), i64:$src2)),
+      (inst $src0, $src1, $src2, 0 /* clamp */)
+      >;
----------------
arsenm wrote:

If this is just a mul_lo24 I can see the speed model mattering. It's 12 cycles + free next vs. 16 

https://github.com/llvm/llvm-project/pull/72393