[llvm] [AMDGPU] Form V_MAD_U64_U32 from mul24 (PR #72393)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 30 21:32:16 PST 2023
================
@@ -6928,7 +6962,119 @@ entry:
ret <2 x i16> %add0
}
+define i64 @mul_u24_add64(i32 %x, i32 %y, i64 %z) {
+; GFX67-LABEL: mul_u24_add64:
+; GFX67: ; %bb.0:
+; GFX67-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX67-NEXT: v_mul_hi_u32_u24_e32 v4, v0, v1
+; GFX67-NEXT: v_mul_u32_u24_e32 v0, v0, v1
+; GFX67-NEXT: v_add_i32_e32 v0, vcc, v0, v2
+; GFX67-NEXT: v_addc_u32_e32 v1, vcc, v4, v3, vcc
+; GFX67-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX8-LABEL: mul_u24_add64:
+; GFX8: ; %bb.0:
+; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT: v_mul_hi_u32_u24_e32 v4, v0, v1
+; GFX8-NEXT: v_mul_u32_u24_e32 v0, v0, v1
+; GFX8-NEXT: v_add_u32_e32 v0, vcc, v0, v2
+; GFX8-NEXT: v_addc_u32_e32 v1, vcc, v4, v3, vcc
+; GFX8-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX900-SDAG-LABEL: mul_u24_add64:
+; GFX900-SDAG: ; %bb.0:
+; GFX900-SDAG-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-SDAG-NEXT: v_mul_hi_u32_u24_e32 v4, v0, v1
+; GFX900-SDAG-NEXT: v_mul_u32_u24_e32 v0, v0, v1
+; GFX900-SDAG-NEXT: v_add_co_u32_e32 v0, vcc, v0, v2
+; GFX900-SDAG-NEXT: v_addc_co_u32_e32 v1, vcc, v4, v3, vcc
+; GFX900-SDAG-NEXT: s_setpc_b64 s[30:31]
+;
+; GFX9-GISEL-LABEL: mul_u24_add64:
+; GFX9-GISEL: ; %bb.0:
+; GFX9-GISEL-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX9-GISEL-NEXT: v_mul_hi_u32_u24_e32 v4, v0, v1
+; GFX9-GISEL-NEXT: v_mul_u32_u24_e32 v0, v0, v1
+; GFX9-GISEL-NEXT: v_add_co_u32_e32 v0, vcc, v0, v2
+; GFX9-GISEL-NEXT: v_addc_co_u32_e32 v1, vcc, v4, v3, vcc
----------------
arsenm wrote:
I don't follow how this is better. Isn't this 16 cycles, vs. 16 cycles for the v_mad_u64_u32? We could also consider optsize.
https://github.com/llvm/llvm-project/pull/72393