[llvm] [AMDGPU][True16][CodeGen] true16 codegen pattern for fma (PR #122950)

Wed Jan 22 09:15:16 PST 2025

================
@@ -107,11 +108,18 @@ define half @v_fma_f16(half %x, half %y, half %z) {
 ; GFX10-NEXT:    v_fma_f16 v0, v0, v1, v2
 ; GFX10-NEXT:    s_setpc_b64 s[30:31]
 ;
-; GFX11-LABEL: v_fma_f16:
-; GFX11:       ; %bb.0:
-; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT:    v_fma_f16 v0, v0, v1, v2
-; GFX11-NEXT:    s_setpc_b64 s[30:31]
+; GFX11-TRUE16-LABEL: v_fma_f16:
+; GFX11-TRUE16:       ; %bb.0:
+; GFX11-TRUE16-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-TRUE16-NEXT:    v_fmac_f16_e32 v2.l, v0.l, v1.l
----------------
Sisyph wrote:

This looks like it should be optimized in the True16 case. I notice we have not optimized it downstream either. If it is not easy to fix, I'd be ok landing this if the optimization was tracked for a later fix.

https://github.com/llvm/llvm-project/pull/122950