[llvm] [AMDGPU][True16][CodeGen] true16 codegen pattern for fma (PR #122950)
Joe Nash via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 22 09:15:16 PST 2025
================
@@ -107,11 +108,18 @@ define half @v_fma_f16(half %x, half %y, half %z) {
; GFX10-NEXT: v_fma_f16 v0, v0, v1, v2
; GFX10-NEXT: s_setpc_b64 s[30:31]
;
-; GFX11-LABEL: v_fma_f16:
-; GFX11: ; %bb.0:
-; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT: v_fma_f16 v0, v0, v1, v2
-; GFX11-NEXT: s_setpc_b64 s[30:31]
+; GFX11-TRUE16-LABEL: v_fma_f16:
+; GFX11-TRUE16: ; %bb.0:
+; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-TRUE16-NEXT: v_fmac_f16_e32 v2.l, v0.l, v1.l
----------------
Sisyph wrote:
This looks like it should be optimized in the True16 case. I notice we have not optimized it downstream either. If it is not easy to fix, I'd be OK with landing this as long as the optimization is tracked for a later fix.
https://github.com/llvm/llvm-project/pull/122950