[llvm] [AMDGPU] Fix vector legalization for bf16 valu ops (PR #158439)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 24 02:35:06 PDT 2025
================
@@ -46780,6 +46925,13 @@ define <4 x bfloat> @v_fma_v4bf16(<4 x bfloat> %a, <4 x bfloat> %b, <4 x bfloat>
; GFX11FAKE16-NEXT: s_delay_alu instid0(VALU_DEP_2)
; GFX11FAKE16-NEXT: v_perm_b32 v1, v4, v1, 0x7060302
; GFX11FAKE16-NEXT: s_setpc_b64 s[30:31]
+; GFX1250-LABEL: v_fma_v4bf16:
+; GFX1250: ; %bb.0:
+; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0
+; GFX1250-NEXT: s_wait_kmcnt 0x0
+; GFX1250-NEXT: v_pk_fma_bf16 v0, v0, v2, v4
+; GFX1250-NEXT: v_pk_fma_bf16 v1, v1, v3, v5
+; GFX1250-NEXT: s_set_pc_i64 s[30:31]
%op = call <4 x bfloat> @llvm.fma.v4bf16(<4 x bfloat> %a, <4 x bfloat> %b, <4 x bfloat> %c)
ret <4 x bfloat> %op
}
----------------
arsenm wrote:
This test doesn't seem to cover FMA on vector sizes > 4.
https://github.com/llvm/llvm-project/pull/158439