[llvm-branch-commits] [llvm] AMDGPU: Codegen for v_dual_dot2acc_f32_f16/bf16 from VOP3 (PR #179226)
Mirko Brkušanin via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Thu Mar 5 08:17:21 PST 2026
================
@@ -297,7 +316,10 @@ define float @v_fdot2_f32_bf16_inline_literal_c(<2 x bfloat> %a, <2 x bfloat> %b
;
; GFX11PLUS-LABEL: v_fdot2_f32_bf16_inline_literal_c:
; GFX11PLUS: ; %bb.0:
-; GFX11PLUS: v_dot2_f32_bf16 v0, v0, v1, 2.0
+; GFX11PLUS: s_mov_b32 s0, 2.0
+; GFX11PLUS: v_mov_b32_e32 v2, s0
+; GFX11PLUS: v_dot2_f32_bf16 v2, v0, v1, v2
+; GFX11PLUS: v_mov_b32_e32 v0, v2
----------------
mbrkusanin wrote:
I guess this is a tradeoff between keeping `v_dot2_f32_bf16` in a form that can be a VOPD candidate and having optimal register allocation with the ability to inline constants.
The extra `v_mov`(s) can potentially be eliminated in tests with more instructions, but the `s_mov` of the constant that is not folded will stay. Maybe we could transform the `VOP2` pseudo to `VOP3` in SIFoldOperands if there is a constant that can be folded. In that case we definitely eliminate one instruction, versus maybe eliminating one by creating a `v_dual` later.
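To make the suggested fold concrete, here is a rough sketch of the decision it would make. This is a toy model, not SIFoldOperands code: `DotInst`, `Operand`, `Form`, `isInlineFloat` and `tryFoldAccumulator` are all hypothetical names invented for illustration, the inline-constant table is a simplified subset, and the map of known constants stands in for whatever def lookup the real pass would do.

```cpp
#include <map>

// Toy model only. None of these types are LLVM's MachineInstr/MachineOperand.
enum class Form { VOP2Pseudo, VOP3 };

struct Operand {
  bool IsImm;  // true when the operand is an immediate
  int Reg;     // register number when !IsImm, -1 otherwise
  float Imm;   // immediate value when IsImm
  static Operand reg(int R) { return {false, R, 0.0f}; }
  static Operand imm(float V) { return {true, -1, V}; }
};

// A dot2 instruction: Acc is the accumulator (src2). In the VOP2 pseudo form
// it is tied to the destination, so it must be a register.
struct DotInst {
  Form Encoding;
  Operand Src0, Src1, Acc;
};

// Simplified assumption: only a subset of the f32 inline constants is listed.
bool isInlineFloat(float V) {
  const float Table[] = {0.0f, 0.5f, -0.5f, 1.0f, -1.0f,
                         2.0f, -2.0f, 4.0f, -4.0f};
  for (float C : Table)
    if (V == C)
      return true;
  return false;
}

// Hypothetical fold: if the accumulator register is known to hold an inline
// constant, switch to the VOP3 form and use the constant directly. This gives
// up the VOPD candidate but removes the move that materialized the constant.
bool tryFoldAccumulator(DotInst &MI,
                        const std::map<int, float> &KnownConstants) {
  if (MI.Encoding != Form::VOP2Pseudo || MI.Acc.IsImm)
    return false;
  auto It = KnownConstants.find(MI.Acc.Reg);
  if (It == KnownConstants.end() || !isInlineFloat(It->second))
    return false;
  MI.Encoding = Form::VOP3;
  MI.Acc = Operand::imm(It->second);
  return true;
}
```

With the accumulator register mapped to `2.0`, as in the test above, the sketch would turn the tied `VOP2` pseudo back into the `VOP3` form with `2.0` as the last operand. An accumulator that is not a known inline constant is left alone and stays a VOPD candidate.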
https://github.com/llvm/llvm-project/pull/179226