[llvm] AMDGPU: Replace copy-to-mov-imm folding logic with class compat checks (PR #154501)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 26 04:05:22 PDT 2025
================
@@ -383,14 +384,15 @@ define amdgpu_ps half @v_interp_f16_imm_params(float inreg %i, float inreg %j) #
;
; GFX12-TRUE16-LABEL: v_interp_f16_imm_params:
; GFX12-TRUE16: ; %bb.0: ; %main_body
-; GFX12-TRUE16-NEXT: v_mov_b16_e32 v0.l, 0
-; GFX12-TRUE16-NEXT: v_dual_mov_b32 v1, s0 :: v_dual_mov_b32 v2, 0
+; GFX12-TRUE16-NEXT: v_dual_mov_b32 v1, 0 :: v_dual_mov_b32 v2, s0
; GFX12-TRUE16-NEXT: v_mov_b32_e32 v3, s1
-; GFX12-TRUE16-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_2)
-; GFX12-TRUE16-NEXT: v_interp_p10_f16_f32 v1, v0.l, v1, v0.l wait_exp:7
-; GFX12-TRUE16-NEXT: v_interp_p2_f16_f32 v0.l, v0.l, v3, v2 wait_exp:7
; GFX12-TRUE16-NEXT: s_delay_alu instid0(VALU_DEP_2) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX12-TRUE16-NEXT: v_cvt_f16_f32_e32 v0.h, v1
+; GFX12-TRUE16-NEXT: v_mov_b16_e32 v0.l, v1.l
+; GFX12-TRUE16-NEXT: v_interp_p10_f16_f32 v2, v0.l, v2, v0.l wait_exp:7
+; GFX12-TRUE16-NEXT: s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_2)
+; GFX12-TRUE16-NEXT: v_interp_p2_f16_f32 v0.l, v0.l, v3, v1 wait_exp:7
+; GFX12-TRUE16-NEXT: v_cvt_f16_f32_e32 v0.h, v2
+; GFX12-TRUE16-NEXT: s_delay_alu instid0(VALU_DEP_1)
----------------
arsenm wrote:
This is not the concern of immediate folding.
https://github.com/llvm/llvm-project/pull/154501