[llvm] [AMDGPU][True16][CodeGen] true16 codegen pattern for f16 canonicalize (PR #122000)
Brox Chen via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 22 08:31:23 PST 2025
================
@@ -2358,13 +2446,21 @@ define <2 x half> @v_test_canonicalize_undef_reg_v2f16(half %val) #1 {
; CI-NEXT: v_mov_b32_e32 v0, 0x7fc00000
; CI-NEXT: s_setpc_b64 s[30:31]
;
-; GFX11-LABEL: v_test_canonicalize_undef_reg_v2f16:
-; GFX11: ; %bb.0:
-; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX11-NEXT: v_max_f16_e32 v0, v0, v0
-; GFX11-NEXT: s_delay_alu instid0(VALU_DEP_1)
-; GFX11-NEXT: v_lshlrev_b32_e32 v0, 16, v0
-; GFX11-NEXT: s_setpc_b64 s[30:31]
+; GFX11-TRUE16-LABEL: v_test_canonicalize_undef_reg_v2f16:
+; GFX11-TRUE16: ; %bb.0:
+; GFX11-TRUE16-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-TRUE16-NEXT: v_max_f16_e32 v0.l, v0.l, v0.l
+; GFX11-TRUE16-NEXT: s_delay_alu instid0(VALU_DEP_1)
+; GFX11-TRUE16-NEXT: v_lshlrev_b32_e32 v0, 16, v0
----------------
broxigarchen wrote:
I think the left shift here also zero out the lower 16bits. If we store directly to v0.h then we need to run an additional `and 0xffff`
https://github.com/llvm/llvm-project/pull/122000
More information about the llvm-commits
mailing list