[llvm] [AMDGPU][True16][CodeGen] update isel pattern with vgpr16 for 16 bit types (PR #154875)

Mon Sep 8 13:50:13 PDT 2025

================
@@ -43,13 +43,22 @@ define amdgpu_ps i16 @s_copysign_f16(half inreg %mag, half inreg %sign) {
 ; GFX9-NEXT:    v_readfirstlane_b32 s0, v0
 ; GFX9-NEXT:    ; return to shader part epilog
 ;
-; GFX11-LABEL: s_copysign_f16:
-; GFX11:       ; %bb.0:
-; GFX11-NEXT:    v_mov_b32_e32 v0, s1
-; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX11-NEXT:    v_bfi_b32 v0, 0x7fff, s0, v0
-; GFX11-NEXT:    v_readfirstlane_b32 s0, v0
-; GFX11-NEXT:    ; return to shader part epilog
+; GFX11-TRUE16-LABEL: s_copysign_f16:
+; GFX11-TRUE16:       ; %bb.0:
+; GFX11-TRUE16-NEXT:    v_mov_b16_e32 v0.l, s1
+; GFX11-TRUE16-NEXT:    v_mov_b16_e32 v1.l, s0
+; GFX11-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
+; GFX11-TRUE16-NEXT:    v_bfi_b32 v0, 0x7fff, v1, v0
----------------
broxigarchen wrote:

nvm. I think we don't do this folding (we know in ISel that s0 and v1 hold the same i16, but in the folding pass this seems to require some additional context analysis).

I don't think we can do much in ISel. In ISel we don't know whether the 16-bit operand is a vgpr or an sgpr, so we force it into a vgpr16, which leads to this additional copy. We might be able to add this folding in the folding pass, but I'm not sure it's worth it.
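
For reference, a rough sketch of what the `v_bfi_b32 v0, 0x7fff, v1, v0` in the hunk above computes, modeled on raw f16 bit patterns. The helper names `bfi_b32` and `copysign_f16_bits` are made up for illustration and are not part of LLVM; this only models the bit-level result, not register classes or the extra `v_mov_b16` copy under discussion.

```python
def bfi_b32(mask: int, src_a: int, src_b: int) -> int:
    """Model a 32-bit bitfield insert: take bits of src_a where mask is 1,
    and bits of src_b where mask is 0."""
    mask &= 0xFFFFFFFF
    return ((mask & src_a) | (~mask & src_b)) & 0xFFFFFFFF


def copysign_f16_bits(mag_bits: int, sign_bits: int) -> int:
    """copysign on raw f16 bits: the low 15 bits (exponent and mantissa)
    come from mag_bits, bit 15 (the sign) comes from sign_bits."""
    # Keep only the low 16 bits, which is where the f16 result lives.
    return bfi_b32(0x7FFF, mag_bits, sign_bits) & 0xFFFF


if __name__ == "__main__":
    one = 0x3C00        # f16 1.0
    minus_two = 0xC000  # f16 -2.0
    print(hex(copysign_f16_bits(one, minus_two)))
```

The point is just that the magnitude operand (s0 in the old output, v1 in the new one) only feeds the masked-in bits, so the result is the same whichever register it comes from.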

https://github.com/llvm/llvm-project/pull/154875