[llvm] [AMDGPU][True16][CodeGen] sext i16 inreg in true16 mode (PR #144024)

Tue Jun 17 15:07:07 PDT 2025

broxigarchen wrote:

> Can you please add a reduced test case for this selection scenario? The affected tests are not very specific. Since we need a patch to fix the downstream regression, I am inclined to approve approach after that, suggest you continue working on a legalizer based fix after that.
> 
> It seems to have mixed results on isa quality.

I just added a test at the end of sext-in-reg.ll. The MIR is not quite straightforward to read, so I am posting the previous bad version of the output here to make the error easier to see:
```
; GFX11-TRUE16-LABEL: v_sext_in_reg_i8_i16_shuffer_vector:
; GFX11-TRUE16:       ; %bb.0:
; GFX11-TRUE16-NEXT:    s_load_b128 s[0:3], s[4:5], 0x34
; GFX11-TRUE16-NEXT:    v_and_b32_e32 v0, 0x3ff, v0
; GFX11-TRUE16-NEXT:    s_waitcnt lgkmcnt(0)
; GFX11-TRUE16-NEXT:    global_load_b32 v1, v0, s[2:3]
; GFX11-TRUE16-NEXT:    s_waitcnt vmcnt(0)
; GFX11-TRUE16-NEXT:    v_mov_b16_e32 v0.l, v1.l
; GFX11-TRUE16-NEXT:    v_mov_b16_e32 v2.l, v1.h
; GFX11-TRUE16-NEXT:    v_ashrrev_i32_e32 v4, 24, v1
; GFX11-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_3) | instskip(NEXT) | instid1(VALU_DEP_3)
; GFX11-TRUE16-NEXT:    v_bfe_i32 v3, v0, 0, 8
; GFX11-TRUE16-NEXT:    v_bfe_i32 v2, v2, 0, 8
; GFX11-TRUE16-NEXT:    v_ashrrev_i16 v0.l, 8, v1.l
; GFX11-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
; GFX11-TRUE16-NEXT:    v_cvt_f16_i16_e32 v0.h, v4.l
; GFX11-TRUE16-NEXT:    v_mov_b16_e32 v1.l, v3.l
; GFX11-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_4)
; GFX11-TRUE16-NEXT:    v_cvt_f16_i16_e32 v1.h, v2.l
; GFX11-TRUE16-NEXT:    v_cvt_f16_i16_e32 v0.l, v0.l
; GFX11-TRUE16-NEXT:    v_mov_b32_e32 v3, 0
; GFX11-TRUE16-NEXT:    s_delay_alu instid0(VALU_DEP_4) | instskip(NEXT) | instid1(VALU_DEP_1)
; GFX11-TRUE16-NEXT:    v_cvt_f16_i16_e32 v1.l, v1.l
; GFX11-TRUE16-NEXT:    v_pack_b32_f16 v2, v0.l, v1.l
; GFX11-TRUE16-NEXT:    v_pack_b32_f16 v1, v0.h, v1.h
; GFX11-TRUE16-NEXT:    global_store_b64 v3, v[1:2], s[0:1]
; GFX11-TRUE16-NEXT:    s_endpgm
;
```
The sequence involving v1.h is:
```
v_mov_b16_e32 v2.l, v1.h
v_cvt_f16_i16_e32 v1.h, v2.l
v_pack_b32_f16 v1, v0.h, v1.h
```
The top byte of the i8 is not zeroed out before the f16 conversion.
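
To illustrate why that matters, here is a minimal sketch in Python. It is not part of the patch: the helper names and the sample lane value are hypothetical, and it only models the arithmetic of a 16-bit lane, not the actual instruction selection. It compares converting a lane that still has stale data in its top byte against converting the same lane after a sext_inreg from i8.

```python
def sext_inreg_i8(lane16: int) -> int:
    """Model sext_inreg from i8 on a 16-bit lane:
    keep the low 8 bits and sign-extend them to 16 bits."""
    low = lane16 & 0xFF
    return (low | 0xFF00) if (low & 0x80) else low


def as_signed_i16(lane16: int) -> int:
    """Interpret a 16-bit pattern as a signed i16,
    which is what an i16 -> f16 conversion would read."""
    lane16 &= 0xFFFF
    return lane16 - 0x10000 if (lane16 & 0x8000) else lane16


# Hypothetical lane: i8 value 5 in the low byte, stale 0x12 in the top byte.
lane = 0x1205

wrong = float(as_signed_i16(lane))                 # converted without sext_inreg
right = float(as_signed_i16(sext_inreg_i8(lane)))  # converted after sext_inreg

print(wrong, right)  # prints "4613.0 5.0"
```

Under these assumptions, the stale top byte turns the intended 5 into 4613, which is the kind of wrong result the reduced test is meant to catch.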

A drop in ISA quality is expected, since this fixes a correctness issue. We are reserving the top 16 bits before we do v_bfe_i32, so the codegen has to emit an additional copy to move the .h to another register.

https://github.com/llvm/llvm-project/pull/144024