[llvm] [AMDGPU] fix SIPeepholeSDWA optimization for fp16 (PR #109395)

Fri Nov 29 02:41:12 PST 2024

PankajDwivedi-25 wrote:

There is a similar issue with int8, where some instructions are getting translated into SDWA.

**Without SDWA:**
v_cmp_eq_u32_e32 vcc, 1, v17
v_bfe_i32 v12, v15, 0, 8
v_cmp_ge_i32_e64 s[8:9], v16, v12
v_cmp_lt_i32_e64 s[4:5], v16, v12

**With SDWA:**
v_cmp_ge_i32_sdwa s[8:9], v16, sext(v15) src0_sel:DWORD src1_sel:BYTE_0
v_cmp_lt_i32_sdwa s[4:5], v16, sext(v15) src0_sel:DWORD src1_sel:BYTE_0

There are multiple similar instances in the final asm. Do you suspect a similar issue with int8 as well?
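
For reference, here is a minimal Python sketch of the value semantics the two sequences above are expected to share: `v_bfe_i32 v12, v15, 0, 8` followed by a signed compare, versus the SDWA compare with `sext(v15)` and `src1_sel:BYTE_0`. The helper names (`to_i32`, `bfe_i32`, `sdwa_sext_byte0`, `without_sdwa`, `with_sdwa`) are hypothetical and only model the arithmetic on 32-bit register values. They are not taken from the LLVM sources, and they say nothing about whether the peephole is legal in the surrounding code.

```python
MASK32 = 0xFFFFFFFF


def to_i32(value: int) -> int:
    """Interpret the low 32 bits of `value` as a signed 32-bit integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def bfe_i32(src: int, offset: int, width: int) -> int:
    """Simplified model of v_bfe_i32 for 1 <= width <= 32:
    extract `width` bits starting at `offset`, then sign-extend."""
    field = ((src & MASK32) >> offset) & ((1 << width) - 1)
    sign_bit = 1 << (width - 1)
    return (field ^ sign_bit) - sign_bit


def sdwa_sext_byte0(src: int) -> int:
    """Model of an SDWA source with src_sel:BYTE_0 and the sext() modifier."""
    byte = src & 0xFF
    return byte - 0x100 if byte & 0x80 else byte


def without_sdwa(v16: int, v15: int) -> tuple[bool, bool]:
    """v_bfe_i32 + v_cmp_ge_i32_e64 + v_cmp_lt_i32_e64."""
    v12 = bfe_i32(v15, 0, 8)
    return to_i32(v16) >= v12, to_i32(v16) < v12


def with_sdwa(v16: int, v15: int) -> tuple[bool, bool]:
    """v_cmp_ge_i32_sdwa + v_cmp_lt_i32_sdwa with sext(v15), BYTE_0."""
    src1 = sdwa_sext_byte0(v15)
    return to_i32(v16) >= src1, to_i32(v16) < src1


if __name__ == "__main__":
    # Spot-check a few register values, including bytes with the sign bit set.
    for v15 in (0x00, 0x7F, 0x80, 0xFF, 0x123456FF, 0xFFFFFF00):
        for v16 in (0, 1, 0x7F, 0xFFFFFFFF, 0xFFFFFF80):
            assert without_sdwa(v16, v15) == with_sdwa(v16, v15)
    print("both sequences agree on the sampled inputs")
```

Under this model the two forms agree on the compare results, so if there is a problem with int8 it would presumably have to come from something the sketch does not capture, for example other uses of `v12` or the instruction dropped between the two listings.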

https://github.com/llvm/llvm-project/pull/109395