[llvm] [AMDGPU] fix SIPeepholeSDWA optimization for fp16 (PR #109395)
Pankaj Dwivedi via llvm-commits
llvm-commits at lists.llvm.org
Fri Nov 29 02:41:12 PST 2024
PankajDwivedi-25 wrote:
There is a similar issue with int8, where some instructions are being translated into SDWA.
**Without SDWA:**
v_cmp_eq_u32_e32 vcc, 1, v17
v_bfe_i32 v12, v15, 0, 8
v_cmp_ge_i32_e64 s[8:9], v16, v12
v_cmp_lt_i32_e64 s[4:5], v16, v12
**With SDWA:**
v_cmp_ge_i32_sdwa s[8:9], v16, sext(v15) src0_sel:DWORD src1_sel:BYTE_0
v_cmp_lt_i32_sdwa s[4:5], v16, sext(v15) src0_sel:DWORD src1_sel:BYTE_0
There are multiple similar instances in the final asm. Do you suspect a similar issue with int8 as well?
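For anyone comparing the two sequences, here is a rough Python model of what each form is expected to compute for the second compare operand. It is only a sketch under simplifying assumptions, not the ISA definition: registers are modeled as 32-bit integers, and the helper names (`bfe_i32`, `sdwa_sext_byte0`, `to_i32`, and the two `cmp_ge_*` functions) are hypothetical names chosen for illustration.

```python
def to_i32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed 32-bit integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def bfe_i32(src: int, offset: int, width: int) -> int:
    """Simplified model of v_bfe_i32: extract `width` bits starting at
    `offset` and sign-extend the field (assumes 1 <= width <= 31)."""
    field = (src >> offset) & ((1 << width) - 1)
    sign_bit = 1 << (width - 1)
    return (field ^ sign_bit) - sign_bit


def sdwa_sext_byte0(src: int) -> int:
    """Simplified model of an SDWA source operand written as sext(v)
    with src_sel:BYTE_0: take the low byte and sign-extend it."""
    byte = src & 0xFF
    return byte - 0x100 if byte & 0x80 else byte


def cmp_ge_without_sdwa(v16: int, v15: int) -> bool:
    # v_bfe_i32 v12, v15, 0, 8 followed by v_cmp_ge_i32 ..., v16, v12
    v12 = bfe_i32(v15, 0, 8)
    return to_i32(v16) >= v12


def cmp_ge_with_sdwa(v16: int, v15: int) -> bool:
    # v_cmp_ge_i32_sdwa ..., v16, sext(v15) src0_sel:DWORD src1_sel:BYTE_0
    return to_i32(v16) >= sdwa_sext_byte0(v15)


# Under this model, both forms agree for every low-byte value of v15,
# regardless of what the upper 24 bits hold.
for low in range(256):
    for upper in (0x00000000, 0x12345600, 0xFFFFFF00):
        v15 = upper | low
        for v16 in (0, 1, 127, 128, 0xFFFFFF80, 0xFFFFFFFF):
            assert cmp_ge_without_sdwa(v16, v15) == cmp_ge_with_sdwa(v16, v15)
```

If the model holds, the folded form is a pure instruction-count saving for these compares; whether it is correct on a given target still depends on how that target handles the SDWA operand.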
https://github.com/llvm/llvm-project/pull/109395