[llvm] [AMDGPU] fix SIPeepholeSDWA optimization for fp16 (PR #109395)
Pankaj Dwivedi via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 11 03:15:12 PST 2024
PankajDwivedi-25 wrote:
I suspect that `v_bfe_i32 v12, v15, 0, 8` is not equivalent to `sext(v15) src0_sel:DWORD src1_sel:BYTE_0`, which is the form the SDWA peephole produces when it folds `v_bfe_i32`.
If they are not equivalent, should signed byte extracts with `V_BFE_I32_e64` be excluded from the SDWA peephole optimization, to address the PyTorch sorting test failure above?
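
To make the question concrete, here is a minimal bit-level sketch in Python of the two operations being compared. It assumes that `v_bfe_i32` extracts `width` bits starting at `offset` and sign-extends the field, and that a sign-extended `BYTE_0` selection reads the low byte of the register and sign-extends it to 32 bits. The helper names `bfe_i32` and `sdwa_sext_byte0` are hypothetical and exist only for illustration. The model ignores everything else about the folded instruction (which operand the selector ends up on, `dst_sel`/`dst_unused`, and how the consuming instruction treats the operand).

```python
MASK32 = 0xFFFFFFFF

def bfe_i32(src, offset, width):
    """Assumed model of v_bfe_i32: signed bitfield extract from a 32-bit value."""
    src &= MASK32
    offset &= 31  # assumption: only the low 5 bits of offset are used
    width &= 31   # assumption: only the low 5 bits of width are used
    if width == 0:
        return 0
    field = (src >> offset) & ((1 << width) - 1)
    # Sign-extend from the top bit of the extracted field.
    if field & (1 << (width - 1)):
        field -= 1 << width
    return field

def sdwa_sext_byte0(src):
    """Assumed model of an SDWA operand with sel:BYTE_0 and sext applied."""
    byte = src & 0xFF
    return byte - 0x100 if byte & 0x80 else byte

# Values chosen to exercise the sign bit of the low byte and non-zero upper bytes.
samples = [0x00000000, 0x0000007F, 0x00000080, 0x000000FF,
           0x12345680, 0xFFFFFF7F, 0xDEADBEEF]

mismatches = [hex(s) for s in samples
              if bfe_i32(s, 0, 8) != sdwa_sext_byte0(s)]
print("mismatches:", mismatches)
```

Under these assumptions the two helpers agree on every sample, so if the folded code really does behave differently, the difference would have to come from something this model leaves out rather than from the byte extraction itself.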
@yxsamliu @arsenm @bfavela
https://github.com/llvm/llvm-project/pull/109395