[llvm] [AMDGPU] fix SIPeepholeSDWA optimization for fp16 (PR #109395)

Pankaj Dwivedi via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 11 03:15:12 PST 2024


PankajDwivedi-25 wrote:

I suspect that `v_bfe_i32 v12, v15, 0, 8` is not equivalent to `sext(v15) src0_sel:DWORD src1_sel:BYTE_0`, which is the form the SDWA peephole produces when it folds `v_bfe_i32`.
If they are not equivalent, should signed byte extracts with `V_BFE_I32_e64` be excluded from the SDWA peephole optimization, as a fix for the PyTorch sorting test failure above?
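
To make the comparison concrete, here is a minimal Python sketch of the two operations as plain integer arithmetic. It is a simplified model, not the pass or the hardware: the function names `bfe_i32` and `sext_byte0` are hypothetical, the model assumes `offset + width <= 32`, and it ignores how the instruction consuming the SDWA operand uses the selected byte.

```python
def bfe_i32(src: int, offset: int, width: int) -> int:
    """Simplified model of a signed bitfield extract.

    Takes `width` bits of the 32-bit value `src` starting at bit `offset`
    and sign-extends them. Assumes offset + width <= 32 (an assumption of
    this sketch, not a statement about the ISA).
    """
    src &= 0xFFFFFFFF
    offset &= 31
    width &= 31
    if width == 0:
        return 0
    field = (src >> offset) & ((1 << width) - 1)
    sign_bit = 1 << (width - 1)
    # Standard trick: flip the sign bit, then subtract it to sign-extend.
    return (field ^ sign_bit) - sign_bit


def sext_byte0(src: int) -> int:
    """Simplified model of selecting BYTE_0 of `src` with sign extension."""
    byte = src & 0xFF
    return byte - 0x100 if byte & 0x80 else byte


def models_agree(samples) -> bool:
    """Return True if both models give the same result for every sample."""
    return all(bfe_i32(s, 0, 8) == sext_byte0(s) for s in samples)
```

In this model the two agree on the extracted value, so if the generated code does differ, the cause would have to lie outside this arithmetic, for example in how the operand is consumed after the fold. The sketch cannot confirm or rule that out.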

@yxsamliu @arsenm @bfavela 

https://github.com/llvm/llvm-project/pull/109395

