[llvm] [AMDGPU] Eliminate unnecessary packing in wider f16 vectors for sdwa/opsel-able instruction (PR #137137)
Vikash Gupta via llvm-commits
llvm-commits at lists.llvm.org
Tue May 13 02:20:03 PDT 2025
vg0204 wrote:
> You have a post-processing of the function in the SDWA pass that doesn't really have anything to do with SDWA or the rest of the pass. The problem you are solving appears to be avoiding the interference of v_pack_b32_f16 instructions, which only apply in a narrow range of cases.
I agree that this patch doesn't have anything to do with the SDWA peephole or the rest of the pass. The reason for doing it here is that the utilities it needs, such as SDWACandidateCheck, SDWAConvertOfMI and legalizeScalarOperands, are readily available at this point.
https://github.com/llvm/llvm-project/pull/137137