[llvm] [AMDGPU] Eliminate unnecessary packing in wider f16 vectors for sdwa/opsel-able instruction (PR #137137)

Tue May 13 02:20:03 PDT 2025

vg0204 wrote:

> You have a post-processing of the function in the SDWA pass that doesn't really have anything to do with SDWA or the rest of the pass. The problem you are solving appears to be avoiding the interference of v_pack_b32_f16 instructions, which only apply in a narrow range of cases.

The reason for doing it here is that the utilities it needs, such as SDWACandidateCheck, SDWAConvertOfMI, and legalizeScalarOperands, are readily available in this pass. I agree that this patch doesn't have anything to do with the SDWA peephole or the rest of the pass.

https://github.com/llvm/llvm-project/pull/137137