[llvm] [AMDGPU] Eliminate unnecessary packing in wider f16 vectors for sdwa/opsel-able instruction (PR #137137)

Tue May 13 01:15:37 PDT 2025

https://github.com/arsenm requested changes to this pull request.

I think you need to step back and wholly reevaluate what you are trying to do here. You have a post-processing of the function in the SDWA pass that doesn't really have anything to do with SDWA or the rest of the pass. The problem you are solving appears to be avoiding the interference of v_pack_b32_f16 instructions, which only apply in a narrow range of cases.

It would be easier to solve this by avoiding introducing that problem in the first place, whether that's by checking if the uses are SDWA candidates, or by just not using v_pack_b32_f16 at all. It would be simpler to form v_pack_b32_f16 later than to try to look through it while also mixing in conversion to SDWA.

https://github.com/llvm/llvm-project/pull/137137