[llvm] [AMDGPU] Eliminate unnecessary packing in wider f16 vectors for sdwa/opsel-able instruction (PR #137137)

Mon May 26 05:39:17 PDT 2025

https://github.com/frederik-h commented:

I would have expected this to be handled by creating an `SDWADstPreserveOperand` for the `V_PACK_B32_F16_e64` instruction in `SIPeepholeSDWA::matchSDWAOperand` and adding special handling for it in `SDWADstPreserveOperand::convertToSDWA`. See the existing handling for `V_OR_B32_e64`. Unfortunately, the source code does not explain this, but the original patch review for the feature is very informative: https://reviews.llvm.org/D37817.
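
To make the idea concrete, here is a minimal sketch of why a pack could fold into a dst-preserving SDWA write. It is a toy bit-level model, not LLVM code: the function names `packB32F16` and `sdwaWriteWord1Preserve` are hypothetical, and the model assumes plain 16-bit bit patterns with no source modifiers and no denormal handling.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: a 32-bit register holding two 16-bit halves.
// These helpers are illustrative and do not exist in LLVM.

// Models a pack of two f16 bit patterns: the first source lands in
// bits [15:0], the second source in bits [31:16].
uint32_t packB32F16(uint16_t lo, uint16_t hi) {
  return (static_cast<uint32_t>(hi) << 16) | lo;
}

// Models an SDWA write that selects the upper word of the destination
// and preserves the unused lower word (assumed: dst_sel = WORD_1,
// dst_unused = UNUSED_PRESERVE).
uint32_t sdwaWriteWord1Preserve(uint32_t oldDst, uint16_t result) {
  return (oldDst & 0x0000FFFFu) | (static_cast<uint32_t>(result) << 16);
}
```

In this model, writing the high half directly into a register that already holds the low half yields the same bits as packing the two halves afterwards, which is why the separate pack would become redundant once the producing instruction is converted.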

Have you looked into this approach?

https://github.com/llvm/llvm-project/pull/137137