[llvm] [AMDGPU] Eliminate unnecessary packing in wider f16 vectors for sdwa/opsel-able instruction (PR #137137)
Frederik Harwath via llvm-commits
llvm-commits at lists.llvm.org
Mon May 26 05:39:17 PDT 2025
https://github.com/frederik-h commented:
I would have expected that this can be handled by creating a `SDWADstPreserveOperand` for the `V_PACK_B32_F16_e64` instruction in `SIPeepholeSDWA::matchSDWAOperand` and adding special handling for it in `SDWADstPreserveOperand::convertToSDWA`. See the handling for `V_OR_B32_e64`. Unfortunately, there are no explanations in the source code, but the original patch review for this feature is very informative: https://reviews.llvm.org/D37817.
Have you looked into this approach?
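
To illustrate why a dst-preserve operand seems like a natural fit here, the following is a minimal bit-level sketch in plain C++. It is not LLVM code and uses none of the `SIPeepholeSDWA` interfaces. All function names are hypothetical, and the layout assumed for `V_PACK_B32_F16` (first source in bits [15:0], second source in bits [31:16]) should be checked against the ISA documentation. It shows that packing two halves yields the same register value as writing one half with the other half of the destination preserved.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of V_PACK_B32_F16 (assumed layout):
// src0 lands in bits [15:0], src1 in bits [31:16].
uint32_t packB32F16(uint16_t Src0, uint16_t Src1) {
  return static_cast<uint32_t>(Src0) | (static_cast<uint32_t>(Src1) << 16);
}

// Hypothetical model of an SDWA write with dst_sel:WORD_1 and
// dst_unused:UNUSED_PRESERVE: only the high half of the destination
// is overwritten, the low half keeps its old contents.
uint32_t writeWord1Preserve(uint32_t OldDst, uint16_t Result) {
  return (OldDst & 0x0000FFFFu) | (static_cast<uint32_t>(Result) << 16);
}

// Same idea for dst_sel:WORD_0: only the low half is overwritten.
uint32_t writeWord0Preserve(uint32_t OldDst, uint16_t Result) {
  return (OldDst & 0xFFFF0000u) | static_cast<uint32_t>(Result);
}

// Checks that the pack can be replaced by a preserving write of either half.
void checkPackEqualsPreserve(uint16_t Lo, uint16_t Hi) {
  uint32_t Packed = packB32F16(Lo, Hi);
  // Register already holds Lo in WORD_0; the producer of Hi writes WORD_1.
  assert(writeWord1Preserve(Lo, Hi) == Packed);
  // Register already holds Hi in WORD_1; the producer of Lo writes WORD_0.
  assert(writeWord0Preserve(static_cast<uint32_t>(Hi) << 16, Lo) == Packed);
}
```

Under these assumptions, the pack becomes redundant once the instruction producing one of the halves writes directly into the register that already holds the other half.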
https://github.com/llvm/llvm-project/pull/137137