[llvm] [AMDGPU] Eliminate unnecessary packing in wider f16 vectors for sdwa/opsel-able instruction (PR #137137)

Mon May 26 23:53:03 PDT 2025

vg0204 wrote:

> I would have expected that this can be handled by creating a `SDWADstPreserveOperand` for the `V_PACK_B32_F16_e64` instruction in `SIPeepholeSDWA::matchSDWAOperand` and adding special handling for it in `SDWADstPreserveOperand::convertToSDWA`. See the handling for `V_OR_B32_e64`. Unfortunately, there are no explanations in the source code, but the original patch review for this feature is very informative: https://reviews.llvm.org/D37817.
> 
> Have you looked into this approach?

I have looked into it now. As you said, this can be handled using `SDWADstPreserveOperand`. However, if you look at https://ontrack-internal.amd.com/browse/SWDEV-523024 (specifically the attached images), you will see that I tried to handle a case that goes one step further (it may be computationally expensive; I need feedback on that). Please also look at the updated MIR tests for examples along the same lines.

https://github.com/llvm/llvm-project/pull/137137