[llvm] [AMDGPU] Eliminate unnecessary packing in wider f16 vectors for sdwa/opsel-able instruction (PR #137137)

Sun Sep 21 23:34:19 PDT 2025

vg0204 wrote:

> (I'm personally confused about why this can't be done by legalizing operations on, say, <4 x half> to the correct sequence of SDWA/OPSEL operations, but maybe there are limitations to this approach or it requires too much special-casing?)

It's not a question of whether it can be done; it's more about which approach is more apt in terms of long-term maintenance and the amount of effort needed relative to the performance boost. @jayfoad might be able to help me on this point. You can also see his comment on this ticket: https://ontrack-internal.amd.com/browse/SWDEV-523024

https://github.com/llvm/llvm-project/pull/137137