[llvm] [AMDGPU] Eliminate unnecessary packing in wider f16 vectors for sdwa/opsel-able instruction (PR #137137)

Tue Jul 22 02:46:52 PDT 2025

vg0204 wrote:

> > You mean at the machine instruction selection phase for the given DAG of vector <4 x [i/f]16> or the like!
> 
> Yeah. It'd be nice to declare <4 x [i/f]16> versions of SDWA operations legal and then lower them to the version that doesn't need to do any packing

Handling something this target-specific, and even subtarget-specific, at such an early stage would be a bit tricky. Also, what we want to achieve is quite a specific optimization; is it worth defining new things at the ISel level for that? I am not really sure.  
@jayfoad, @frederik-h, what are your thoughts on this?

https://github.com/llvm/llvm-project/pull/137137