[llvm] [AMDGPU] Eliminate unnecessary packing in wider f16 vectors for sdwa/opsel-able instruction (PR #137137)

Fri Jul 18 11:31:27 PDT 2025

krzysz00 wrote:

> You mean at the machine instruction selection phase for the given DAG of vector <4 x [i/f]16> or the like?

Yeah. It'd be nice to declare the <4 x [i/f]16> versions of SDWA operations legal and then lower them to the version that doesn't need to do any packing.

https://github.com/llvm/llvm-project/pull/137137