[llvm] [AMDGPU] Eliminate unnecessary packing in wider f16 vectors for sdwa/opsel-able instruction (PR #137137)

Wed Jun 4 00:04:51 PDT 2025

vg0204 wrote:

After a detailed discussion with Frederik, we concluded that what I really want to achieve is tricky, and probably not feasible, even if it is broken into multiple patterns and integrated into the existing si-peephole-sdwa infrastructure. It is not really a peephole optimization: it looks for a chain of patterns spanning several instructions, rather than a localized pattern involving only a few instructions.

So, given what the optimization achieves, namely eliminating v_pack instructions (which ISel naturally generates for fp16 vectors) and reducing register usage through extensive use of UNUSED_PRESERVE, we could write it as a new pass invoked immediately after the si-peephole-sdwa pass. That pass would need some SDWA utilities to be extracted from the existing pass. I am discussing with Frederik how to do this correctly, because some of the utility functions my logic needs are built on assumptions specific to the si-peephole-sdwa pass and need to be generalized.

What do you guys think @arsenm , @jayfoad , @krzysz00 ?

https://github.com/llvm/llvm-project/pull/137137