[llvm] [AMDGPU] Eliminate unnecessary packing in wider f16 vectors for sdwa/opsel-able instruction (PR #137137)

Tue May 13 02:28:18 PDT 2025

vg0204 wrote:

> It would be easier to solve this by avoiding introducing that problem in the first place. Whether that's by checking if the uses are SDWA candidates, or just not using v_pack_b32_f16 in the first place. It would be simpler to form v_pack_b32_f16 later than trying to look through it while also mixing in conversion to SDWA

This problem arises right from the isel phase: for unary ops like log or exp, we don't have a dedicated packed instruction, so isel always scalarizes and generates separate element-wise instructions, followed by one of several strategies to pack the results. In our case, that is v_pack_b32_f16 (GFX8- to GFX9+). So checking for SDWA candidates at such an early stage, or preventing this from happening in the isel phase, seems like a lot of work.
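
To make the scalarize-then-pack pattern concrete, here is a rough Python model of what happens to a v2f16 value. It is only an illustration, not the actual isel code: the helper names (`unpack_v2f16`, `pack_v2f16`, `scalarized_unary`) are hypothetical, and the lane order (first element in the low 16 bits) is an assumption made for the sketch.

```python
import math
import struct


def unpack_v2f16(word: int) -> tuple:
    """Split a 32-bit word into its (low, high) half-precision lanes."""
    return struct.unpack("<2e", struct.pack("<I", word))


def pack_v2f16(lo: float, hi: float) -> int:
    """Model of the packing step (assumed: lo -> bits [15:0], hi -> bits [31:16])."""
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]


def scalarized_unary(word: int, op) -> int:
    """Apply a unary op per element, then repack, mimicking scalarization."""
    lo, hi = unpack_v2f16(word)      # scalarize: one value per lane
    return pack_v2f16(op(lo), op(hi))  # two element-wise ops + one pack


if __name__ == "__main__":
    v = pack_v2f16(1.0, 2.0)
    print(hex(scalarized_unary(v, math.log2)))
```

The explicit pack at the end is the step this PR tries to eliminate when the consumer could instead read or write the 16-bit halves directly via SDWA/opsel.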

@jayfoad, please add anything I missed or am unaware of!

https://github.com/llvm/llvm-project/pull/137137