[llvm] [AMDGPU] Eliminate unnecessary packing in wider f16 vectors for sdwa/opsel-able instruction (PR #137137)
Vikash Gupta via llvm-commits
llvm-commits at lists.llvm.org
Tue May 13 02:28:18 PDT 2025
vg0204 wrote:
> It would be easier to solve this by avoiding introducing that problem in the first place. Whether that's by checking if the uses are SDWA candidates, or just not using v_pack_b32_f16 in the first place. It would be simpler to form v_pack_b32_f16 later than trying to look through it while also mixing in conversion to SDWA
This problem arises right from the isel phase: for unary ops like log or exp, we don't have a dedicated packed instruction, so isel always scalarizes and generates separate element-wise instructions, then packs the results using a strategy that differs by target (GFX8- to GFX9+). In our case that is v_pack_b32_f16. So checking for SDWA candidates at such an early stage, or preventing the pack from being formed during isel, seems like a lot of work.
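To make the pattern concrete, here is a rough Python model of the two shapes being discussed. It is only an illustrative sketch, not LLVM code: the function names are hypothetical, and it assumes the pack places the first f16 operand in bits [15:0] and the second in bits [31:16] of a 32-bit register.

```python
import math
import struct

def f16_bits(x: float) -> int:
    """IEEE half-precision bit pattern of x as a 16-bit integer."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def bits_f16(b: int) -> float:
    """Decode the low 16 bits of b as a half-precision value."""
    return struct.unpack("<e", struct.pack("<H", b & 0xFFFF))[0]

def pack_b32_f16(lo: float, hi: float) -> int:
    """Model of a pack: lo goes to bits [15:0], hi to bits [31:16]."""
    return f16_bits(lo) | (f16_bits(hi) << 16)

def unary_scalarized_then_packed(op, src: int) -> int:
    """Hypothetical model of the isel output: two element-wise ops plus a pack."""
    r_lo = op(bits_f16(src))        # element-wise instruction on the low half
    r_hi = op(bits_f16(src >> 16))  # element-wise instruction on the high half
    return pack_b32_f16(r_lo, r_hi) # separate packing instruction

def unary_direct_half_writes(op, src: int) -> int:
    """Hypothetical model of the goal: each op writes its own half of the
    destination (as a dst_sel/op_sel-style write would), so no pack is needed."""
    dst = 0
    dst = (dst & 0xFFFF0000) | f16_bits(op(bits_f16(src)))
    dst = (dst & 0x0000FFFF) | (f16_bits(op(bits_f16(src >> 16))) << 16)
    return dst

v = pack_b32_f16(1.0, 2.0)
same = unary_scalarized_then_packed(math.log2, v) == unary_direct_half_writes(math.log2, v)
```

Both functions produce the same 32-bit value; the second simply has one fewer step, which is the saving the PR is after.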
@jayfoad, feel free to add anything I missed or am unaware of!
https://github.com/llvm/llvm-project/pull/137137