[llvm] [AMDGPU] Optimize rotate instruction selection patterns (PR #143551)

Thu Jun 19 02:11:21 PDT 2025

jayfoad wrote:

> This patch improves rotate instruction selection for AMDGPU by adding optimized patterns for the rotate right (rotr) operation. It now selects s_lshl + s_lshr + s_or (3 SALU instructions) instead of the previous v_alignbit + v_readfirstlane (2 VALU instructions).

Thanks for working on this. It would be great to avoid selecting v_alignbit_b32 for uniform inputs, but it's hard to do this without causing some regressions.

> It now selects s_lshl + s_lshr + s_or (3 SALU instructions) instead of the previous v_alignbit + v_readfirstlane (2 VALU instructions).

In general there will also be an s_sub to compute the second shift amount, so 4 SALU instructions in total. In some cases the v_readfirstlane was not required because the result was wanted in a VGPR anyway, so the old sequence was only 1 VALU instruction. This is pretty common in our test suite but much less common in real-world code.
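
For readers following along, here is a minimal C sketch of the four-operation expansion discussed above (one subtract, two shifts, one OR). The function name `rotr32_expanded` is hypothetical and this is only a model of the arithmetic, not the patch's actual selection pattern. The `& 31` masks are there to keep the C well defined when `n` is 0 or larger than 31; whether they cost anything on the target depends on how the hardware shift treats the upper bits of the shift amount.

```c
#include <stdint.h>

/* Rotate right by n, written as sub + lshr + lshl + or.
   Illustrative model only; names are not taken from the patch. */
static uint32_t rotr32_expanded(uint32_t x, uint32_t n) {
    uint32_t right = n & 31u;          /* amount for the right shift */
    uint32_t left  = (32u - n) & 31u;  /* the extra subtract */
    return (x >> right) | (x << left); /* two shifts combined with OR */
}
```

With `n == 0` both shift amounts become 0, so the result is `x | x`, which is `x` as expected for a rotate.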

Whatever we end up doing for rotates, we should do the same for funnel shifts.
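
As a rough illustration of why the two cases belong together: a 32-bit funnel shift right takes the low 32 bits of the 64-bit value `{hi, lo}` shifted right by `n mod 32`, and a rotate right is the special case where both inputs are the same value. The sketch below models that in C; `fshr32` and `rotr32_via_fshr` are hypothetical helper names, and the 64-bit intermediate is just one convenient way to write the semantics, not a claim about which instructions should be selected.

```c
#include <stdint.h>

/* Funnel shift right: low 32 bits of {hi, lo} >> (n mod 32).
   Illustrative model of the semantics only. */
static uint32_t fshr32(uint32_t hi, uint32_t lo, uint32_t n) {
    uint64_t wide = ((uint64_t)hi << 32) | lo; /* concatenate hi:lo */
    return (uint32_t)(wide >> (n & 31u));      /* keep the low half */
}

/* A rotate right is a funnel shift with both inputs equal. */
static uint32_t rotr32_via_fshr(uint32_t x, uint32_t n) {
    return fshr32(x, x, n);
}
```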

https://github.com/llvm/llvm-project/pull/143551