[llvm] [AMDGPU] Add pattern to select scalar ops for fshr with uniform operands (PR #165295)
Akash Dutta via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 5 08:12:24 PST 2025
akadutta wrote:
> > limit pattern to gfx9+
>
> Why?
On other architectures, the pattern leads to less optimized codegen and higher VGPR usage. For example, here is the ASM for the amdgcn.bitcast.32bit.ll test on gfx600:
With change:
```
v_and_b32_e32 v1, 0xffff0000, v4
v_and_b32_e32 v0, 0xffff0000, v2
v_add_f32_e32 v1, 0x40c00000, v1
v_add_f32_e32 v0, 0x40c00000, v0
v_lshrrev_b32_e32 v1, 16, v1
v_lshr_b64 v[0:1], v[0:1], 16
```
Without change:
```
v_and_b32_e32 v1, 0xffff0000, v1
v_and_b32_e32 v0, 0xffff0000, v2
v_add_f32_e32 v1, 0x40c00000, v1
v_add_f32_e32 v0, 0x40c00000, v0
v_lshrrev_b32_e32 v1, 16, v1
v_alignbit_b32 v0, v1, v0, 16
```
Semantically, both are the same. However, with the change we end up using a 64-bit instruction and more VGPRs (5 vs 3). I don't see the same behavior on GFX9, 10, 11, or 12. Hence, I am limiting the blast radius to GFX9+.
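To illustrate why the two sequences are semantically the same, here is a minimal Python sketch. It assumes `v_alignbit_b32 D, S0, S1, S2` yields the low 32 bits of `{S0,S1} >> S2[4:0]` (matching `llvm.fshr.i32`) and that `v_lshr_b64` is a plain 64-bit logical shift right on a register pair. The helper names `fshr32` and `lshr64` are hypothetical, and the model covers only the arithmetic, not register allocation.

```python
MASK32 = 0xFFFFFFFF

def fshr32(hi, lo, amt):
    """Assumed model of llvm.fshr.i32 / v_alignbit_b32:
    low 32 bits of the 64-bit value hi:lo shifted right by amt mod 32."""
    concat = ((hi & MASK32) << 32) | (lo & MASK32)
    return (concat >> (amt % 32)) & MASK32

def lshr64(hi, lo, amt):
    """Assumed model of v_lshr_b64 on a register pair:
    returns (new_lo, new_hi) of hi:lo shifted right by amt mod 64."""
    concat = ((hi & MASK32) << 32) | (lo & MASK32)
    value = concat >> (amt % 64)
    return value & MASK32, (value >> 32) & MASK32

# Hypothetical inputs: hi already shifted right by 16 (as v_lshrrev_b32 does
# to v1), lo still holding its payload in the upper half.
hi, lo = 0x00001234, 0xABCD0000
print(hex(fshr32(hi, lo, 16)))     # 0x1234abcd
print(hex(lshr64(hi, lo, 16)[0]))  # 0x1234abcd
```

For shift amounts below 32, the low half of the 64-bit shift equals the funnel-shift result. The 64-bit form also writes a high half, which is why it ties up a register pair.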
https://github.com/llvm/llvm-project/pull/165295