[llvm] [AMDGPU] add s_bitset[10]_b32 optimization for shl+[or, andn2] pattern (PR #134155)
via llvm-commits
llvm-commits at lists.llvm.org
Sun Apr 6 10:00:11 PDT 2025
BaoshanPang wrote:
> > Do you mean should I do it in td file?
>
> Yes. This shouldn't require anything special that would require manual selection, or post-select folding (as you have here)
@arsenm
After some debugging, I think it may be better to do this in post-select, because:
1. To do it in the td file, we would need different methods for GlobalISel and non-GlobalISel.
2. At the selection stage, the instructions are more generic and complex than what is seen in the post-select phase:
```
# *** IR Dump After AMDGPUPreLegalizerCombiner (amdgpu-prelegalizer-combiner) ***:
# Machine code for function s_bitset1_b32: IsSSA, TracksLiveness
Function Live Ins: $sgpr2 in %2
bb.1 (%ir-block.0):
liveins: $sgpr0, $sgpr1
%0:_(s32) = COPY $sgpr0
%1:_(s32) = COPY $sgpr1
%3:_(s32) = G_CONSTANT i32 1
%4:_(s32) = G_SHL %3:_, %1:_(s32)
%5:_(s32) = G_OR %0:_, %4:_
%6:_(s32) = G_INTRINSIC_CONVERGENT intrinsic(@llvm.amdgcn.readfirstlane), %5:_(s32)
$sgpr0 = COPY %6:_(s32)
SI_RETURN_TO_EPILOG implicit $sgpr0
# End machine code for function s_bitset1_b32.
```
Also, in later phases such as RegBankSelect, some SGPR values get converted to VGPRs, which increases the complexity:
```
# *** IR Dump After RegBankSelect (regbankselect) ***:
# Machine code for function s_bitset1_b32: IsSSA, TracksLiveness, Legalized, RegBankSelected
Function Live Ins: $sgpr2 in %2
bb.1 (%ir-block.0):
liveins: $sgpr0, $sgpr1
%0:sgpr(s32) = COPY $sgpr0
%1:sgpr(s32) = COPY $sgpr1
%3:sgpr(s32) = G_CONSTANT i32 1
%4:sgpr(s32) = G_SHL %3:sgpr, %1:sgpr(s32)
%5:sgpr(s32) = G_OR %0:sgpr, %4:sgpr
%7:vgpr(s32) = COPY %5:sgpr(s32)
%6:sgpr(s32) = G_INTRINSIC_CONVERGENT intrinsic(@llvm.amdgcn.readfirstlane), %7:vgpr(s32)
$sgpr0 = COPY %6:sgpr(s32)
SI_RETURN_TO_EPILOG implicit $sgpr0
# End machine code for function s_bitset1_b32.
```
Whereas if we do it in the post-select phase, this is all we need to handle, which is much simpler and more straightforward:
```
# *** IR Dump After SI Shrink Instructions (si-shrink-instructions) ***:
# Machine code for function s_bitset1_b32: NoPHIs, TracksLiveness, NoVRegs, Legalized, RegBankSelected, Selected, TiedOpsRewritten, TracksDebugUserValues
Function Live Ins: $sgpr2
bb.0 (%ir-block.0):
liveins: $sgpr0, $sgpr1
renamable $sgpr1 = S_LSHL_B32 1, killed renamable $sgpr1, implicit-def dead $scc
renamable $sgpr0 = S_OR_B32 killed renamable $sgpr0, killed renamable $sgpr1, implicit-def dead $scc
SI_RETURN_TO_EPILOG implicit $sgpr0
# End machine code for function s_bitset1_b32.
```
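For reference, here is a minimal sketch (not code from the PR) of why the `S_LSHL_B32 1, n` + `S_OR_B32` pair in the dump above can be replaced by a single `s_bitset1_b32`, and likewise `S_ANDN2_B32` by `s_bitset0_b32`. It assumes the ISA semantics that both the shift and the bitset instructions use only the low 5 bits of the bit-index operand, and it ignores SCC, which the dump marks as dead. The Python function names are hypothetical models of the instructions, not LLVM APIs.

```python
MASK32 = 0xFFFFFFFF  # all values are modeled as unsigned 32-bit integers


def s_lshl_b32(src0, src1):
    """Model of S_LSHL_B32: shift src0 left by the low 5 bits of src1."""
    return (src0 << (src1 & 31)) & MASK32


def s_or_b32(src0, src1):
    """Model of S_OR_B32: bitwise OR."""
    return (src0 | src1) & MASK32


def s_andn2_b32(src0, src1):
    """Model of S_ANDN2_B32: src0 AND (NOT src1)."""
    return src0 & ~src1 & MASK32


def s_bitset1_b32(dst, src0):
    """Model of S_BITSET1_B32: set bit src0[4:0] in the existing dst value."""
    return (dst | (1 << (src0 & 31))) & MASK32


def s_bitset0_b32(dst, src0):
    """Model of S_BITSET0_B32: clear bit src0[4:0] in the existing dst value."""
    return dst & ~(1 << (src0 & 31)) & MASK32


def check_equivalence(values, shifts):
    """Return True if shl+or and shl+andn2 match bitset1/bitset0 for all inputs."""
    for x in values:
        for n in shifts:
            bit = s_lshl_b32(1, n)
            if s_or_b32(x, bit) != s_bitset1_b32(x, n):
                return False
            if s_andn2_b32(x, bit) != s_bitset0_b32(x, n):
                return False
    return True
```

Note that the bitset instructions modify their destination in place, so the fold only applies directly when the `S_OR_B32` (or `S_ANDN2_B32`) result lands in the same register as its non-shifted input, as `$sgpr0` does in the dump above.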
https://github.com/llvm/llvm-project/pull/134155