[llvm] [AMDGPU] add s_bitset[10]_b32 optimization for shl+[or, andn2] pattern (PR #134155)

Sun Apr 6 10:00:11 PDT 2025

BaoshanPang wrote:

> > Do you mean should I do it in td file?
> 
> Yes. This shouldn't require anything special that would require manual selection, or post-select folding (as you have here)

@arsenm 
After some debugging, I think it may be better to do this in post-select, because:
1. To do it in the td file, we would need different methods for global-isel and non-global-isel.
2. At the selection stage, the instructions are more generic and complex than what can be seen in the post-select phase:

```
# *** IR Dump After AMDGPUPreLegalizerCombiner (amdgpu-prelegalizer-combiner) ***:
# Machine code for function s_bitset1_b32: IsSSA, TracksLiveness
Function Live Ins: $sgpr2 in %2

bb.1 (%ir-block.0):
  liveins: $sgpr0, $sgpr1
  %0:_(s32) = COPY $sgpr0
  %1:_(s32) = COPY $sgpr1
  %3:_(s32) = G_CONSTANT i32 1
  %4:_(s32) = G_SHL %3:_, %1:_(s32)
  %5:_(s32) = G_OR %0:_, %4:_
  %6:_(s32) = G_INTRINSIC_CONVERGENT intrinsic(@llvm.amdgcn.readfirstlane), %5:_(s32)
  $sgpr0 = COPY %6:_(s32)
  SI_RETURN_TO_EPILOG implicit $sgpr0

# End machine code for function s_bitset1_b32.
```

Also, in a later phase such as RegBankSelect, some SGPR values are copied into VGPRs, which increases the complexity:

```
# *** IR Dump After RegBankSelect (regbankselect) ***:
# Machine code for function s_bitset1_b32: IsSSA, TracksLiveness, Legalized, RegBankSelected
Function Live Ins: $sgpr2 in %2

bb.1 (%ir-block.0):
  liveins: $sgpr0, $sgpr1
  %0:sgpr(s32) = COPY $sgpr0
  %1:sgpr(s32) = COPY $sgpr1
  %3:sgpr(s32) = G_CONSTANT i32 1
  %4:sgpr(s32) = G_SHL %3:sgpr, %1:sgpr(s32)
  %5:sgpr(s32) = G_OR %0:sgpr, %4:sgpr
  %7:vgpr(s32) = COPY %5:sgpr(s32)
  %6:sgpr(s32) = G_INTRINSIC_CONVERGENT intrinsic(@llvm.amdgcn.readfirstlane), %7:vgpr(s32)
  $sgpr0 = COPY %6:sgpr(s32)
  SI_RETURN_TO_EPILOG implicit $sgpr0

# End machine code for function s_bitset1_b32.

```

Whereas if we do it in the post-select phase, this is all we need to handle, which is much simpler and more straightforward:

```
# *** IR Dump After SI Shrink Instructions (si-shrink-instructions) ***:
# Machine code for function s_bitset1_b32: NoPHIs, TracksLiveness, NoVRegs, Legalized, RegBankSelected, Selected, TiedOpsRewritten, TracksDebugUserValues
Function Live Ins: $sgpr2

bb.0 (%ir-block.0):
  liveins: $sgpr0, $sgpr1
  renamable $sgpr1 = S_LSHL_B32 1, killed renamable $sgpr1, implicit-def dead $scc
  renamable $sgpr0 = S_OR_B32 killed renamable $sgpr0, killed renamable $sgpr1, implicit-def dead $scc
  SI_RETURN_TO_EPILOG implicit $sgpr0

# End machine code for function s_bitset1_b32.

```
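To make the post-select idea concrete, here is a minimal, self-contained sketch of the fold on the two-instruction sequence above. It is a toy model, not the code in this PR and not the LLVM `MachineInstr` API: `ToyInst`, `foldBitset`, `kImm`, `kNone`, and the helper functions are all hypothetical names made up for illustration. It assumes the shift and its user are adjacent, that the OR/ANDN2 destination is the same register as its first source (since `s_bitset` modifies its destination in place), and that SCC is dead on both instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr int kImm = -1;  // marks "first operand is the immediate in `imm`"
constexpr int kNone = -2; // marks "operand not present"

// Hypothetical stand-in for a selected, register-allocated instruction.
struct ToyInst {
  std::string opcode;
  int dst;        // destination SGPR number
  int lhs;        // first source SGPR, or kImm
  int64_t imm;    // immediate value, only meaningful when lhs == kImm
  int rhs;        // second source SGPR, or kNone
  bool rhsKilled; // true when this instruction is the last use of rhs
  bool sccDead;   // true when the implicit SCC def is dead
};

// Reference semantics; shift amount and bit index use only the low 5 bits.
uint32_t sLshlB32(uint32_t a, uint32_t n) { return a << (n & 31u); }
uint32_t sBitset1B32(uint32_t d, uint32_t n) { return d | (1u << (n & 31u)); }
uint32_t sBitset0B32(uint32_t d, uint32_t n) { return d & ~(1u << (n & 31u)); }

// Spot-checks that the folded form matches shl+or and shl+andn2.
void selfCheck() {
  for (uint32_t n = 0; n < 64; ++n) {
    assert(sBitset1B32(0x10u, n) == (0x10u | sLshlB32(1u, n)));
    assert(sBitset0B32(0xFFu, n) == (0xFFu & ~sLshlB32(1u, n)));
  }
}

// Folds adjacent pairs
//   t = S_LSHL_B32 1, n
//   d = S_OR_B32 d, killed t     (or S_ANDN2_B32)
// into a single S_BITSET1_B32 / S_BITSET0_B32 on d with bit index n.
// Operand commutation is not handled in this sketch.
// Returns the number of pairs folded.
int foldBitset(std::vector<ToyInst> &insts) {
  int folded = 0;
  for (std::size_t i = 0; i + 1 < insts.size(); ++i) {
    const ToyInst shl = insts[i];     // copies, so later mutation is safe
    const ToyInst use = insts[i + 1];
    if (shl.opcode != "S_LSHL_B32" || shl.lhs != kImm || shl.imm != 1)
      continue;
    const bool isOr = use.opcode == "S_OR_B32";
    const bool isAndn2 = use.opcode == "S_ANDN2_B32";
    if (!isOr && !isAndn2)
      continue;
    // The shifted value must be consumed here and nowhere else.
    if (use.rhs != shl.dst || !use.rhsKilled)
      continue;
    // The bitset form updates its destination in place.
    if (use.lhs != use.dst || use.lhs == shl.dst)
      continue;
    // s_bitset does not write SCC, so nobody may read the old SCC defs.
    if (!shl.sccDead || !use.sccDead)
      continue;
    insts[i] = ToyInst{isOr ? "S_BITSET1_B32" : "S_BITSET0_B32",
                       use.dst, shl.rhs, 0, kNone, false, true};
    insts.erase(insts.begin() + static_cast<std::ptrdiff_t>(i) + 1);
    ++folded;
  }
  return folded;
}
```

Feeding it the `S_LSHL_B32` / `S_OR_B32` pair from the dump (shift destination `$sgpr1`, OR destination `$sgpr0`) leaves a single `S_BITSET1_B32` on `$sgpr0` indexed by `$sgpr1`. A real implementation would also have to handle non-adjacent instructions and intervening uses, which this sketch deliberately ignores.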

https://github.com/llvm/llvm-project/pull/134155