[llvm] [AMDGPU][GlobalISel] Expand SGPR S1 exts into G_SELECT (PR #68858)

Pierre van Houtryve via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 16 07:26:57 PDT 2023


Pierre-vh wrote:

> > > > It also adds some new ISel logic to make G_SELECT 1,0 be lowered to a simple SCC copy. This is because a copy of SCC is already a s_cselect x, 1, 0. Without that we get some regressions.
> > > 
> > > 
> > > I don't understand why this part is required. Do you have a simple example?
> > 
> > 
> > `G_SELECT` is lowered into something like
> > ```
> >   $scc = COPY %30:sreg_32
> >   %18:sreg_32 = S_CSELECT_B32 %35:sreg_32, %34:sreg_32, implicit $scc
> > ```
> > 
> > 
> > But ` $scc = COPY %30:sreg_32` is expanded post RA into a, for example, `s_cselect s0, 1, 0` already. So we end up with two selects that are redundant.
> > In most cases it doesn't really matter because it seems that we're able to optimize it anyway, but in the case of uaddo, we get something like this without the ISel change: ![MicrosoftTeams-image](https://user-images.githubusercontent.com/29600849/274589587-e4173be8-c353-416b-b95e-e314df170926.png)
> 
> I wonder if it would be better to avoid creating this "select cond, 1, 0" pattern in the first place? Looking at a case like s_uaddo_i32 from test/CodeGen/AMDGPU/uaddo.ll, before regbankselect we have:
> 
> ```
>   %3:_(s32), %4:_(s1) = G_UADDO %0:_, %1:_
>   %5:_(s32) = G_ZEXT %4:_(s1)
> ```
> 
> Then regbankselect extends the carry-out from G_UADDO from s1 to s32 and we get this:
> 
> ```
>   %3:sgpr(s32), %8:sgpr(s32) = G_UADDO %0:sgpr, %1:sgpr
>   %4:sgpr(s1) = G_TRUNC %8:sgpr(s32)
>   %9:sgpr(s32) = G_CONSTANT i32 0
>   %10:sgpr(s32) = G_CONSTANT i32 1
>   %11:sgpr(s32) = G_ZEXT %4:sgpr(s1)
>   %5:sgpr(s32) = G_SELECT %11:sgpr(s32), %10:sgpr, %9:sgpr
> ```
> 
> This already looks too complicated. Why can't it just be:
> 
> ```
>   %3:sgpr(s32), %8:sgpr(s32) = G_UADDO %0:sgpr, %1:sgpr
>   %4:sgpr(s1) = G_TRUNC %8:sgpr(s32)
>   %11:sgpr(s32) = G_ZEXT %4:sgpr(s1)
> ```
> 
> Then, with some knownbits analysis, we could have a post-regbankselect combine to remove the G_TRUNC/G_ZEXT pair.

An alternative to the ISel change is to add a (generic?) combine that folds `(select %cond, 1, 0) -> %cond` when `%cond` is known to be only one bit in value (could also do `(select %cond, 0, 1) -> (xor %cond, 1)` I think). That way we still avoid s1 truncs to make it past RBSelect (which is the goal of this patch), and we can potentially fold even more selects. Do you like that approach more than the ISel change?

https://github.com/llvm/llvm-project/pull/68858


More information about the llvm-commits mailing list