[PATCH] D116529: [GlobalISel] Fold or of shifts with constant amount to funnel shift.
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 20 04:54:54 PST 2022
foad added inline comments.
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll:2219
+; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: v_alignbit_b32 v0, s1, v0, 16
+; GFX6-NEXT: v_alignbit_b32 v1, s3, v1, 16
----------------
abinavpp wrote:
> arsenm wrote:
> > foad wrote:
> > > Maybe not your fault, but it's a bad idea to use a VALU instruction for uniform values, especially if it means we need to insert readfirstlanes.
> > should probably do this in the post-regbank combiner
> We could keep this generic combine and add an AMDGPU-specific
> post-regbank-select version that reuses the match code but bails out on an
> SGPR destination. We'd also need to disable the generic combine until after
> regbank-select in AMDGPUCombine.td.
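>
> A rough sketch of what the post-regbank guard could look like (hypothetical
> function name, not the actual combine entry point; `getRegBankOrNull` and
> `AMDGPU::SGPRRegBankID` are the existing APIs I'd expect to use):
> ```
> // Only let the combine fire for divergent (VGPR) destinations after
> // regbank-select; bail out on uniform (SGPR) values. Sketch only.
> static bool shouldCombineOrToFunnelShift(MachineInstr &MI,
>                                          MachineRegisterInfo &MRI) {
>   Register Dst = MI.getOperand(0).getReg();
>   if (const RegisterBank *RB = MRI.getRegBankOrNull(Dst))
>     return RB->getID() != AMDGPU::SGPRRegBankID;
>   return true; // No bank assigned yet: the generic pre-regbank case.
> }
> ```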
>
> More importantly, is this worth the effort? The constant-shift-amount pattern
> looks bad for uniform values, but the original pattern:
> ```
> define amdgpu_kernel void @fshr_v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %amt, <4 x i32> addrspace(1)* %m) {
> %sub = sub <4 x i32> <i32 32, i32 32, i32 32, i32 32>, %amt
> %shl = shl <4 x i32> %a, %sub
> %lshr = lshr <4 x i32> %b, %amt
> %ret = or <4 x i32> %shl, %lshr
> store <4 x i32> %ret, <4 x i32> addrspace(1)* %m
> ret void
> }
> ```
> compiles to fewer instructions with the combine.
>
> How should we move forward?
Can't we just do what this comment in RegBankSelect says:
```
case AMDGPU::G_FSHR: // TODO: Expand for scalar
```
Maybe expand it to S_LSHR_B64 and just take the low half of the result?
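Something like this sketch of the math (C++, with `fshr32` as a hypothetical
helper name, just to illustrate what the scalar expansion would compute):
```
#include <cstdint>

// fshr(a, b, amt) concatenates a (high 32 bits) and b (low 32 bits) into a
// 64-bit value, shifts it right by amt mod 32, and keeps the low half --
// i.e. one S_LSHR_B64 followed by taking the low 32-bit subregister.
uint32_t fshr32(uint32_t a, uint32_t b, uint32_t amt) {
  uint64_t Concat = ((uint64_t)a << 32) | b;
  return (uint32_t)(Concat >> (amt & 31));
}
```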
In any case I don't think this needs to block the current patch.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D116529/new/
https://reviews.llvm.org/D116529
More information about the llvm-commits mailing list