[PATCH] D116529: [GlobalISel] Fold or of shifts with constant amount to funnel shift.
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 20 04:54:54 PST 2022
foad added inline comments.
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll:2219
+; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: v_alignbit_b32 v0, s1, v0, 16
+; GFX6-NEXT: v_alignbit_b32 v1, s3, v1, 16
----------------
abinavpp wrote:
> arsenm wrote:
> > foad wrote:
> > > Maybe not your fault, but it's a bad idea to use a VALU instruction for uniform values, especially if it means we need to insert readfirstlanes.
> > should probably do this in the post-regbank combiner
> We could keep this generic combine and add an AMDGPU-specific
> post-regbank-select version that reuses the match code but bails out on an
> SGPR destination. We'd also need to disable the generic combine until after
> regbank-select in AMDGPUCombine.td.
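>
> A rough sketch of what the post-regbank guard could look like (hypothetical
> function name, not the actual combine entry point; `getRegBankOrNull` and
> `AMDGPU::SGPRRegBankID` are the existing APIs I'd expect to use):
> ```
> // Only let the combine fire for divergent (VGPR) destinations after
> // regbank-select; bail out on uniform (SGPR) values. Sketch only.
> static bool shouldCombineOrToFunnelShift(MachineInstr &MI,
>                                          MachineRegisterInfo &MRI) {
>   Register Dst = MI.getOperand(0).getReg();
>   if (const RegisterBank *RB = MRI.getRegBankOrNull(Dst))
>     return RB->getID() != AMDGPU::SGPRRegBankID;
>   return true; // No bank assigned yet: the generic pre-regbank case.
> }
> ```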
>
> More importantly, is this worth the effort? The constant-shift-amount pattern
> looks bad for uniform values, but the original pattern:
> ```
> define amdgpu_kernel void @fshr_v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %amt, <4 x i32> addrspace(1)* %m) {
> %sub = sub <4 x i32> <i32 32, i32 32, i32 32, i32 32>, %amt
> %shl = shl <4 x i32> %a, %sub
> %lshr = lshr <4 x i32> %b, %amt
> %ret = or <4 x i32> %shl, %lshr
> store <4 x i32> %ret, <4 x i32> addrspace(1)* %m
> ret void
> }
> ```
> compiles to fewer instructions with the combine.
>
> How should we move forward?
Can't we just do what this comment in RegBankSelect says:
```
case AMDGPU::G_FSHR: // TODO: Expand for scalar
```
Maybe expand it to S_LSHR_B64 and just take the low half of the result?
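Something like this sketch of the math (C++, with `fshr32` as a hypothetical
helper name, just to illustrate what the scalar expansion would compute):
```
#include <cstdint>

// fshr(a, b, amt) concatenates a (high 32 bits) and b (low 32 bits) into a
// 64-bit value, shifts it right by amt mod 32, and keeps the low half --
// i.e. one S_LSHR_B64 followed by taking the low 32-bit subregister.
uint32_t fshr32(uint32_t a, uint32_t b, uint32_t amt) {
  uint64_t Concat = ((uint64_t)a << 32) | b;
  return (uint32_t)(Concat >> (amt & 31));
}
```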
In any case I don't think this needs to block the current patch.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D116529/new/
https://reviews.llvm.org/D116529
More information about the llvm-commits mailing list