[PATCH] D116529: [GlobalISel] Fold or of shifts with constant amount to funnel shift.

Abinav Puthan Purayil via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 20 04:25:00 PST 2022


abinavpp marked 2 inline comments as done.
abinavpp added inline comments.


================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll:2219
+; GFX6-NEXT:    v_mov_b32_e32 v2, s4
+; GFX6-NEXT:    v_alignbit_b32 v0, s1, v0, 16
+; GFX6-NEXT:    v_alignbit_b32 v1, s3, v1, 16
----------------
arsenm wrote:
> foad wrote:
> > Maybe not your fault, but it's a bad idea to use a VALU instruction for uniform values, especially if it means we need to insert readfirstlanes.
> should probably do this in the post-regbank combiner 
We could keep this generic combine and add an AMDGPU-specific post-regbank-select
version that bails out on an SGPR destination, reusing the match code. We would
also need to exclude the generic combine from the combiners that run before
regbank-select in AMDGPUCombine.td.

More importantly, is this worth the effort? The constant shift amount pattern
looks bad for uniform values, but the original pattern:
```
define amdgpu_kernel void @fshr_v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %amt, <4 x i32> addrspace(1)* %m) {
  %sub = sub <4 x i32> <i32 32, i32 32, i32 32, i32 32>, %amt
  %shl = shl <4 x i32> %a, %sub
  %lshr = lshr <4 x i32> %b, %amt
  %ret = or <4 x i32> %shl, %lshr
  store <4 x i32> %ret, <4 x i32> addrspace(1)* %m
  ret void
}
```
has fewer instructions with the combine.
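
For reference, the constant-amount case under discussion is essentially the
following (a minimal scalar sketch; the function name is illustrative, and the
shift amount of 16 is taken from the uaddsat.ll snippet above):
```
define i32 @fshr_i32_const(i32 %a, i32 %b) {
  ; (%a << 16) | (%b >> 16) is equivalent to fshr(%a, %b, 16)
  %shl = shl i32 %a, 16
  %lshr = lshr i32 %b, 16
  %ret = or i32 %shl, %lshr
  ret i32 %ret
}
```
With the combine this folds to a funnel shift with a constant amount of 16,
which selects to v_alignbit_b32 (a VALU instruction) as in the test diff above,
even when the inputs are uniform.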

How should we move forward?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116529/new/

https://reviews.llvm.org/D116529


