[PATCH] D116529: [GlobalISel] Fold or of shifts with constant amount to funnel shift.
Abinav Puthan Purayil via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 20 04:25:00 PST 2022
abinavpp marked 2 inline comments as done.
abinavpp added inline comments.
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll:2219
+; GFX6-NEXT: v_mov_b32_e32 v2, s4
+; GFX6-NEXT: v_alignbit_b32 v0, s1, v0, 16
+; GFX6-NEXT: v_alignbit_b32 v1, s3, v1, 16
----------------
arsenm wrote:
> foad wrote:
> > Maybe not your fault, but it's a bad idea to use a VALU instruction for uniform values, especially if it means we need to insert readfirstlanes.
> should probably do this in the post-regbank combiner
We could keep this generic combine and, by reusing the match code, add an
AMDGPU-specific post-regbank-select version that bails out on an SGPR
destination. We would also need to exclude the generic combine from the
combiners that run before regbank-select in AMDGPUCombine.td.
More importantly, is this worth the effort? The constant shift amount pattern
(see the sketch after the IR below) looks bad for the uniform case, but the original pattern:
```
define amdgpu_kernel void @fshr_v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %amt, <4 x i32> addrspace(1)* %m) {
%sub = sub <4 x i32> <i32 32, i32 32, i32 32, i32 32>, %amt
%shl = shl <4 x i32> %a, %sub
%lshr = lshr <4 x i32> %b, %amt
%ret = or <4 x i32> %shl, %lshr
store <4 x i32> %ret, <4 x i32> addrspace(1)* %m
ret void
}
```
has fewer instructions with the combine.
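For reference, by the constant shift amount pattern I mean IR along these lines (an illustrative sketch only; the function name is made up, and it mirrors the shift-by-16 case from the uaddsat.ll test above):
```
define amdgpu_kernel void @or_of_const_shifts(i32 %a, i32 %b, i32 addrspace(1)* %m) {
  ; With the combine this folds to a funnel shift with a constant amount,
  ; which selects to v_alignbit_b32 (a VALU instruction) even when the
  ; inputs are uniform.
  %shl = shl i32 %a, 16
  %lshr = lshr i32 %b, 16
  %ret = or i32 %shl, %lshr
  store i32 %ret, i32 addrspace(1)* %m
  ret void
}
```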
How should we move forward?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D116529/new/
https://reviews.llvm.org/D116529