[PATCH] D150246: AMDGPU: remove an illegal transform for shl(or)
Ruiling, Song via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue May 9 20:47:06 PDT 2023
ruiling created this revision.
ruiling added reviewers: arsenm, foad.
Herald added subscribers: StephenFan, kerbowa, hiraditya, tpr, dstuttard, yaxunl, jvesely, kzhuravl.
Herald added a project: All.
ruiling requested review of this revision.
Herald added subscribers: llvm-commits, wdng.
Herald added a project: LLVM.
This is a revert of c8903125cd7c7766ef9c8eef3ed3a9d89a6fe554
The code was doing:
`((a | c1) << c2)` ==> `(a << c2) + (c1 << c2)`
This is technically wrong because:
`((a | c1) << c2)` != `(a << c2) + (c1 << c2)`
The right way is to transform the pattern to `(a << c2) | (c1 << c2)`
But the right transformation does not do any help on folding the
constant offset into the memory instructions. On the contrary, it is
increasing number of instructions. So I think the right way is to stop
this wrong transformation in case of multiple uses.
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D150246
Files:
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
llvm/test/CodeGen/AMDGPU/shl_add_ptr.ll
Index: llvm/test/CodeGen/AMDGPU/shl_add_ptr.ll
===================================================================
--- llvm/test/CodeGen/AMDGPU/shl_add_ptr.ll
+++ llvm/test/CodeGen/AMDGPU/shl_add_ptr.ll
@@ -410,40 +410,5 @@
ret void
}
-; FIXME: This or should fold into an offset on the write
-; GCN-LABEL: {{^}}shl_or_ptr_combine_2use_lds:
-; GCN: v_lshlrev_b32_e32 [[SCALE0:v[0-9]+]], 3, v0
-; GCN: v_or_b32_e32 [[SCALE1:v[0-9]+]], 32, [[SCALE0]]
-; GCN: v_lshlrev_b32_e32 [[SCALE2:v[0-9]+]], 4, v0
-; GCN: ds_write_b32 [[SCALE1]], v{{[0-9]+}}
-; GCN: ds_write_b32 [[SCALE2]], v{{[0-9]+}} offset:64
-define void @shl_or_ptr_combine_2use_lds(i32 %idx) #0 {
- %idx.add = or i32 %idx, 4
- %shl0 = shl i32 %idx.add, 3
- %shl1 = shl i32 %idx.add, 4
- %ptr0 = inttoptr i32 %shl0 to ptr addrspace(3)
- %ptr1 = inttoptr i32 %shl1 to ptr addrspace(3)
- store volatile i32 9, ptr addrspace(3) %ptr0
- store volatile i32 10, ptr addrspace(3) %ptr1
- ret void
-}
-
-; GCN-LABEL: {{^}}shl_or_ptr_combine_2use_max_lds_offset:
-; GCN-DAG: v_lshlrev_b32_e32 [[SCALE0:v[0-9]+]], 3, v0
-; GCN-DAG: v_lshlrev_b32_e32 [[SCALE1:v[0-9]+]], 4, v0
-; GCN-DAG: ds_write_b32 [[SCALE0]], v{{[0-9]+}} offset:65528
-; GCN-DAG: v_or_b32_e32 [[ADD1:v[0-9]+]], 0x1fff0, [[SCALE1]]
-; GCN: ds_write_b32 [[ADD1]], v{{[0-9]+$}}
-define void @shl_or_ptr_combine_2use_max_lds_offset(i32 %idx) #0 {
- %idx.add = or i32 %idx, 8191
- %shl0 = shl i32 %idx.add, 3
- %shl1 = shl i32 %idx.add, 4
- %ptr0 = inttoptr i32 %shl0 to ptr addrspace(3)
- %ptr1 = inttoptr i32 %shl1 to ptr addrspace(3)
- store volatile i32 9, ptr addrspace(3) %ptr0
- store volatile i32 10, ptr addrspace(3) %ptr1
- ret void
-}
-
attributes #0 = { nounwind }
attributes #1 = { nounwind readnone }
Index: llvm/lib/Target/AMDGPU/SIISelLowering.cpp
===================================================================
--- llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -9572,8 +9572,7 @@
// We only do this to handle cases where it's profitable when there are
// multiple uses of the add, so defer to the standard combine.
- if ((N0.getOpcode() != ISD::ADD && N0.getOpcode() != ISD::OR) ||
- N0->hasOneUse())
+ if ((N0.getOpcode() != ISD::ADD) || N0->hasOneUse())
return SDValue();
const ConstantSDNode *CN1 = dyn_cast<ConstantSDNode>(N1);
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D150246.520897.patch
Type: text/x-patch
Size: 2362 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230510/07e6c45b/attachment.bin>
More information about the llvm-commits
mailing list