[PATCH] D120370: [AMDGPU] Fix combined MMO in load-store merge

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 22 16:39:27 PST 2022


rampitec created this revision.
rampitec added a reviewer: foad.
Herald added subscribers: kerbowa, hiraditya, t-tye, tpr, dstuttard, yaxunl, nhaehnle, jvesely, kzhuravl, arsenm.
rampitec requested review of this revision.
Herald added a subscriber: wdng.
Herald added a project: LLVM.

Loads and stores can be out of order in the SILoadStoreOptimizer.
When combining the MachineMemOperands of two instructions, the
operands are passed to combineKnownAdjacentMMOs in IR order. At the
moment it picks the first operand and just replaces its offset and
size. This loses alignment information and may result in an
incorrect base pointer being used.

Use the base pointer of the operand with the lower memory address
instead, and only adjust the size.
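
The difference between the two strategies can be sketched with a
simplified model. This is only an illustration: MemOp is a hypothetical
stand-in for MachineMemOperand, both function names are invented for
the example, and the alignment is modelled as a plain field rather than
being derived the way LLVM derives it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for MachineMemOperand (not the LLVM class).
struct MemOp {
  std::string Value; // IR value the access is based on
  int64_t Offset;    // byte offset from that value
  uint64_t Size;     // access size in bytes
  uint64_t Align;    // known alignment in bytes
};

// Old behavior as described above: keep the first operand in IR order
// and overwrite its offset and size. Value and Align still come from A,
// even when A is the access at the higher address.
MemOp combineFirstOperand(const MemOp &A, const MemOp &B) {
  MemOp R = A;
  R.Offset = std::min(A.Offset, B.Offset);
  R.Size = A.Size + B.Size;
  return R;
}

// Patched behavior: start from the operand with the lower offset, keep
// its pointer information untouched, and only widen the size.
MemOp combineLowestAddress(const MemOp &A, const MemOp &B) {
  const MemOp &Lo = A.Offset > B.Offset ? B : A;
  const MemOp &Hi = A.Offset > B.Offset ? A : B;
  // Same precondition as the real function: the accesses are adjacent.
  assert(Lo.Offset + static_cast<int64_t>(Lo.Size) == Hi.Offset &&
         "operands must reference adjacent memory");
  MemOp R = Lo;
  R.Size = A.Size + B.Size;
  return R;
}
```

With A = {"p_hi", 4, 4, 4} first in IR order and B = {"p_lo", 0, 4, 16}
second, combineFirstOperand keeps "p_hi" and alignment 4 for an access
that now starts at offset 0, while combineLowestAddress keeps "p_lo"
and alignment 16. The values are made up for the illustration.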


https://reviews.llvm.org/D120370

Files:
  llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
  llvm/test/CodeGen/AMDGPU/merge-global-load-store.mir


Index: llvm/test/CodeGen/AMDGPU/merge-global-load-store.mir
===================================================================
--- llvm/test/CodeGen/AMDGPU/merge-global-load-store.mir
+++ llvm/test/CodeGen/AMDGPU/merge-global-load-store.mir
@@ -405,7 +405,7 @@
 
     ; GCN-LABEL: name: merge_global_load_dword_2_out_of_order
     ; GCN: [[DEF:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
-    ; GCN-NEXT: [[GLOBAL_LOAD_DWORDX2_:%[0-9]+]]:vreg_64_align2 = GLOBAL_LOAD_DWORDX2 [[DEF]], 0, 0, implicit $exec :: (load (s64) from `i32 addrspace(1)* undef`, addrspace 1)
+    ; GCN-NEXT: [[GLOBAL_LOAD_DWORDX2_:%[0-9]+]]:vreg_64_align2 = GLOBAL_LOAD_DWORDX2 [[DEF]], 0, 0, implicit $exec :: (load (s64) from `float addrspace(1)* undef`, align 4, addrspace 1)
     ; GCN-NEXT: [[COPY:%[0-9]+]]:vgpr_32 = COPY [[GLOBAL_LOAD_DWORDX2_]].sub1
     ; GCN-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY killed [[GLOBAL_LOAD_DWORDX2_]].sub0
     ; GCN-NEXT: S_NOP 0, implicit [[COPY]], implicit [[COPY1]]
@@ -422,7 +422,7 @@
 
     ; GCN-LABEL: name: merge_global_load_dword_3_out_of_order
     ; GCN: [[DEF:%[0-9]+]]:vreg_64_align2 = IMPLICIT_DEF
-    ; GCN-NEXT: [[GLOBAL_LOAD_DWORDX3_:%[0-9]+]]:vreg_96_align2 = GLOBAL_LOAD_DWORDX3 [[DEF]], 0, 0, implicit $exec :: (load (s96) from `i32 addrspace(1)* undef`, align 4, addrspace 1)
+    ; GCN-NEXT: [[GLOBAL_LOAD_DWORDX3_:%[0-9]+]]:vreg_96_align2 = GLOBAL_LOAD_DWORDX3 [[DEF]], 0, 0, implicit $exec :: (load (s96) from `float addrspace(1)* undef`, align 16, addrspace 1)
     ; GCN-NEXT: [[COPY:%[0-9]+]]:vreg_64_align2 = COPY [[GLOBAL_LOAD_DWORDX3_]].sub0_sub1
     ; GCN-NEXT: [[COPY1:%[0-9]+]]:vgpr_32 = COPY killed [[GLOBAL_LOAD_DWORDX3_]].sub2
     ; GCN-NEXT: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[COPY]].sub1
Index: llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
===================================================================
--- llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
+++ llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
@@ -662,19 +662,15 @@
   return true;
 }
 
-// This function assumes that \p A and \p B have are identical except for
+// This function assumes that \p A and \p B are identical except for
 // size and offset, and they reference adjacent memory.
 static MachineMemOperand *combineKnownAdjacentMMOs(MachineFunction &MF,
                                                    const MachineMemOperand *A,
                                                    const MachineMemOperand *B) {
-  unsigned MinOffset = std::min(A->getOffset(), B->getOffset());
   unsigned Size = A->getSize() + B->getSize();
-  // This function adds the offset parameter to the existing offset for A,
-  // so we pass 0 here as the offset and then manually set it to the correct
-  // value after the call.
-  MachineMemOperand *MMO = MF.getMachineMemOperand(A, 0, Size);
-  MMO->setOffset(MinOffset);
-  return MMO;
+  if (A->getOffset() > B->getOffset())
+    A = B;
+  return MF.getMachineMemOperand(A, A->getPointerInfo(), Size);
 }
 
 bool SILoadStoreOptimizer::dmasksCanBeCombined(const CombineInfo &CI,

