[PATCH] D73292: [AMDGPU] Correct NumLoads in clustering

Fri Jan 24 12:07:41 PST 2020

rampitec updated this revision to Diff 240263.
rampitec edited the summary of this revision.
rampitec added a reviewer: t.p.northover.
rampitec added a comment.
Herald added subscribers: javed.absar, MatzeB.

Changed patch to only correct clusterNeighboringMemOps. Threshold logic adjusted accordingly.
We will need to retune our threshold in the AMDGPU separately, likely after scheduler will gain the ability to break clustering under a high pressure.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73292/new/

https://reviews.llvm.org/D73292

Files:
  llvm/lib/CodeGen/MachineScheduler.cpp
  llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
  llvm/lib/Target/AMDGPU/SIInstrInfo.cpp


Index: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
===================================================================

--- llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -457,7 +457,7 @@
   if ((isMUBUF(FirstLdSt) && isMUBUF(SecondLdSt)) ||
       (isMTBUF(FirstLdSt) && isMTBUF(SecondLdSt)) ||
       (isFLAT(FirstLdSt) && isFLAT(SecondLdSt))) {
-    const unsigned MaxGlobalLoadCluster = 6;
+    const unsigned MaxGlobalLoadCluster = 7;
     if (NumLoads > MaxGlobalLoadCluster)
       return false;
 
@@ -497,7 +497,11 @@
                                          ? MRI.getRegClass(Reg)
                                          : RI.getPhysRegClass(Reg);
 
-  return (NumLoads * (RI.getRegSizeInBits(*DstRC) / 8)) <= LoadClusterThreshold;
+  // FIXME: NumLoads should not be subtracted 1. This is to match behavior
+  // of clusterNeighboringMemOps which was previosly passing cluster length
+  // less 1. LoadClusterThreshold should be tuned instead.
+  return ((NumLoads - 1) * (RI.getRegSizeInBits(*DstRC) / 8)) <=
+         LoadClusterThreshold;
 }
 
 // FIXME: This behaves strangely. If, for example, you have 32 load + stores,
Index: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
===================================================================
--- llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -2422,7 +2422,7 @@
     return false;
 
   // Only cluster up to a single pair.
-  if (NumLoads > 1)
+  if (NumLoads > 2)
     return false;
 
   if (!isPairableLdStInst(FirstLdSt) || !isPairableLdStInst(SecondLdSt))
Index: llvm/lib/CodeGen/MachineScheduler.cpp
===================================================================
--- llvm/lib/CodeGen/MachineScheduler.cpp
+++ llvm/lib/CodeGen/MachineScheduler.cpp
@@ -1584,7 +1584,7 @@
     SUnit *SUb = MemOpRecords[Idx+1].SU;
     if (TII->shouldClusterMemOps(MemOpRecords[Idx].BaseOps,
                                  MemOpRecords[Idx + 1].BaseOps,
-                                 ClusterLength)) {
+                                 ClusterLength + 1)) {
       if (SUa->NodeNum > SUb->NodeNum)
         std::swap(SUa, SUb);
       if (DAG->addEdge(SUb, SDep(SUa, SDep::Cluster))) {


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D73292.240263.patch
Type: text/x-patch
Size: 2232 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20200124/78a73a94/attachment-0001.bin>