[PATCH] D11885: AMDGPU/SI: Add SI Machine Scheduler

Thu Aug 27 10:22:28 PDT 2015

axeldavy added inline comments.

================
Comment at: lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:134-135
@@ -123,2 +133,4 @@
       return createR600MachineScheduler(C);
+    //else //(uncomment to turn default for SI)
+    //  return createSIMachineScheduler(C);
     return nullptr;
----------------
tstellarAMD wrote:
> Can we add a compile option for this?
the "SISchedRegistry" part adds  the compile option "-misched=si" to use the scheduler.

Uncommenting these two lines just removes the need for this option.

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:753-758
@@ +752,8 @@
+  // Put all high latency instructions in separate blocks
+  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
+    SUnit *SU = &SUnits[i];
+    if (SITII->isHighLatencyInstruction(SU->getInstr())) {
+      Colors[SU->NodeNum] = ++MaxID;
+    }
+  }
+
----------------
tstellarAMD wrote:
> All of the loops like this in this function that iterate over the DAG, should be moved into their own functions (i.e one loop per function).  This will make the code easier to understand and make it easier to enable/disable different passes if necessary.
I think the whole code could be made more modular, have several ways of computing blocks, and several ways to schedule them (and pick the best one).
I plan to do the rewrite.
After a discussion, we decided the code could be merged before the rewrite.

Given the plan to rewrite and make more modular, should I still put these loops in several functions ?
I think having the passes far away from each other in separate functions will make it more complicated.

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:890-915
@@ +889,28 @@
+
+#if 0
+  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
+    SUnit *SU = &SUnits[BottomUpIndex2Node[i]];
+    std::set<unsigned> SUColors;
+
+    // High latency instructions: already given
+    if (Colors[SU->NodeNum] <= maxHighLatencyID && Colors[SU->NodeNum] > 0)
+      continue;
+
+    if (Colors_FirstPass[SU->NodeNum] > 0)
+      continue;
+
+    for (SUnit::const_succ_iterator I = SU->Succs.begin(),
+         E = SU->Succs.end(); I != E; ++I) {
+      SUnit *Succ = I->getSUnit();
+      if (I->isWeak() || Succ->NodeNum >= DAGSize)
+        continue;
+      SUColors.insert(Colors[Succ->NodeNum]);
+    }
+    if (SUColors.size() == 0)
+      continue;
+
+    if (SUColors.size() == 1)
+      Colors[SU->NodeNum] = *SUColors.begin();
+  }
+#endif
+
----------------
tstellarAMD wrote:
> This commented out section should be removed.
I kept it for testing purpose.
On some games it gives a 10% increase in performance, but in general it is a slight decrease.


http://reviews.llvm.org/D11885