[PATCH] D11885: AMDGPU/SI: Add SI Machine Scheduler

Tom Stellard via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 21 13:13:32 PDT 2015


tstellarAMD added a comment.

I have given a few inline comments.  This is a little hard to review, because I'm not really familiar with the scheduling infrastructure.  My main concern is that the scheduler is too complex.  Although maybe that's just the nature of schedulers.

It would be nice if the code could be organized such that there was a simple core scheduler, with several modular improvements that could be added or removed easily.  Do you think this could be done?

The only requirement of the core scheduler should be that it produces consistent code, it doesn't necessarily have to be better than the current scheduler.  One thing that is really bad about the current scheduler is how removing just one instruction will drastically change the structure of the program as well as the register usage.  Having a consistent scheduler (even one that is worse) will make it easier to measure performance improvements of other optimizations.


================
Comment at: lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:134-135
@@ -123,2 +133,4 @@
       return createR600MachineScheduler(C);
+    //else //(uncomment to turn default for SI)
+    //  return createSIMachineScheduler(C);
     return nullptr;
----------------
Can we add a compile option for this?

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:640
@@ +639,3 @@
+// Both TopDown and BottomUp
+void SIScheduleDAGMI::topologicalSort() {
+  unsigned DAGSize = SUnits.size();
----------------
This function seems to have parts that where copied from ScheduleDAGTopologicalSort::InitDAGTopologicalSorting() in ScheduleDAG.cpp.  Is there some reason why we can't just re-use that function?

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:753-758
@@ +752,8 @@
+  // Put all high latency instructions in separate blocks
+  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
+    SUnit *SU = &SUnits[i];
+    if (SITII->isHighLatencyInstruction(SU->getInstr())) {
+      Colors[SU->NodeNum] = ++MaxID;
+    }
+  }
+
----------------
All of the loops like this in this function that iterate over the DAG, should be moved into their own functions (i.e one loop per function).  This will make the code easier to understand and make it easier to enable/disable different passes if necessary.

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:890-915
@@ +889,28 @@
+
+#if 0
+  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
+    SUnit *SU = &SUnits[BottomUpIndex2Node[i]];
+    std::set<unsigned> SUColors;
+
+    // High latency instructions: already given
+    if (Colors[SU->NodeNum] <= maxHighLatencyID && Colors[SU->NodeNum] > 0)
+      continue;
+
+    if (Colors_FirstPass[SU->NodeNum] > 0)
+      continue;
+
+    for (SUnit::const_succ_iterator I = SU->Succs.begin(),
+         E = SU->Succs.end(); I != E; ++I) {
+      SUnit *Succ = I->getSUnit();
+      if (I->isWeak() || Succ->NodeNum >= DAGSize)
+        continue;
+      SUColors.insert(Colors[Succ->NodeNum]);
+    }
+    if (SUColors.size() == 0)
+      continue;
+
+    if (SUColors.size() == 1)
+      Colors[SU->NodeNum] = *SUColors.begin();
+  }
+#endif
+
----------------
This commented out section should be removed.

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:917-944
@@ +916,30 @@
+
+  // lighter pass. Only merges constant loading
+  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
+    SUnit *SU = &SUnits[BottomUpIndex2Node[i]];
+    std::set<unsigned> SUColors;
+
+    // High latency instructions: already given
+    if (Colors[SU->NodeNum] <= MaxHighLatencyID && Colors[SU->NodeNum] > 0)
+      continue;
+
+    /* No predecessor: vgpr constant loading */
+    /* low latency instruction usually have a predecessor (the address) */
+    if (Colors_FirstPass[SU->NodeNum] > 0 ||
+        (SU->Preds.size() > 0 &&
+         !SITII->isLowLatencyInstruction(SU->getInstr())))
+      continue;
+
+    for (SUnit::const_succ_iterator I = SU->Succs.begin(),
+         E = SU->Succs.end(); I != E; ++I) {
+      SUnit *Succ = I->getSUnit();
+      if (I->isWeak() || Succ->NodeNum >= DAGSize)
+        continue;
+      SUColors.insert(Colors[Succ->NodeNum]);
+    }
+    if (SUColors.size() == 0)
+      continue;
+    if (SUColors.size() == 1)
+      Colors[SU->NodeNum] = *SUColors.begin();
+  }
+
----------------
Can you move this loop into its own function.  This will make it easier to understand what the scheduler is doing at a high level.

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:955-977
@@ +954,25 @@
+
+  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
+    SUnit *SU = &SUnits[BottomUpIndex2Node[i]];
+    std::set<unsigned> SUColors;
+
+    // High latency instructions: already given
+    if (Colors[SU->NodeNum] <= MaxHighLatencyID && Colors[SU->NodeNum] > 0)
+      continue;
+
+    for (SUnit::const_succ_iterator I = SU->Succs.begin(),
+         E = SU->Succs.end(); I != E; ++I) {
+      SUnit *Succ = I->getSUnit();
+      if (I->isWeak() || Succ->NodeNum >= DAGSize)
+        continue;
+      if (Colors[Succ->NodeNum] > 0)
+        SUColors.insert(Colors[Succ->NodeNum]);
+    }
+    // keep previous color
+    if (SUColors.size() == 0)
+      continue;
+    if (SUColors.size() == 1 && *SUColors.begin() <= MaxHighLatencyID &&
+        *SUColors.begin() > 0)
+      Colors[SU->NodeNum] = *SUColors.begin();
+  }
+
----------------
Can you move this loop to its own function too.

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:985-1010
@@ +984,28 @@
+  // (except for high latencies):
+  /*for (unsigned i = 0, e = DAGSize; i != e; ++i) {
+    SUnit *SU = &SUnits[TopDownIndex2Node[i]];
+
+    // High latency instructions
+    if (Colors[SU->NodeNum] <= maxHighLatencyID && Colors[SU->NodeNum] > 0)
+      continue;
+
+    Colors[SU->NodeNum] = ++maxID;
+  }*/
+
+  // Put SUs of same color into same block
+  RealID.resize(MaxID+1, -1);
+  Node2Block.resize(DAGSize, -1);
+  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
+    SUnit *SU = &SUnits[i];
+    unsigned Color = Colors[SU->NodeNum];
+    if (RealID[Color] == -1) {
+      int ID = Blocks.size();
+      Blocks.push_back(
+        make_unique<SIBlockSchedule>(this, ID,
+                                     (Color > 0) &&
+                                     (Color <= MaxHighLatencyID)));
+      RealID[Color] = ID;
+    }
+    Blocks[RealID[Color]]->addUnit(SU);
+    Node2Block[SU->NodeNum] = RealID[Color];
+  }
----------------
Can you move this to its own function too.

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1014-1033
@@ +1013,22 @@
+  // Build dependencies between blocks
+  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
+    SUnit *SU = &SUnits[i];
+    int SUID = Node2Block[i];
+    for (SUnit::const_succ_iterator I = SU->Succs.begin(),
+         E = SU->Succs.end(); I != E; ++I) {
+      SUnit *Succ = I->getSUnit();
+      if (I->isWeak() || Succ->NodeNum >= DAGSize)
+        continue;
+      if (Node2Block[Succ->NodeNum] != SUID)
+        Blocks[SUID]->addSucc(Blocks[Node2Block[Succ->NodeNum]].get());
+    }
+    for (SUnit::const_pred_iterator I = SU->Preds.begin(),
+         E = SU->Preds.end(); I != E; ++I) {
+      SUnit *Pred = I->getSUnit();
+      if (I->isWeak() || Pred->NodeNum >= DAGSize)
+        continue;
+      if (Node2Block[Pred->NodeNum] != SUID)
+        Blocks[SUID]->addPred(Blocks[Node2Block[Pred->NodeNum]].get());
+    }
+  }
+
----------------
This one too.


http://reviews.llvm.org/D11885





More information about the llvm-commits mailing list