[PATCH] D11885: AMDGPU/SI: Add SI Machine Scheduler

Mon Sep 14 14:37:07 PDT 2015

arsenm added inline comments.

================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2732-2734
@@ +2731,5 @@
+
+  if (isMUBUF(Opc) || isMTBUF(Opc) || isMIMG(Opc))
+    return true;
+  return false;
+}
----------------
This can return the condition without the if.

This also isn't 100% accurate. This will miss barriers and a few other cases. Better would be to check the instruction latency maybe with a threshold value of some sort.

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:128
@@ +127,3 @@
+  switch (Reason) {
+  case NoCand:         return "NOCAND    ";
+  case RegUsage:       return "REGUSAGE  ";
----------------
Why are these padded with spaces at the end?

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:134
@@ +133,3 @@
+  case NodeOrder:      return "ORDER     ";
+  };
+  llvm_unreachable("Unknown reason!");
----------------
You don't need a ; here

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:174
@@ +173,3 @@
+
+// SIScheduleBlock //
+
----------------
Looks like a stray comment

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:200-201
@@ +199,4 @@
+  // Order of priority is:
+  // . Low latency instructions which do not depend on low latencies we
+  //   haven't waited
+  // . Other instructions which do not depend on low latencies we haven't
----------------
This sentence reads weirdly / seems to be missing some words

How about:

"Low latency instructions which do not depend on other low latency instructions we haven't waited for"

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:202-203
@@ +201,4 @@
+  //   haven't waited
+  // . Other instructions which do not depend on low latencies we haven't
+  //   waited
+  // . Low latencies
----------------
Ditto

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:209
@@ +208,3 @@
+  //     - instructions that depend on the first low latency instructions.
+  // If never in the block there is a lot of constant loads, the SGPR usage
+  // could go quite high, thus above the arbitrary limit of 60 will encourage
----------------
"If never in the block there is a lot of constant loads" -> "If there are not many constant loads in the block"

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:217
@@ +216,3 @@
+
+  if (Cand.SGPRUsage > 60 &&
+      tryLess(TryCand.SGPRUsage, Cand.SGPRUsage, TryCand, Cand, RegUsage))
----------------
This shouldn't be hardcoded. A pass parameter or command line option read during pass initialization

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:245-246
@@ +244,4 @@
+    SISchedCandidate TryCand;
+    std::vector<unsigned> pressure = TopRPTracker.getRegSetPressureAtPos();
+    std::vector<unsigned> MaxPressure = TopRPTracker.getRegSetPressureAtPos();
+    // predict register usage after this instruction
----------------
Did you mean for these to be copies?

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:247
@@ +246,3 @@
+    std::vector<unsigned> MaxPressure = TopRPTracker.getRegSetPressureAtPos();
+    // predict register usage after this instruction
+    TopRPTracker.getDownwardPressure((*I)->getInstr(), pressure, MaxPressure);
----------------
Capitalize and period

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:249
@@ +248,3 @@
+    TopRPTracker.getDownwardPressure((*I)->getInstr(), pressure, MaxPressure);
+    TryCand.SU = *I;
+    TryCand.SGPRUsage = pressure[DAG->SGPRSetID];
----------------
Can you use this instead of repeating (*I)-> multiple times?

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:259
@@ +258,3 @@
+      TopCand.setBest(TryCand);
+      //DEBUG(traceCandidate(Cand));
+    }
----------------
Dead code. Uncomment and maybe add more of a description if this is useful debug printing

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:273-274
@@ +272,4 @@
+
+  for (std::vector<SUnit*>::iterator I = SUnits.begin(),
+       E = SUnits.end(); I != E; ++I) {
+    SUnit *SU = *I;
----------------
This can be a range for

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:291
@@ +290,3 @@
+// Returns if the register was set between first and last
+static bool findDefBetween(unsigned Reg,
+                           SlotIndex First, SlotIndex Last,
----------------
Function would be better named isDefBetween since it doesn't return the def.

Something about this function doesn't seem right to me. I don't think you should be looking at the def_instructions, and instead checking the Segments of the LiveRange.

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:295-297
@@ +294,5 @@
+                           const LiveIntervals *LIS) {
+  for (MachineRegisterInfo::def_instr_iterator
+       UI = MRI->def_instr_begin(Reg),
+       UE = MRI->def_instr_end(); UI != UE; ++UI) {
+    const MachineInstr* MI = &*UI;
----------------
This can be a range loop over MRI->def_instructions()

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:320-321
@@ +319,4 @@
+  // to execute, and what is still alive at the end
+  for (std::vector<SUnit*>::iterator I = ScheduledSUnits.begin(),
+       E = ScheduledSUnits.end(); I != E; ++I) {
+    RPTracker.setPos((*I)->getInstr());
----------------
This can be a range for

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:371
@@ +370,3 @@
+                       MRI, LIS) &&
+        TargetRegisterInfo::isVirtualRegister(Reg)) {
+      LiveOutRegs.insert(Reg);
----------------
isVirtualRegister should be checked first

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:400-401
@@ +399,4 @@
+
+  for (std::vector<SUnit*>::iterator I = SUnits.begin(),
+       E = SUnits.end(); I != E; ++I) {
+    SUnit *SU = *I;
----------------
Range for

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:419-420
@@ +418,4 @@
+
+  // Check everything is right
+#ifndef NDEBUG
+  assert(SUnits.size() == ScheduledSUnits.size());
----------------
Can you factor this into a separate check predicate function to assert on

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:423-424
@@ +422,4 @@
+  assert(TopReadySUs.empty());
+  for (std::vector<SUnit*>::iterator I = SUnits.begin(),
+       E = SUnits.end(); I != E; ++I) {
+    SUnit *SU = *I;
----------------
Range for

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:426-427
@@ +425,4 @@
+    SUnit *SU = *I;
+    assert (SU->isScheduled);
+    assert (SU->NumPredsLeft == 0);
+  }
----------------
No space between assert and (

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:435-436
@@ +434,4 @@
+void SIScheduleBlock::undoSchedule() {
+  for (std::vector<SUnit*>::iterator I = SUnits.begin(),
+       E = SUnits.end(); I != E; ++I) {
+    SUnit *SU = *I;
----------------
Range for

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:507-508
@@ +506,4 @@
+  releaseSuccessors(SU, true);
+  // scheduling this node will trigger a wait,
+  // thus propagate to other instructions that they do not need to wait either
+  if (HasLowLatencyNonWaitedParent[NodeNum2Index[SU->NodeNum]]) {
----------------
Capitalize / period

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:526-527
@@ +525,4 @@
+  // We remove links from outside blocks to enable scheduling inside the block
+  for (std::vector<SUnit*>::iterator I = SUnits.begin(),
+       E = SUnits.end(); I != E; ++I) {
+    releaseSuccessors(*I, false);
----------------
Range for

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:539-544
@@ +538,8 @@
+
+  // check if not already predecessor
+  for (std::vector<SIScheduleBlock*>::iterator I = Preds.begin(),
+       E = Preds.end(); I != E; ++I) {
+    if (PredID == (*I)->ID)
+      return;
+  }
+  Preds.push_back(Pred);
----------------
This could be an std::find or factored into own function

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:546-552
@@ +545,9 @@
+  Preds.push_back(Pred);
+  DEBUG(
+    for (std::vector<SIScheduleBlock*>::iterator I = Succs.begin(),
+         E = Succs.end(); I != E; ++I) {
+      if (PredID == (*I)->ID)
+        assert(!"Loop in the Block Graph!\n");
+    }
+  );
+}
----------------
Asserts should not be placed under DEBUG(). It would be better to put this into a separate function and assert on it

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:558-563
@@ +557,8 @@
+
+  // check if not already predecessor
+  for (std::vector<SIScheduleBlock*>::iterator I = Succs.begin(),
+       E = Succs.end(); I != E; ++I) {
+    if (SuccID == (*I)->ID)
+      return;
+  }
+  if (Succ->isHighLatencyBlock())
----------------
Similar std::find or separate function again

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:591-592
@@ +590,4 @@
+  dbgs() << "\nSuccessors:\n";
+  for (std::vector<SIScheduleBlock*>::iterator I = Succs.begin(),
+       E = Succs.end(); I != E; ++I) {
+    (*I)->printDebug(false);
----------------
Range for

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:597-598
@@ +596,4 @@
+  if (Scheduled) {
+    dbgs() << "LiveInPressure " << LiveInPressure[DAG->SGPRSetID] << " "
+           << LiveInPressure[DAG->VGPRSetID] << "\n";
+    dbgs() << "LiveOutPressure " << LiveOutPressure[DAG->SGPRSetID] << " "
----------------
Single quotes around the single characters

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:650-652
@@ +649,5 @@
+    Blocks[BlockVariant] = CurrentBlocks;
+    return CurrentBlocks;
+  } else {
+    return B->second;
+  }
----------------
No return after else

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:710
@@ +709,3 @@
+    WorkList.push_back(DAG->getEntrySU());
+  for (int i = DAGSize-1; i >= 0; --i) {
+    SUnit *SU = &DAG->SUnits[i];
----------------
Space around -

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:850
@@ +849,3 @@
+    // keep color 0
+    if (SUColors.size() == 0)
+      continue;
----------------
.empty()

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:870
@@ +869,3 @@
+  unsigned DAGSize = DAG->SUnits.size();
+  std::map<std::set<unsigned>, unsigned> ColorCombinations;
+
----------------
Having a set as the key to a map looks very strange. Is this really what you want?

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:911-912
@@ +910,4 @@
+
+    /* No predecessor: vgpr constant loading */
+    /* low latency instruction usually have a predecessor (the address) */
+    if (SU->Preds.size() > 0 && !DAG->IsLowLatencySU[SU->NodeNum])
----------------
Should use C++ style comments

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1040-1041
@@ +1039,4 @@
+  }
+  DEBUG(dbgs() << "Blocks created:\n\n");
+  DEBUG(
+    for (unsigned i = 0, e = CurrentBlocks.size(); i != e; ++i) {
----------------
This can be in a single DEBUG() statement

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1081
@@ +1080,3 @@
+
+  DEBUG(dbgs() << "\nScheduling Blocks\n\n"); //TODO mettre nom variance
+
----------------
Non English comment

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1123-1134
@@ +1122,14 @@
+
+#ifndef NDEBUG
+  // Check correctness of the ordering
+  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
+    SIScheduleBlock *Block = CurrentBlocks[i];
+    for (std::vector<SIScheduleBlock*>::iterator I = Block->Preds.begin(),
+         E = Block->Preds.end(); I != E; ++I) {
+      SIScheduleBlock *Pred = *I;
+      assert(TopDownIndice2Index[i] > TopDownIndice2Index[Pred->ID] &&
+      "Wrong Top Down topological sorting");
+    }
+  }
+#endif
+
----------------
This should be another separate function

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1151-1152
@@ +1150,4 @@
+
+    for (std::vector<SUnit*>::iterator I = SUs.begin(),
+         E = SUs.end(); I != E; ++I) {
+      MachineInstr *MI = (*I)->getInstr();
----------------
Range for

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1242-1247
@@ +1241,8 @@
+
+  DEBUG(
+    for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {
+      SIScheduleBlock *Block = Blocks[i];
+      assert(Block->ID == i);
+    }
+  );
+
----------------
No asserts under DEBUG()

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1355-1358
@@ +1354,6 @@
+         E = LiveRegs.end(); I != E; ++I)
+      dbgs() << PrintVRegOrUnit(*I, DAG->getTRI()) << " ";
+    dbgs() << "\n";
+    dbgs() << "Current VGPRs: " << VregCurrentUsage << "\n";
+    dbgs() << "Current SGPRs: " << SregCurrentUsage << "\n";
+  );
----------------
Single quote ' ' and '\n'

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1382-1383
@@ +1381,4 @@
+    dbgs() << "Picking: " << Cand.Block->ID << "\n";
+    if (Cand.IsHighLatency)
+      dbgs() << "Is a block with high latency instruction\n";
+    dbgs() << "Position of last high latency dependency: "
----------------
I would prefer if debug printing is going to print something when it always prints yes/no rather than just if it's happening

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1416-1418
@@ +1415,5 @@
+    std::set<unsigned>::iterator Pos = LiveRegs.find(Reg);
+    assert (Pos != LiveRegs.end()); // Reg must be live
+    assert (LiveRegsConsumers.find(Reg) != LiveRegsConsumers.end());
+    assert (LiveRegsConsumers[Reg] >= 1);
+    --LiveRegsConsumers[Reg];
----------------
These can all be in one &&ed assert

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1556
@@ +1555,3 @@
+
+    if (SITII->isLowLatencyInstruction(SU->getInstr())) {
+      unsigned BestPos = LastLowLatencyUser + 1;
----------------
The normal naming convention is for this to just be TTI rather than SITTI

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1636-1637
@@ +1635,4 @@
+  buildDAGWithRegPressure();
+  DEBUG(for (std::vector<SUnit>::iterator I = SUnits.begin(),
+             E = SUnits.end(); I != E; ++I)
+          (*I).dumpAll(this));
----------------
range for and put on a separate line from the DEBUG()

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1682
@@ +1681,3 @@
+
+  // tell the outside world about the result of the scheduling
+
----------------
Capitalize / period

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.h:26-27
@@ +25,4 @@
+
+enum SIScheduleCandReason {
+    NoCand, RegUsage, Latency, Successor, Depth, NodeOrder};
+
----------------
This is wrapped weirdly. I would prefer one item per line

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.h:34
@@ +33,3 @@
+  // Set of reasons that apply to multiple candidates.
+  uint32_t RepeatReasonSet;
+
----------------
This looks like it should be an llvm::BitVector

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.h:213-214
@@ +212,4 @@
+  void topologicalSort();
+  // Give a Reserved color to every high latency
+  void colorHighLatenciesAlone();
+  // Creates groups of high latencies with a Reserved color
----------------
Can you put a line between each function and the comment on the following declaration

================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.h:365-366
@@ +364,4 @@
+  const TargetRegisterInfo *getTRI() {return TRI;}
+  SUnit* getEntrySU() {return &EntrySU;};
+  SUnit* getExitSU() {return &ExitSU;};
+
----------------
These should return references

================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:33-34
@@ +32,4 @@
+  }
+  for (unsigned i = 0; i < getNumRegPressureSets(); ++i) {
+    if (strncmp("SGPR_32", getRegPressureSetName(i), 7) == 0)
+      SGPR32SetID = i;
----------------
Can you do something besides compare the name? If not, this should compare StringRefs or something rather than using the libc functions

================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:557
@@ +556,3 @@
+  unsigned i;
+  const unsigned int *SGPRsForWaveFrontsForGen;
+
----------------
llvm style omits the int here

================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:559-562
@@ +558,6 @@
+
+  if (gen >= AMDGPUSubtarget::VOLCANIC_ISLANDS)
+    SGPRsForWaveFrontsForGen = &SGPRsForWaveFrontsVI[0];
+  else
+    SGPRsForWaveFrontsForGen = &SGPRsForWaveFronts[0];
+
----------------
This should be a ternary operator initialization

================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:570
@@ +569,3 @@
+
+  return i+1;
+}
----------------
space around +

================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:577-582
@@ +576,8 @@
+  unsigned cost = 0;
+  const unsigned int *SGPRsForWaveFrontsForGen;
+
+  if (gen >= AMDGPUSubtarget::VOLCANIC_ISLANDS)
+    SGPRsForWaveFrontsForGen = &SGPRsForWaveFrontsVI[0];
+  else
+    SGPRsForWaveFrontsForGen = &SGPRsForWaveFronts[0];
+
----------------
Ditto

================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:586
@@ +585,3 @@
+    cost += SGPRsUsed - SGPRsForWaveFrontsForGen[0];
+  // Spilling VGPR hurts more than SGPR
+  if (VGPRsForWaveFronts[0] < VGPRsUsed)
----------------
Grammar: VGPR->VGPRs, SGPR->SGPRs

================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:588
@@ +587,3 @@
+  if (VGPRsForWaveFronts[0] < VGPRsUsed)
+    cost += 4*(VGPRsUsed - VGPRsForWaveFronts[0]);
+
----------------
Space around *

http://reviews.llvm.org/D11885