[PATCH] D11885: AMDGPU/SI: Add SI Machine Scheduler
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 14 14:37:07 PDT 2015
arsenm added inline comments.
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2732-2734
@@ +2731,5 @@
+
+ if (isMUBUF(Opc) || isMTBUF(Opc) || isMIMG(Opc))
+ return true;
+ return false;
+}
----------------
This can return the condition without the if.
This also isn't 100% accurate. This will miss barriers and a few other cases. Better would be to check the instruction latency maybe with a threshold value of some sort.
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:128
@@ +127,3 @@
+ switch (Reason) {
+ case NoCand: return "NOCAND ";
+ case RegUsage: return "REGUSAGE ";
----------------
Why are these padded with spaces at the end?
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:134
@@ +133,3 @@
+ case NodeOrder: return "ORDER ";
+ };
+ llvm_unreachable("Unknown reason!");
----------------
You don't need a ; here
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:174
@@ +173,3 @@
+
+// SIScheduleBlock //
+
----------------
Looks like a stray comment
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:200-201
@@ +199,4 @@
+ // Order of priority is:
+ // . Low latency instructions which do not depend on low latencies we
+ // haven't waited
+ // . Other instructions which do not depend on low latencies we haven't
----------------
This sentence reads weirdly / seems to be missing some words
How about:
"Low latency instructions which do not depend on other low latency instructions we haven't waited for"
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:202-203
@@ +201,4 @@
+ // haven't waited
+ // . Other instructions which do not depend on low latencies we haven't
+ // waited
+ // . Low latencies
----------------
Ditto
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:209
@@ +208,3 @@
+ // - instructions that depend on the first low latency instructions.
+ // If never in the block there is a lot of constant loads, the SGPR usage
+ // could go quite high, thus above the arbitrary limit of 60 will encourage
----------------
"If never in the block there is a lot of constant loads" -> "If there are not many constant loads in the block"
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:217
@@ +216,3 @@
+
+ if (Cand.SGPRUsage > 60 &&
+ tryLess(TryCand.SGPRUsage, Cand.SGPRUsage, TryCand, Cand, RegUsage))
----------------
This shouldn't be hardcoded. A pass parameter or command line option read during pass initialization
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:245-246
@@ +244,4 @@
+ SISchedCandidate TryCand;
+ std::vector<unsigned> pressure = TopRPTracker.getRegSetPressureAtPos();
+ std::vector<unsigned> MaxPressure = TopRPTracker.getRegSetPressureAtPos();
+ // predict register usage after this instruction
----------------
Did you mean for these to be copies?
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:247
@@ +246,3 @@
+ std::vector<unsigned> MaxPressure = TopRPTracker.getRegSetPressureAtPos();
+ // predict register usage after this instruction
+ TopRPTracker.getDownwardPressure((*I)->getInstr(), pressure, MaxPressure);
----------------
Capitalize and period
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:249
@@ +248,3 @@
+ TopRPTracker.getDownwardPressure((*I)->getInstr(), pressure, MaxPressure);
+ TryCand.SU = *I;
+ TryCand.SGPRUsage = pressure[DAG->SGPRSetID];
----------------
Can you use this instead of repeating (*I)-> multiple times?
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:259
@@ +258,3 @@
+ TopCand.setBest(TryCand);
+ //DEBUG(traceCandidate(Cand));
+ }
----------------
Dead code. Uncomment and maybe add more of a description if this is useful debug printing
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:273-274
@@ +272,4 @@
+
+ for (std::vector<SUnit*>::iterator I = SUnits.begin(),
+ E = SUnits.end(); I != E; ++I) {
+ SUnit *SU = *I;
----------------
This can be a range for
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:291
@@ +290,3 @@
+// Returns if the register was set between first and last
+static bool findDefBetween(unsigned Reg,
+ SlotIndex First, SlotIndex Last,
----------------
Function would be better named isDefBetween since it doesn't return the def.
Something about this function doesn't seem right to me. I don't think you should be looking at the def_instructions, and instead checking the Segments of the LiveRange.
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:295-297
@@ +294,5 @@
+ const LiveIntervals *LIS) {
+ for (MachineRegisterInfo::def_instr_iterator
+ UI = MRI->def_instr_begin(Reg),
+ UE = MRI->def_instr_end(); UI != UE; ++UI) {
+ const MachineInstr* MI = &*UI;
----------------
This can be a range loop over MRI->def_instructions()
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:320-321
@@ +319,4 @@
+ // to execute, and what is still alive at the end
+ for (std::vector<SUnit*>::iterator I = ScheduledSUnits.begin(),
+ E = ScheduledSUnits.end(); I != E; ++I) {
+ RPTracker.setPos((*I)->getInstr());
----------------
This can be a range for
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:371
@@ +370,3 @@
+ MRI, LIS) &&
+ TargetRegisterInfo::isVirtualRegister(Reg)) {
+ LiveOutRegs.insert(Reg);
----------------
isVirtualRegister should be checked first
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:400-401
@@ +399,4 @@
+
+ for (std::vector<SUnit*>::iterator I = SUnits.begin(),
+ E = SUnits.end(); I != E; ++I) {
+ SUnit *SU = *I;
----------------
Range for
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:419-420
@@ +418,4 @@
+
+ // Check everything is right
+#ifndef NDEBUG
+ assert(SUnits.size() == ScheduledSUnits.size());
----------------
Can you factor this into a separate check predicate function to assert on
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:423-424
@@ +422,4 @@
+ assert(TopReadySUs.empty());
+ for (std::vector<SUnit*>::iterator I = SUnits.begin(),
+ E = SUnits.end(); I != E; ++I) {
+ SUnit *SU = *I;
----------------
Range for
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:426-427
@@ +425,4 @@
+ SUnit *SU = *I;
+ assert (SU->isScheduled);
+ assert (SU->NumPredsLeft == 0);
+ }
----------------
No space between assert and (
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:435-436
@@ +434,4 @@
+void SIScheduleBlock::undoSchedule() {
+ for (std::vector<SUnit*>::iterator I = SUnits.begin(),
+ E = SUnits.end(); I != E; ++I) {
+ SUnit *SU = *I;
----------------
Range for
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:507-508
@@ +506,4 @@
+ releaseSuccessors(SU, true);
+ // scheduling this node will trigger a wait,
+ // thus propagate to other instructions that they do not need to wait either
+ if (HasLowLatencyNonWaitedParent[NodeNum2Index[SU->NodeNum]]) {
----------------
Capitalize / period
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:526-527
@@ +525,4 @@
+ // We remove links from outside blocks to enable scheduling inside the block
+ for (std::vector<SUnit*>::iterator I = SUnits.begin(),
+ E = SUnits.end(); I != E; ++I) {
+ releaseSuccessors(*I, false);
----------------
Range for
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:539-544
@@ +538,8 @@
+
+ // check if not already predecessor
+ for (std::vector<SIScheduleBlock*>::iterator I = Preds.begin(),
+ E = Preds.end(); I != E; ++I) {
+ if (PredID == (*I)->ID)
+ return;
+ }
+ Preds.push_back(Pred);
----------------
This could be an std::find or factored into own function
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:546-552
@@ +545,9 @@
+ Preds.push_back(Pred);
+ DEBUG(
+ for (std::vector<SIScheduleBlock*>::iterator I = Succs.begin(),
+ E = Succs.end(); I != E; ++I) {
+ if (PredID == (*I)->ID)
+ assert(!"Loop in the Block Graph!\n");
+ }
+ );
+}
----------------
Asserts should not be placed under DEBUG(). It would be better to put this into a separate function and assert on it
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:558-563
@@ +557,8 @@
+
+ // check if not already predecessor
+ for (std::vector<SIScheduleBlock*>::iterator I = Succs.begin(),
+ E = Succs.end(); I != E; ++I) {
+ if (SuccID == (*I)->ID)
+ return;
+ }
+ if (Succ->isHighLatencyBlock())
----------------
Similar std::find or separate function again
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:591-592
@@ +590,4 @@
+ dbgs() << "\nSuccessors:\n";
+ for (std::vector<SIScheduleBlock*>::iterator I = Succs.begin(),
+ E = Succs.end(); I != E; ++I) {
+ (*I)->printDebug(false);
----------------
Range for
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:597-598
@@ +596,4 @@
+ if (Scheduled) {
+ dbgs() << "LiveInPressure " << LiveInPressure[DAG->SGPRSetID] << " "
+ << LiveInPressure[DAG->VGPRSetID] << "\n";
+ dbgs() << "LiveOutPressure " << LiveOutPressure[DAG->SGPRSetID] << " "
----------------
Single quotes around the single characters
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:650-652
@@ +649,5 @@
+ Blocks[BlockVariant] = CurrentBlocks;
+ return CurrentBlocks;
+ } else {
+ return B->second;
+ }
----------------
No return after else
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:710
@@ +709,3 @@
+ WorkList.push_back(DAG->getEntrySU());
+ for (int i = DAGSize-1; i >= 0; --i) {
+ SUnit *SU = &DAG->SUnits[i];
----------------
Space around -
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:850
@@ +849,3 @@
+ // keep color 0
+ if (SUColors.size() == 0)
+ continue;
----------------
.empty()
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:870
@@ +869,3 @@
+ unsigned DAGSize = DAG->SUnits.size();
+ std::map<std::set<unsigned>, unsigned> ColorCombinations;
+
----------------
Having a set as the key to a map looks very strange. Is this really what you want?
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:911-912
@@ +910,4 @@
+
+ /* No predecessor: vgpr constant loading */
+ /* low latency instruction usually have a predecessor (the address) */
+ if (SU->Preds.size() > 0 && !DAG->IsLowLatencySU[SU->NodeNum])
----------------
Should use C++ style comments
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1040-1041
@@ +1039,4 @@
+ }
+ DEBUG(dbgs() << "Blocks created:\n\n");
+ DEBUG(
+ for (unsigned i = 0, e = CurrentBlocks.size(); i != e; ++i) {
----------------
This can be in a single DEBUG() statement
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1081
@@ +1080,3 @@
+
+ DEBUG(dbgs() << "\nScheduling Blocks\n\n"); //TODO mettre nom variance
+
----------------
Non English comment
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1123-1134
@@ +1122,14 @@
+
+#ifndef NDEBUG
+ // Check correctness of the ordering
+ for (unsigned i = 0, e = DAGSize; i != e; ++i) {
+ SIScheduleBlock *Block = CurrentBlocks[i];
+ for (std::vector<SIScheduleBlock*>::iterator I = Block->Preds.begin(),
+ E = Block->Preds.end(); I != E; ++I) {
+ SIScheduleBlock *Pred = *I;
+ assert(TopDownIndice2Index[i] > TopDownIndice2Index[Pred->ID] &&
+ "Wrong Top Down topological sorting");
+ }
+ }
+#endif
+
----------------
This should be another separate function
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1151-1152
@@ +1150,4 @@
+
+ for (std::vector<SUnit*>::iterator I = SUs.begin(),
+ E = SUs.end(); I != E; ++I) {
+ MachineInstr *MI = (*I)->getInstr();
----------------
Range for
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1242-1247
@@ +1241,8 @@
+
+ DEBUG(
+ for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {
+ SIScheduleBlock *Block = Blocks[i];
+ assert(Block->ID == i);
+ }
+ );
+
----------------
No asserts under DEBUG()
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1355-1358
@@ +1354,6 @@
+ E = LiveRegs.end(); I != E; ++I)
+ dbgs() << PrintVRegOrUnit(*I, DAG->getTRI()) << " ";
+ dbgs() << "\n";
+ dbgs() << "Current VGPRs: " << VregCurrentUsage << "\n";
+ dbgs() << "Current SGPRs: " << SregCurrentUsage << "\n";
+ );
----------------
Single quote ' ' and '\n'
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1382-1383
@@ +1381,4 @@
+ dbgs() << "Picking: " << Cand.Block->ID << "\n";
+ if (Cand.IsHighLatency)
+ dbgs() << "Is a block with high latency instruction\n";
+ dbgs() << "Position of last high latency dependency: "
----------------
I would prefer if debug printing is going to print something when it always prints yes/no rather than just if it's happening
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1416-1418
@@ +1415,5 @@
+ std::set<unsigned>::iterator Pos = LiveRegs.find(Reg);
+ assert (Pos != LiveRegs.end()); // Reg must be live
+ assert (LiveRegsConsumers.find(Reg) != LiveRegsConsumers.end());
+ assert (LiveRegsConsumers[Reg] >= 1);
+ --LiveRegsConsumers[Reg];
----------------
These can all be in one &&ed assert
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1556
@@ +1555,3 @@
+
+ if (SITII->isLowLatencyInstruction(SU->getInstr())) {
+ unsigned BestPos = LastLowLatencyUser + 1;
----------------
The normal naming convention is for this to just be TTI rather than SITTI
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1636-1637
@@ +1635,4 @@
+ buildDAGWithRegPressure();
+ DEBUG(for (std::vector<SUnit>::iterator I = SUnits.begin(),
+ E = SUnits.end(); I != E; ++I)
+ (*I).dumpAll(this));
----------------
range for and put on a separate line from the DEBUG()
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.cpp:1682
@@ +1681,3 @@
+
+ // tell the outside world about the result of the scheduling
+
----------------
Capitalize / period
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.h:26-27
@@ +25,4 @@
+
+enum SIScheduleCandReason {
+ NoCand, RegUsage, Latency, Successor, Depth, NodeOrder};
+
----------------
This is wrapped weirdly. I would prefer one item per line
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.h:34
@@ +33,3 @@
+ // Set of reasons that apply to multiple candidates.
+ uint32_t RepeatReasonSet;
+
----------------
This looks like it should be an llvm::BitVector
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.h:213-214
@@ +212,4 @@
+ void topologicalSort();
+ // Give a Reserved color to every high latency
+ void colorHighLatenciesAlone();
+ // Creates groups of high latencies with a Reserved color
----------------
Can you put a line between each function and the comment on the following declaration
================
Comment at: lib/Target/AMDGPU/SIMachineScheduler.h:365-366
@@ +364,4 @@
+ const TargetRegisterInfo *getTRI() {return TRI;}
+ SUnit* getEntrySU() {return &EntrySU;};
+ SUnit* getExitSU() {return &ExitSU;};
+
----------------
These should return references
================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:33-34
@@ +32,4 @@
+ }
+ for (unsigned i = 0; i < getNumRegPressureSets(); ++i) {
+ if (strncmp("SGPR_32", getRegPressureSetName(i), 7) == 0)
+ SGPR32SetID = i;
----------------
Can you do something besides compare the name? If not, this should compare StringRefs or something rather than using the libc functions
================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:557
@@ +556,3 @@
+ unsigned i;
+ const unsigned int *SGPRsForWaveFrontsForGen;
+
----------------
llvm style omits the int here
================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:559-562
@@ +558,6 @@
+
+ if (gen >= AMDGPUSubtarget::VOLCANIC_ISLANDS)
+ SGPRsForWaveFrontsForGen = &SGPRsForWaveFrontsVI[0];
+ else
+ SGPRsForWaveFrontsForGen = &SGPRsForWaveFronts[0];
+
----------------
This should be a ternary operator initialization
================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:570
@@ +569,3 @@
+
+ return i+1;
+}
----------------
space around +
================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:577-582
@@ +576,8 @@
+ unsigned cost = 0;
+ const unsigned int *SGPRsForWaveFrontsForGen;
+
+ if (gen >= AMDGPUSubtarget::VOLCANIC_ISLANDS)
+ SGPRsForWaveFrontsForGen = &SGPRsForWaveFrontsVI[0];
+ else
+ SGPRsForWaveFrontsForGen = &SGPRsForWaveFronts[0];
+
----------------
Ditto
================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:586
@@ +585,3 @@
+ cost += SGPRsUsed - SGPRsForWaveFrontsForGen[0];
+ // Spilling VGPR hurts more than SGPR
+ if (VGPRsForWaveFronts[0] < VGPRsUsed)
----------------
Grammar: VGPR->VGPRs, SGPR->SGPRs
================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:588
@@ +587,3 @@
+ if (VGPRsForWaveFronts[0] < VGPRsUsed)
+ cost += 4*(VGPRsUsed - VGPRsForWaveFronts[0]);
+
----------------
Space around *
http://reviews.llvm.org/D11885
More information about the llvm-commits
mailing list