[PATCH] D26872: Outliner: Add MIR-level outlining pass

Matthias Braun via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 17 14:22:26 PST 2017


MatzeB added a comment.

- I hear that you are planning to get rid of the ProgramMapping construct which would be my last real stopper.
- There is still a lot of nitpicky stuff around.
- This time I checked the suffix tree construction algorithm which looks good.
- For future patches: Having a `DenseMap<unsigned, SuffixTreeNode *>` in every node is potentially wasteful. Intuitively I would expect the majority of nodes to only have a small number of entries, so it may be good to measure and explore alternative representations like a dynamically adapting datastructure (i.e. switching from linear list to map depending on number of children).



================
Comment at: include/llvm/CodeGen/Passes.h:401-402
+
+  /// This pass performs outlining on machine instructions directly before printing assembly.
+   ModulePass *createOutlinerPass();
 } // End llvm namespace
----------------
This should probably be `createMachineOutlinerPass()` to be consistent.


================
Comment at: include/llvm/Target/TargetInstrInfo.h:1518
+
+  public:
+
----------------
indentation.

You should also consider to move these functions towards the other functions so the "private:" part can stay at the end of the class definition.


================
Comment at: include/llvm/Target/TargetInstrInfo.h:1530
+  virtual void insertOutlinerEpilogue(MachineBasicBlock &MBB,
+                                    MachineFunction &MF) const {
+    llvm_unreachable(
----------------
Indentation


================
Comment at: include/llvm/Target/TargetInstrInfo.h:1559
+  /// By default, this means that the function has no red zone.
+  virtual bool functionIsSafeToOutlineFrom(Function &F) const {
+    llvm_unreachable(
----------------
As this is a codegen API I would rather pass a `MachineFunction&`


================
Comment at: lib/CodeGen/MachineOutliner.cpp:40
+
+#define DEBUG_TYPE "machine-outliner"
+
----------------
Move this below the #includes so you do not accidentally affect the headers.


================
Comment at: lib/CodeGen/MachineOutliner.cpp:63-64
+
+STATISTIC(NumOutlinedStat, "Number of candidates outlined");
+STATISTIC(FunctionsCreatedStat, "Number of functions created");
+
----------------
Maybe drop the `Stat` suffix, esp. in a statistic dump that looks superfluous.


================
Comment at: lib/CodeGen/MachineOutliner.cpp:172
+/// suffix in \p SuffixIdx.
+struct SuffixTreeNode {
+
----------------
Suggestion (possibly for later patches):
As far as I see it a node is either a leaf or an inner node and never changes it nature. You could make this and the constraints on the End and Children members a bit more obvious when representing this in a type hierarchy (and safe a bit of memory):

```
struct SuffixTreeNode {
  bool IsLeaf; ...
};
struct SuffixTreeLeafNode : public SuffixTreeNode {
  size_t *EndIdx;
  size_t SuffixIdx;
};
struct SuffixTreeInternalNode : SuffixTreeNode {
  Map<SuffixTreeNode*> Children;
};
```


================
Comment at: lib/CodeGen/MachineOutliner.cpp:237
+  /// The length of the substring associated with this node.
+  size_t size() {
+    size_t SubstringLen = 0;
----------------
can be const.


================
Comment at: lib/CodeGen/MachineOutliner.cpp:238-243
+    size_t SubstringLen = 0;
+
+    if (StartIdx != EmptyIdx)
+      SubstringLen = *EndIdx - StartIdx + 1;
+
+    return SubstringLen;
----------------
- Maybe handle the special case early:
```
if (StartIdx == EmptyIdx)
  return 0;

return *EndIdx - StartIdx + 1;
```

- I assume this is not supposed to be called with *EndIdx == EmptyIdx? Add an assert()?


================
Comment at: lib/CodeGen/MachineOutliner.cpp:248-250
+/// \brief Helper struct which keeps track of the next insertion point in
+/// Ukkonen's algorithm.
+struct ActiveState {
----------------
Interesting to see this packaged up in an own struct instead of just putting the members directly into the SuffixTree class. But doesn't hurt either I guess.


================
Comment at: lib/CodeGen/MachineOutliner.cpp:302
+  /// manually delete the end indices of the nodes in the tree.
+  BumpPtrAllocator InternalEndIdxAllocator;
+
----------------
It looks like you can use a single allocator for nodes and EndIndexes.


================
Comment at: lib/CodeGen/MachineOutliner.cpp:345
+      E = new (InternalEndIdxAllocator) size_t(EndIdx);
+
+    N = new (NodeAllocator) SuffixTreeNode(StartIdx, E, Root);
----------------
Maybe add
```
else
  assert(EndIdx == EmptyIdx);
```
to make sure callers know what they are doing. An alternative would be to provide different functions for inserting leafs or inner nodes.


================
Comment at: lib/CodeGen/MachineOutliner.cpp:363
+    for (auto &ChildPair : CurrentNode.Children) {
+      if (ChildPair.second != nullptr) {
+        IsLeaf = false;
----------------
You can save an indentation level here with
```
if (ChildPair.second == nullptr)
  continue;
```


================
Comment at: lib/CodeGen/MachineOutliner.cpp:387
+  /// to complete the suffix tree at the current phase.
+  void extend(size_t EndIdx, SuffixTreeNode *NeedsLink, size_t &SuffixesToAdd) {
+    while (SuffixesToAdd > 0) {
----------------
As you only have 1 "out" parameter you could simply return the new value instead. At the call side I find `SuffixesToAdd = extend(x, y, SuffixesToAdd);` easier to understand when you do not have to wonder whether a parameters is an "out" parameter.

The NeedsLink parameter is nullptr for all callers?


================
Comment at: lib/CodeGen/MachineOutliner.cpp:390-391
+
+      // The length of the current mapping is 0, so we look at the last added
+      // character to our substring.
+      if (Active.Len == 0)
----------------
General note on comments: I would expect comments at this indentation level to talk about properties/situations at that level and not just inside the if. This gets clearer if you formulate the comment as a question, or an if-then statement (or move the comment into the if block):
```
// Look at the last character if the current mapping is 0.
if (Active.Len == 0)
  Active.Idx = EndIdx;

// Current mapping is 0?
if (Active.Len == 0) {
  // Look at the last added character.
  Active.Idx = EndIdx;
}
```


================
Comment at: lib/CodeGen/MachineOutliner.cpp:397
+      unsigned FirstChar = Mapping.elementAt(Active.Idx);
+      unsigned LastChar = Mapping.elementAt(EndIdx);
+
----------------
LastChar can be moved further down as it's not used by some paths through the function.


================
Comment at: lib/CodeGen/MachineOutliner.cpp:472
+      SuffixesToAdd--;
+      if (Active.Node->StartIdx == EmptyIdx) {
+        if (Active.Len > 0) {
----------------
Add a comment that you check whether Active.Node is the root
or alternatively add a `SuffixTreeNode::isRoot()` function.


================
Comment at: lib/CodeGen/MachineOutliner.cpp:483-484
+      }
+  }
+}
+
----------------
Indentation


================
Comment at: lib/CodeGen/MachineOutliner.cpp:490
+  /// \param NewStr The string to append to the tree.
+  void append(std::vector<unsigned> NewStr) {
+    Mapping.MBBMappings.push_back(NewStr);
----------------
This should rather be ArrayRef<unsigned>.


================
Comment at: lib/CodeGen/MachineOutliner.cpp:512
+    }
+
+    // Now that we're done constructing the tree, we can set the suffix indices
----------------
Add `assert(SuffixesToAdd == 0);`?


================
Comment at: lib/CodeGen/MachineOutliner.cpp:517
+    assert(Root && "Root node was null!");
+    setSuffixIndices(*Root, LabelHeight);
+  }
----------------
Maybe `setSuffixIndices(..., /*LabelHeight =*/0)` instead of the local variable so readers do not keep wondering whether it is an "out" parameter.


================
Comment at: lib/CodeGen/MachineOutliner.cpp:687
+      // There are occurrences. Collect them and then prune them from the tree.
+      SuffixTreeNode *M;
+
----------------
move this into the loop.


================
Comment at: lib/CodeGen/MachineOutliner.cpp:803-807
+  OutlinedFunction(MachineBasicBlock *OccBB_, size_t StartIdxInBB_,
+                   size_t EndIdxInBB_, size_t Name_, size_t Id_,
+                   size_t OccurrenceCount_)
+      : OccBB(OccBB_), StartIdxInBB(StartIdxInBB_), EndIdxInBB(EndIdxInBB_),
+        Name(Name_), Id(Id_), OccurrenceCount(OccurrenceCount_) {}
----------------
C++ does the right thing for `Name(Name)` etc. so you can drop the `_` suffixes from the parameter names. Similar with some other constructors.


================
Comment at: lib/CodeGen/MachineOutliner.cpp:809
+};
+} // Anonymous namespace.
+
----------------
You should extend the anonymous namespace to include the MachineOutliner class. The only things that needs to be visible to the outside are `initializeMachineOutlinerPass()` (=the stuff coming out of INITIALIZE_PASS) and `createOutlinerPass()`.


================
Comment at: lib/CodeGen/MachineOutliner.cpp:832
+  /// unsigned \p DenseMap template specialization.
+  unsigned CurrIllegalInstrMapping = -3;
+
----------------
Because this relies on implementation details of DenseMapInfo, better play it safe with static_assert so the compilation fails if someone decides to change the values:
```
static_assert(DenseMapInfo<unsigned>::getEmptyKey() == (unsigned)-1);
static_assert(DenseMapInfo<unsigned>::getTombstoneKey() == (unsigned)-2);
```
that way things also explain themselves and you can get away with a shorter comment.



The module pass instance can in theory be reused for multiple programs. So the state here needs to be initialized and cleared in `runOnModule()`.


================
Comment at: lib/CodeGen/MachineOutliner.cpp:834
+
+  /// The last value assigned to an instruction we can outline.
+  unsigned CurrLegalInstrMapping = 0;
----------------
It's the next number to be assigned, isn't it? Same with CurrIllegalInstrMapping.


================
Comment at: lib/CodeGen/MachineOutliner.cpp:921
+void MachineOutliner::buildInstructionMapping(std::vector<unsigned> &Container,
+                                       MachineBasicBlock &MBB,
+                                       const TargetRegisterInfo &TRI,
----------------
Indentation, MBB can be `const`


================
Comment at: lib/CodeGen/MachineOutliner.cpp:952-953
+    CurrentFunctionID++;
+    assert(CurrLegalInstrMapping < CurrIllegalInstrMapping &&
+           "Instruction mapping overflow!");
+    assert(CurrLegalInstrMapping != (unsigned)-1 &&
----------------
The overflow could (in theory) be triggered by a user and not just by compiler bugs. So could use report_fatal_error() so it stays around in release builds:
```
if (CurrLegalInstrMapping < CurIllegalInstrMapping)
  report_fatal_error("Instruction mapping overflow!");
```


================
Comment at: lib/CodeGen/MachineOutliner.cpp:954
+           "Instruction mapping overflow!");
+    assert(CurrLegalInstrMapping != (unsigned)-1 &&
+           CurrLegalInstrMapping != (unsigned)-2 &&
----------------
Use `DenseMapInfo<unsigned>::get{Empty|Tombstone}Key()` instead of hardcoding the values.


================
Comment at: lib/CodeGen/MachineOutliner.cpp:1008
+        MachineBasicBlock *OccBB = Worklist[FirstIdxAndOffset.first];
+        FunctionList.push_back(OutlinedFunction(
+            OccBB, StartIdxInBB, EndIdxInBB, FunctionList.size(),
----------------
Could use `emplace_back(OccBB, StartIdxInBB, ...)`


================
Comment at: lib/CodeGen/MachineOutliner.cpp:1017
+
+          CandidateList.push_back(Candidate(
+              IdxAndOffset.first,      // Idx of MBB containing candidate.
----------------
Could use `emplace_back()`.


================
Comment at: lib/CodeGen/MachineOutliner.cpp:1050
+  NameStream << "OUTLINED_FUNCTION" << OF.Name;
+  std::string *Name = new std::string(NameStream.str());
+
----------------
I think getOrInsertFunction() copies the name and does not take ownership of the passed string so this is an unnecessary copy and a memory leak.


================
Comment at: lib/CodeGen/MachineOutliner.cpp:1066-1068
+  MachineBasicBlock *MBB = MF.CreateMachineBasicBlock();
+  const TargetSubtargetInfo *STI = &(MF.getSubtarget());
+  const TargetInstrInfo *TII = STI->getInstrInfo();
----------------
Use references for variables that cannot be `nullptr`.


================
Comment at: lib/CodeGen/MachineOutliner.cpp:1103-1104
+                              std::vector<MachineBasicBlock *> &Worklist,
+                              std::vector<Candidate> &CandidateList,
+                              std::vector<OutlinedFunction> &FunctionList,
+                              ProgramMapping &Mapping) {
----------------
I think CandidateList and FunctionList can be `const` or better `ArrayRef`.


================
Comment at: lib/CodeGen/MachineOutliner.cpp:1213
+  // Find all of the candidates for outlining and then outline them.
+  bool OutlinedSomething = false;
+  std::vector<Candidate> CandidateList;
----------------
Move to assignment.


================
Comment at: lib/CodeGen/MachineOutliner.cpp:1214-1215
+  bool OutlinedSomething = false;
+  std::vector<Candidate> CandidateList;
+  std::vector<OutlinedFunction> FunctionList;
+
----------------
As you have some state such as CurrentFunctionID, CurrLegalMapping in the class anyway, maybe the two vectors and the Worklist can move there as well so you do not need to pass them around? Just need to `clear()` them at the end of the function then.


Repository:
  rL LLVM

https://reviews.llvm.org/D26872





More information about the llvm-commits mailing list