[llvm] [MachineOutliner] Leaf Descendants (PR #90275)
Kyungwoo Lee via llvm-commits
llvm-commits at lists.llvm.org
Fri Jun 7 16:24:34 PDT 2024
================
@@ -105,6 +114,68 @@ void SuffixTree::setSuffixIndices() {
}
}
+void SuffixTree::setLeafNodes() {
+  // A stack that keeps track of nodes to visit for post-order DFS traversal.
+  std::stack<SuffixTreeNode *> ToVisit;
+  ToVisit.push(Root);
+
+  // This keeps track of the index of the next leaf node to be added to
+  // the LeafNodes vector of the suffix tree.
+  unsigned LeafCounter = 0;
+
+  // This keeps track of nodes whose children have been added to the stack
+  // during the post-order depth-first traversal of the tree.
+  llvm::SmallPtrSet<SuffixTreeInternalNode *, 32> ChildrenAddedToStack;
+
+  // Traverse the tree in post-order.
+  while (!ToVisit.empty()) {
+    SuffixTreeNode *CurrNode = ToVisit.top();
+    ToVisit.pop();
+    if (auto *CurrInternalNode = dyn_cast<SuffixTreeInternalNode>(CurrNode)) {
+      // The current node is an internal node.
+      if (ChildrenAddedToStack.find(CurrInternalNode) !=
+          ChildrenAddedToStack.end()) {
+        // If the children of the current node have been added to the stack,
+        // then this is the second time we visit this node, and at this point
+        // all of its children have already been processed. Now we can set
+        // its LeftLeafIdx and RightLeafIdx.
+        auto it = CurrInternalNode->Children.begin();
+        if (it != CurrInternalNode->Children.end()) {
+          // Use the first child's RightLeafIdx. The first child is the first
+          // one pushed onto the stack, so it is the last one processed. This
+          // implies that the leaf descendants of the first child are assigned
+          // the largest index numbers.
----------------
kyulee-com wrote:
Yeah. Using `MapVector` is an option, but it would be expensive for a large suffix tree. Since `setLeafNodes()` is the last step, run after the suffix tree is frozen, and this step only reads the tree, I think it's okay. However, it's still clunky to iterate over `Children` once on the first visit (in the `else` branch below) and again here on the second visit.
So, instead of `ChildrenAddedToStack`, you could define `DenseMap<SuffixTreeInternalNode *, std::pair<SuffixTreeNode *, SuffixTreeNode *>> ChildrenMap;` to record the first and last child on the first visit, and then use it here on the second visit. Something like:
```
auto I = ChildrenMap.find(CurrInternalNode);
if (I != ChildrenMap.end()) {
  // Second visit: the node's children have already been processed.
  auto [FirstChild, LastChild] = I->second;
  CurrNode->setRightLeafIdx(FirstChild->getRightLeafIdx());
  CurrNode->setLeftLeafIdx(LastChild->getLeftLeafIdx());
  assert(CurrNode->getLeftLeafIdx() <= CurrNode->getRightLeafIdx() &&
         "LeftLeafIdx should not be larger than RightLeafIdx");
} else {
  // This is the first time we visit this node. This means that its
  // children have not been added to the stack yet. Hence, we add the
  // current node back to the stack and then push its children for
  // processing.
  auto ChildIt = CurrInternalNode->Children.begin();
  if (ChildIt != CurrInternalNode->Children.end()) {
    ToVisit.push(CurrNode);
    SuffixTreeNode *FirstChild = ChildIt->second;
    SuffixTreeNode *LastChild = nullptr;
    for (; ChildIt != CurrInternalNode->Children.end(); ++ChildIt) {
      LastChild = ChildIt->second;
      ToVisit.push(LastChild);
    }
    ChildrenMap[CurrInternalNode] = {FirstChild, LastChild};
  }
}
```
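To check the first/last-child bookkeeping in isolation, here is a minimal, self-contained sketch using standard containers instead of LLVM's `DenseMap` and `SuffixTreeNode`. The `Node` type and this free-standing `setLeafNodes` are hypothetical stand-ins: a leaf is modeled as a node with no children, and children live in a `std::vector` so their iteration order is fixed. It is not the PR's implementation.

```cpp
#include <cassert>
#include <stack>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for SuffixTreeNode; a node with no children is a leaf.
struct Node {
  std::vector<Node *> Children; // Iterated in a fixed order.
  unsigned LeftLeafIdx = 0;
  unsigned RightLeafIdx = 0;
  bool isLeaf() const { return Children.empty(); }
};

// Post-order DFS: on an internal node's first visit, record its first and
// last child; on the second visit, derive its leaf range from them.
std::vector<Node *> setLeafNodes(Node *Root) {
  std::vector<Node *> LeafNodes;
  std::stack<Node *> ToVisit;
  ToVisit.push(Root);
  std::unordered_map<Node *, std::pair<Node *, Node *>> ChildrenMap;

  while (!ToVisit.empty()) {
    Node *Curr = ToVisit.top();
    ToVisit.pop();
    if (Curr->isLeaf()) {
      // Leaves are numbered in the order they are popped.
      unsigned Idx = static_cast<unsigned>(LeafNodes.size());
      Curr->LeftLeafIdx = Curr->RightLeafIdx = Idx;
      LeafNodes.push_back(Curr);
      continue;
    }
    auto It = ChildrenMap.find(Curr);
    if (It != ChildrenMap.end()) {
      // Second visit: every child is done. The first child was pushed first,
      // so it was popped last and holds the largest leaf indices.
      auto [FirstChild, LastChild] = It->second;
      Curr->RightLeafIdx = FirstChild->RightLeafIdx;
      Curr->LeftLeafIdx = LastChild->LeftLeafIdx;
      assert(Curr->LeftLeafIdx <= Curr->RightLeafIdx);
    } else {
      // First visit: push the node back so it is revisited after its children.
      ToVisit.push(Curr);
      for (Node *Child : Curr->Children)
        ToVisit.push(Child);
      ChildrenMap[Curr] = {Curr->Children.front(), Curr->Children.back()};
    }
  }
  return LeafNodes;
}
```

For example, with `Root -> {A, L3}` and `A -> {L1, L2}`, the leaves are numbered `L3 = 0`, `L2 = 1`, `L1 = 2`, so `A` covers `[1, 2]` and `Root` covers `[0, 2]`. Each internal node's range comes from two lookups, so `Children` is iterated only once per node.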
https://github.com/llvm/llvm-project/pull/90275