[llvm] [MachineOutliner] Leaf Descendants (PR #90275)

Kyungwoo Lee via llvm-commits llvm-commits at lists.llvm.org
Fri Jun 7 16:24:34 PDT 2024


================
@@ -105,6 +114,68 @@ void SuffixTree::setSuffixIndices() {
   }
 }
 
+void SuffixTree::setLeafNodes() {
+  // A stack that keeps track of nodes to visit for post-order DFS traversal.
+  std::stack<SuffixTreeNode *> ToVisit;
+  ToVisit.push(Root);
+
+  // This keeps track of the index of the next leaf node to be added to
+  // the LeafNodes vector of the suffix tree.
+  unsigned LeafCounter = 0;
+
+  // This keeps track of nodes whose children have been added to the stack
+  // during the post-order depth-first traversal of the tree.
+  llvm::SmallPtrSet<SuffixTreeInternalNode *, 32> ChildrenAddedToStack;
+
+  // Traverse the tree in post-order.
+  while (!ToVisit.empty()) {
+    SuffixTreeNode *CurrNode = ToVisit.top();
+    ToVisit.pop();
+    if (auto *CurrInternalNode = dyn_cast<SuffixTreeInternalNode>(CurrNode)) {
+      // The current node is an internal node.
+      if (ChildrenAddedToStack.find(CurrInternalNode) !=
+          ChildrenAddedToStack.end()) {
+        // If the children of the current node have been added to the stack,
+        // then this is the second time we visit this node, and all of its
+        // children have already been processed. Now we can set its
+        // LeftLeafIdx and RightLeafIdx.
+        auto it = CurrInternalNode->Children.begin();
+        if (it != CurrInternalNode->Children.end()) {
+          // Use the first child's RightLeafIdx. The first child is the first
+          // one pushed onto the stack, so it is the last one to be processed,
+          // which means its leaf descendants are assigned the largest index
+          // numbers.
----------------
kyulee-com wrote:

Yeah. Using `MapVector` is an option, but it would be expensive for a large suffix tree. Given that `setLeafNodes()` is the last step, run after the suffix tree is frozen, and that this step is read-only, I think it's okay. However, it's still clunky to iterate `Children` once on the first visit (in the `else` part below) and again here on the second visit.
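The quoted diff's comment about taking the first child's `RightLeafIdx` rests on the stack's last-in, first-out order. The sketch below is not from the PR; it uses a hypothetical helper, `assignInPopOrder`, that pushes child names in iteration order (as the first visit does) and hands out a counter in pop order, standing in for `LeafCounter`.

```cpp
#include <cassert>
#include <stack>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: pushes children in iteration order, as the first visit
// in setLeafNodes() does, then numbers them in pop order (like LeafCounter).
std::vector<std::pair<std::string, unsigned>>
assignInPopOrder(const std::vector<std::string> &Children) {
  std::stack<std::string> ToVisit;
  for (const std::string &Child : Children)
    ToVisit.push(Child); // The first child ends up at the bottom of the stack.

  std::vector<std::pair<std::string, unsigned>> Order;
  unsigned Counter = 0;
  while (!ToVisit.empty()) {
    Order.emplace_back(ToVisit.top(), Counter++); // Last pushed, first numbered.
    ToVisit.pop();
  }
  return Order;
}
```

Pushing `A`, `B`, `C` yields the pop order `C`, `B`, `A` with numbers 0, 1, 2. That is why a parent takes its `RightLeafIdx` from its first child and its `LeftLeafIdx` from its last child.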

So, instead of `ChildrenAddedToStack`, you could define `DenseMap<SuffixTreeInternalNode *, std::pair<SuffixTreeNode *, SuffixTreeNode *>> ChildrenMap;` to record the first and last child during the first visit, and then use it here in the second visit. Something like:
```
      // ChildrenMap records a node's first and last children on its first visit.
      auto I = ChildrenMap.find(CurrInternalNode);
      if (I != ChildrenMap.end()) {
        // Second visit: all of the node's children have been processed.
        // The first child was pushed first and popped last, so it holds the
        // largest leaf indices; the last child holds the smallest.
        auto [FirstChild, LastChild] = I->second;
        CurrNode->setRightLeafIdx(FirstChild->getRightLeafIdx());
        CurrNode->setLeftLeafIdx(LastChild->getLeftLeafIdx());
        assert(CurrNode->getLeftLeafIdx() <= CurrNode->getRightLeafIdx() &&
               "LeftLeafIdx should not be larger than RightLeafIdx");
      } else {
        // This is the first time we visit this node. This means that its
        // children have not been added to the stack yet. Hence, we add the
        // current node back to the stack, push its children for processing,
        // and remember the first and last children.
        auto ChildIt = CurrInternalNode->Children.begin();
        if (ChildIt != CurrInternalNode->Children.end()) {
          ToVisit.push(CurrNode); // ToVisit is a std::stack, so use push().
          SuffixTreeNode *FirstChild = ChildIt->second;
          SuffixTreeNode *LastChild = nullptr;
          for (; ChildIt != CurrInternalNode->Children.end(); ++ChildIt) {
            LastChild = ChildIt->second;
            ToVisit.push(LastChild);
          }
          ChildrenMap[CurrInternalNode] = {FirstChild, LastChild};
        }
      }
```
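For trying the idea outside LLVM, here is a minimal self-contained sketch of the same post-order traversal with a first/last-child map. It is an illustration of the suggestion, not the PR's implementation: it assumes a hypothetical `Node` type with an ordered `std::vector` of children (a childless node counts as a leaf) in place of `SuffixTreeInternalNode`/`SuffixTreeLeafNode`, and uses `std::unordered_map` in place of `DenseMap`.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for SuffixTreeNode: a node with an ordered child list.
// A node with no children is treated as a leaf.
struct Node {
  std::vector<Node *> Children;
  unsigned LeftLeafIdx = 0;
  unsigned RightLeafIdx = 0;
};

// Iterative post-order DFS that numbers leaves in visit order and gives every
// internal node the closed range [LeftLeafIdx, RightLeafIdx] of its leaf
// descendants in the returned vector. The map records (first, last) child on
// the first visit so the second visit does not re-iterate Children.
std::vector<Node *> setLeafNodes(Node *Root) {
  std::vector<Node *> LeafNodes;
  std::vector<Node *> ToVisit{Root};
  std::unordered_map<Node *, std::pair<Node *, Node *>> ChildrenMap;
  unsigned LeafCounter = 0;

  while (!ToVisit.empty()) {
    Node *Curr = ToVisit.back();
    ToVisit.pop_back();
    if (Curr->Children.empty()) {
      // Leaf: its range is the single index it is assigned.
      Curr->LeftLeafIdx = Curr->RightLeafIdx = LeafCounter++;
      LeafNodes.push_back(Curr);
      continue;
    }
    auto It = ChildrenMap.find(Curr);
    if (It != ChildrenMap.end()) {
      // Second visit: every child subtree is done. The first child was popped
      // last, so it holds the largest indices; the last child the smallest.
      auto [FirstChild, LastChild] = It->second;
      Curr->RightLeafIdx = FirstChild->RightLeafIdx;
      Curr->LeftLeafIdx = LastChild->LeftLeafIdx;
      assert(Curr->LeftLeafIdx <= Curr->RightLeafIdx);
    } else {
      // First visit: re-push the node below its children, then record the
      // first and last child for the second visit.
      ToVisit.push_back(Curr);
      for (Node *Child : Curr->Children)
        ToVisit.push_back(Child);
      ChildrenMap[Curr] = {Curr->Children.front(), Curr->Children.back()};
    }
  }
  return LeafNodes;
}
```

Because each child's subtree is finished before the next child is popped, the leaf descendants of any node form the contiguous slice `LeafNodes[LeftLeafIdx..RightLeafIdx]`, so they can be listed without walking the subtree again. For a root with children `{X, L3}` where `X` has children `{L1, L2}`, the leaves come out as `L3, L2, L1`, `X` gets `[1, 2]`, and the root gets `[0, 2]`.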


https://github.com/llvm/llvm-project/pull/90275

