[llvm] [SLP] Improve block traversal in getSpillCost() (PR #128620)
Philip Reames via llvm-commits
llvm-commits at lists.llvm.org
Tue Feb 25 08:26:24 PST 2025
================
@@ -12415,68 +12415,48 @@ InstructionCost BoUpSLP::getSpillCost() {
InstructionCost Cost = 0;
SmallPtrSet<const TreeEntry *, 4> LiveEntries;
- const TreeEntry *Prev = nullptr;
-
- // The entries in VectorizableTree are not necessarily ordered by their
- // position in basic blocks. Collect them and order them by dominance so later
- // instructions are guaranteed to be visited first. For instructions in
- // different basic blocks, we only scan to the beginning of the block, so
- // their order does not matter, as long as all instructions in a basic block
- // are grouped together. Using dominance ensures a deterministic order.
- SmallVector<TreeEntry *, 16> OrderedEntries;
- for (const auto &TEPtr : VectorizableTree) {
- if (TEPtr->isGather())
- continue;
- OrderedEntries.push_back(TEPtr.get());
- }
- llvm::stable_sort(OrderedEntries, [&](const TreeEntry *TA,
- const TreeEntry *TB) {
- Instruction &A = getLastInstructionInBundle(TA);
- Instruction &B = getLastInstructionInBundle(TB);
- auto *NodeA = DT->getNode(A.getParent());
- auto *NodeB = DT->getNode(B.getParent());
- assert(NodeA && "Should only process reachable instructions");
- assert(NodeB && "Should only process reachable instructions");
- assert((NodeA == NodeB) == (NodeA->getDFSNumIn() == NodeB->getDFSNumIn()) &&
- "Different nodes should have different DFS numbers");
- if (NodeA != NodeB)
- return NodeA->getDFSNumIn() > NodeB->getDFSNumIn();
- return B.comesBefore(&A);
- });
- for (const TreeEntry *TE : OrderedEntries) {
- if (!Prev) {
- Prev = TE;
+ const TreeEntry *Root = VectorizableTree.front().get();
+ BasicBlock *RootBB = cast<Instruction>(Root->Scalars[0])->getParent();
+
+ // Compute what nodes are reachable from the leaves to the roots
+ df_iterator_default_set<const BasicBlock *> ReachableFromLeaves;
+ for (auto &TE : VectorizableTree) {
+ if (TE->isGather())
continue;
- }
+ auto *BB = getLastInstructionInBundle(TE.get()).getParent();
+ for (const BasicBlock *X : depth_first_ext(BB, ReachableFromLeaves))
+ ReachableFromLeaves.insert(X);
+ }
- LiveEntries.erase(Prev);
- for (unsigned I : seq<unsigned>(Prev->getNumOperands())) {
- const TreeEntry *Op = getVectorizedOperand(Prev, I);
- if (!Op)
- continue;
- assert(!Op->isGather() && "Expected vectorized operand.");
- LiveEntries.insert(Op);
- }
+ DenseSet<const BasicBlock *> Reachable;
----------------
preames wrote:
Algorithmically, this doesn't look quite right. You're computing the set of blocks reachable from all defs to the root. What you actually want to have explored is the set of blocks reachable between each def and it's respective set of uses. Your set the union of all of the smaller sets, but we want to respect liveness though the graph.
I think you're going to need to precompute the number of live values per block (and transition points within blocks) given the def/use information. Once you have that, I'd suggest walking the CFG in the existing loop below (i.e. change the early exit at end of block case), and *cache* don't *precompute* which blocks have been scanned so far. In particular, cache the number of calls per block which is the primary thing you need at that point.
@alexey-bataev is there some invariant on the graph which simplifies this in any way? Do we statically know any properties of the CFG or vector tree which allow us to avoid full general liveness here?
https://github.com/llvm/llvm-project/pull/128620
More information about the llvm-commits
mailing list