[llvm] [SLP]Add external uses estimations into tree throttling (PR #178024)

Alexey Bataev via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 27 12:07:51 PST 2026


================
@@ -16487,8 +16490,49 @@ InstructionCost BoUpSLP::calculateTreeCostAndTrimNonProfitable(
   SmallDenseMap<const TreeEntry *, InstructionCost> NodesCosts;
   SmallPtrSet<Value *, 4> CheckedExtracts;
   SmallSetVector<TreeEntry *, 4> GatheredLoadsNodes;
+  SmallDenseMap<const TreeEntry *, InstructionCost> ExtractCosts;
   LLVM_DEBUG(dbgs() << "SLP: Calculating cost for tree of size "
                     << VectorizableTree.size() << ".\n");
+  auto IsExternallyUsed = [&](const TreeEntry &TE, Value *V) {
+    assert(TE.hasState() && !TE.isGather() &&
+           TE.State != TreeEntry::SplitVectorize && "Expected vector node.");
+    if (V->hasOneUse() || V->hasNUses(0) || V->getType()->isVoidTy())
+      return false;
+    if (TE.hasCopyableElements() && TE.isCopyableElement(V))
+      return false;
+    const size_t NumVectScalars = ScalarToTreeEntries.size() + 1;
+    if (V->hasNUsesOrMore(NumVectScalars))
+      return true;
+    auto *I = dyn_cast<Instruction>(V);
+    // Check if any user is used outside of the tree.
+    return I && any_of(I->users(), [&](const User *U) {
+             // store/insertelt v, [cast]U will likely be vectorized.
+             if (match(U, m_InsertElt(m_Value(),
+                                      m_OneUse(m_CastOrSelf(m_Specific(I))),
+                                      m_ConstantInt())))
+               return false;
+             if (match(U,
+                       m_InsertElt(m_Value(), m_Specific(I), m_ConstantInt())))
+               return false;
+             if (match(U, m_Store(m_OneUse(m_CastOrSelf(m_Specific(I))),
+                                  m_Value())))
+               return false;
+             if (match(U, m_Store(m_Specific(I), m_Value())))
+               return false;
+             ArrayRef<TreeEntry *> Entries = getTreeEntries(U);
+             if (Entries.empty() && !MustGather.contains(U))
+               return true;
+             if (any_of(Entries, [&](TreeEntry *TE) {
+                   return DeletedNodes.contains(TE);
+                 }))
+               return true;
+             return any_of(ValueToGatherNodes.lookup(U),
+                           [&](const TreeEntry *TE) {
+                             return DeletedNodes.contains(TE);
+                           });
+           });
+  };
----------------
alexey-bataev wrote:

> Could we improve this by looking at all elements at once? For instance, consider the following case:
> 
> ```
> %0 = load <8 x i8>, ptr %p0
> %1 = add <8 x i8> %0, %0
> store <8 x i8> %1, ptr %p1
> %2 = extractelement <8 x i8> %0, i32 2
> store i8 %2, ptr %p2
> ```
> 
> The current heuristic would treat that `extractelement`/`store` pair as likely vectorizable, even though the broader context shows it is the only element being extracted and stored, so it clearly won't be vectorized.
> 
> Not sure whether this is feasible for a lightweight heuristic like this one.

That's why I consider it only an "estimation", not a real cost. It is just a quick heuristic that should not noticeably affect compile time. Computing the actual extraction cost would require recalculating the external uses (since the tree is modified on the fly) and then recalculating the cost, which I want to avoid.
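To make the trade-off concrete, here is a toy C++ model, not LLVM code. Each lane of a vectorized node carries hypothetical `UserKind` tags; `isExternallyUsedPerScalar` loosely mirrors the per-scalar `IsExternallyUsed` lambda in the hunk above (single-user lanes are ignored, store/insertelement users are optimistically assumed to vectorize later), while `countExternalBundleWide` sketches the reviewer's "look at all lanes at once" idea. All names and the `MinVF = 2` threshold are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model (NOT LLVM's API): kinds of users a scalar lane has.
enum class UserKind {
  InTree,        // user is itself part of the vectorized tree
  Store,         // store of the scalar (maybe vectorizable later)
  InsertElement, // insertelement of the scalar (maybe vectorizable later)
  Other          // any other out-of-tree user: forces an extract
};

struct Lane {
  std::vector<UserKind> Users;
};

// Per-scalar estimate (simplified): a lane with a single user is not
// external, store/insertelement users are assumed to be vectorized later,
// and any other out-of-tree user counts as an external use.
bool isExternallyUsedPerScalar(const Lane &L) {
  if (L.Users.size() <= 1)
    return false;
  for (UserKind K : L.Users)
    if (K == UserKind::Other)
      return true;
  return false;
}

std::size_t countExternalPerScalar(const std::vector<Lane> &Node) {
  std::size_t Count = 0;
  for (const Lane &L : Node)
    if (isExternallyUsedPerScalar(L))
      ++Count;
  return Count;
}

// Bundle-wide variant: first count how many lanes feed a store or
// insertelement. If fewer than MinVF lanes do, those users cannot form a
// vector store/build vector, so they are counted as external uses too.
std::size_t countExternalBundleWide(const std::vector<Lane> &Node,
                                    std::size_t MinVF = 2) {
  assert(MinVF >= 1 && "Expected a positive vectorization factor.");
  auto IsStoreLike = [](UserKind K) {
    return K == UserKind::Store || K == UserKind::InsertElement;
  };
  std::size_t StoreLikeLanes = 0;
  for (const Lane &L : Node)
    for (UserKind K : L.Users)
      if (IsStoreLike(K)) {
        ++StoreLikeLanes;
        break; // count each lane at most once
      }
  const bool StoresVectorizable = StoreLikeLanes >= MinVF;
  std::size_t Count = 0;
  for (const Lane &L : Node) {
    if (L.Users.size() <= 1)
      continue;
    for (UserKind K : L.Users)
      if (K == UserKind::Other || (IsStoreLike(K) && !StoresVectorizable)) {
        ++Count;
        break;
      }
  }
  return Count;
}
```

On the reviewer's example (eight lanes used by the `add`, with only lane 2 also stored), the per-scalar count is 0 while the bundle-wide count is 1. The price is a second pass over every lane of the node, and in the real pass that count would likely have to be recomputed whenever throttling deletes nodes, which is the recalculation the reply above wants to avoid.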

https://github.com/llvm/llvm-project/pull/178024

