[llvm] [SLP] Check for extracts, being replaced by original scalars, for user nodes (PR #149572)

Tue Aug 5 09:50:32 PDT 2025

alexey-bataev wrote:

> Thanks for taking the time to answer my questions.
> 
> > > I still think the cost checks are too "local" to really answer the question "Is this worth vectorising if some lanes need to be kept as scalars?". It's not clear to me if looking at a single user `TreeEntry` is a good enough heuristic.
> > 
> > 
> > TreeEntries always have a single user (by design)
> 
> Sorry it wasn't entirely clear, I understand that TreeEntries have a single user because of the way the SLPVectorizer works. What I meant is: if we want to estimate the scalarization cost of operands because some lanes down the tree need to be extracted, then we need to look at more than just one user node. We also need to look at the user of that user node, and so on, until we find the node that has decided to scalarize some lanes.

No, we don't need that. This patch looks at the node and its user, and checks in the user node whether the scalars should be extracted. Earlier users were already checked on previous iterations.
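
As a rough illustration of that argument, here is a toy sketch. It is not the actual SLPVectorizer code: `Node`, `UserIdx`, `KeepsScalars`, `NeedsExtracts` and `checkUsers` are hypothetical names, and the rule for propagating the flag is an assumption made only for the example. It shows how a check that looks one level up can still see decisions made further up the tree, provided nodes are visited in build order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a tree node (not the real TreeEntry).
struct Node {
  int UserIdx;        // index of the single user node, -1 for the root
  bool KeepsScalars;  // this node keeps some lanes as original scalars
  bool NeedsExtracts; // result of the check: operands must be extracted
  Node(int UserIdx, bool KeepsScalars)
      : UserIdx(UserIdx), KeepsScalars(KeepsScalars), NeedsExtracts(false) {}
};

// Visit nodes in build order. Each node inspects only its direct user;
// the user was visited on an earlier iteration, so its flag is final.
inline void checkUsers(std::vector<Node> &Nodes) {
  for (std::size_t I = 0; I < Nodes.size(); ++I) {
    Node &N = Nodes[I];
    if (N.UserIdx < 0)
      continue; // the root has no user to check
    assert(static_cast<std::size_t>(N.UserIdx) < I &&
           "user must be built before its operand");
    const Node &U = Nodes[N.UserIdx];
    // Assumed rule for this sketch: the flag carries down from the user.
    N.NeedsExtracts = U.KeepsScalars || U.NeedsExtracts;
  }
}
```

In this toy model no walk up the whole user chain is needed: the information from the user's own user has already been folded into the user's flag by the time the operand is visited.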

> Essentially, we would need to find the whole dependence tree that needs to be scalarized. Now, I understand this is expensive, that's why I suggested to do that after the whole tree is built. At that point we would be able to properly estimate the scalarization cost, and we can prune the tree if needed.
> 

In general, yes, but it is not necessary. It would blow up compile time and would not give significant benefits. The SLP vectorizer uses a greedy approach, so all of its analyses try to follow the same approach to avoid compile-time regressions.

> > > If a TreeEntry turns out to be too costly because it is both partially scalarised and vectorised, then we can replace it with a Gather node, and essentially prune the tree.
> > 

This is another approach, one I already have a plan for. It is called SLP throttling. There is a ticket to support it, but it will take a lot of time to implement.
The approach implemented in this patch is much simpler and faster: it just postpones potentially expensive nodes and then vectorizes them in the `transformNodes()` pass.

> > 
> > It is too late already for making the decision.
> 
> Why is that? We do plenty of optimisations once the tree is built.

Yes, but as I said, it is a completely different and more expensive approach.

https://github.com/llvm/llvm-project/pull/149572