[PATCH] D133441: [SLP] Look ahead for mutual horizontal reductions.

Thu Sep 8 05:16:55 PDT 2022

labrinea added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:11035-11038
+    for (ArrayRef<Value *> RdxOps : ReductionOps)
+      for (Value *RdxOp : RdxOps)
+        if (RdxOp)
+          V.addHorizontalReduction(RdxOp);
----------------
ABataev wrote:
> labrinea wrote:
> > ABataev wrote:
> > > Not sure it is a good idea to store reduction ops here; you're actually working with reduced values.
> > If you look at the reproducer:
> > 
> > ```
> > for i = ...
> >   sm += x[i];
> >   sq += x[i] * x[i];
> > ```
> > the addition `sm += x[i]` (which is a reduction op of the first horizontal reduction) is an external user of the scalar load feeding the multiplication `x[i] * x[i]` (which is a reduced value of the second horizontal reduction).
> Aha, this is for external users. Then why do you need to store the reduced values there?
Because, similarly, the multiplication (a reduced value) is an external user of the scalar load from the other horizontal reduction.
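
To make the reproducer concrete, here is a minimal self-contained C++ sketch of the loop discussed above. The names `SumAndSquares` and `sumAndSquares` are hypothetical and not part of the patch; the comments only restate the relationship described in this thread, assuming the loop is unrolled before SLP sees it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type: the two accumulators from the reproducer.
struct SumAndSquares {
  int Sm; // running sum of x[i]
  int Sq; // running sum of x[i] * x[i]
};

// Hypothetical reproducer: two reductions fed by the same scalar loads.
SumAndSquares sumAndSquares(const int *X, std::size_t N) {
  assert((X != nullptr || N == 0) && "non-empty input needs a valid pointer");
  SumAndSquares R{0, 0};
  for (std::size_t I = 0; I < N; ++I) {
    int Load = X[I];     // scalar load shared by both reductions
    R.Sm += Load;        // reduction op of the first reduction; uses Load directly
    R.Sq += Load * Load; // the multiply is a reduced value of the second reduction
  }
  return R;
}
```

Each reduction, looked at in isolation, sees the other one as an external user of `Load`, which is where the extraction cost comes from.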

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:11778
   SmallPtrSet<Value *, 8> VisitedInstrs;
+  Optional<HorizontalReduction> PendingReduction = None;
   bool Res = false;
----------------
ABataev wrote:
> labrinea wrote:
> > ABataev wrote:
> > > Why do you need this one? In case of a successful reduction, the vectorizer restarts the analysis and rebuilds the reduction graph.
> > The idea is to separate the "matching" of a horizontal reduction (matchAssociativeReduction) from the "processing" of it (tryToReduce). That way we can postpone the processing until we have found at least one more. This allows us to identify mutual reductions and ignore the extraction cost of the common scalar values. As Vasilis mentioned, this limits the number of mutual reductions to two, which is not ideal.
> No need to postpone it; the pass will just repeat the same steps once again after the changes.
I am not following. What should the code look like then? How can we solve the problem of mutual reductions without looking ahead? PendingReduction serves as a one-element buffer, if that makes it clearer.
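
A rough, standalone model of the one-element buffer idea, in case it helps. This is not the code in the patch: `MatchedReduction`, `sharesScalars` and `groupWithLookAhead` are hypothetical names, `std::optional` stands in for `Optional`, and integer IDs stand in for scalar values. It only shows how deferring the processing of one matched reduction lets the next match be paired with it.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a matched horizontal reduction: a name plus
// the IDs of the scalar values (e.g. loads) it reads.
struct MatchedReduction {
  std::string Name;
  std::set<int> Scalars;
};

// True if the two reductions have at least one scalar value in common.
static bool sharesScalars(const MatchedReduction &A,
                          const MatchedReduction &B) {
  for (int S : A.Scalars)
    if (B.Scalars.count(S))
      return true;
  return false;
}

// Returns the groups in which the matched reductions would be processed.
// A group of two models a pair of mutual reductions handled together.
std::vector<std::vector<std::string>>
groupWithLookAhead(const std::vector<MatchedReduction> &Matches) {
  std::vector<std::vector<std::string>> Groups;
  std::optional<MatchedReduction> Pending; // the one-element buffer
  for (const MatchedReduction &R : Matches) {
    if (!Pending) {
      Pending = R; // postpone processing until another match is seen
      continue;
    }
    if (sharesScalars(*Pending, R)) {
      // Mutual reductions: process both together.
      Groups.push_back(std::vector<std::string>{Pending->Name, R.Name});
      Pending.reset();
    } else {
      // Unrelated: process the buffered one alone and buffer the new one.
      Groups.push_back(std::vector<std::string>{Pending->Name});
      Pending = R;
    }
  }
  if (Pending) // flush whatever is still buffered at the end
    Groups.push_back(std::vector<std::string>{Pending->Name});
  assert(Groups.size() <= Matches.size());
  return Groups;
}
```

Because the buffer holds a single entry, at most two reductions are ever grouped, which is the limitation already mentioned above.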

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133441/new/

https://reviews.llvm.org/D133441