[PATCH] D132590: [SLP] Try to match reductions first in a vector build sequence.
Alexey Bataev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 25 15:37:52 PDT 2022
ABataev added inline comments.
================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:11820-11823
+ SmallVector<WeakTrackingVH> PostponedInsts;
+ for (Value *Op : BuildVectorOpds)
+ OpsChanged |=
+ vectorizeHorReduction(nullptr, Op, BB, R, TTI, PostponedInsts);
----------------
vdmitrie wrote:
> ABataev wrote:
> > vdmitrie wrote:
> > > ABataev wrote:
> > > > vdmitrie wrote:
> > > > > ABataev wrote:
> > > > > > vdmitrie wrote:
> > > > > > > ABataev wrote:
> > > > > > > > Why do you want still call this after `findBuildAggregate`?
> > > > > > > In order to match hor reduction on build vector operands. Can you offer other approach to do the same?
> > > > > > But vectorizeRootInstruction already does this, why do you want to do it twice, here and in vectorizeRootInstruction? It increases compile time.
> > > > > > Can we just call vectorizeRootInstruction(nullptr, I, BB, R, TTI); and vectorizeBuildVector after? And remove this loop and final OpsChanged |= tryToVectorize(PostponedInsts, R); call?
> > > > > > But vectorizeRootInstruction already does this,
> > > > >
> > > > > Nope. This is exactly what does not happen after your commit https://reviews.llvm.org/rG7ea03f0b4e4e
> > > > > Before the commit we traversed through operands of insertelements when collected instructions for work stack and were able to locate all reductions at once.
> > > > >
> > > > > > Can we just call vectorizeRootInstruction(nullptr, I, BB, R, TTI); and vectorizeBuildVector after?
> > > > >
> > > > > Nope. vectorizeRootInstruction may too early create 2-way vectorizations. We only want to call it after trying on with wider VFs with tryToVectorizeList.
> > > > >
> > > > >
> > > > Then you don't need to call vectorizeRootInstruction. Also, why do you need to find operands of the buildvector sequence? Just call vectorizeHorReduction for each insert instruction in the loop and in the second loop call findBuildAggregate and all other stuff for buildvector and postponed insts.
> > > > Just call vectorizeHorReduction for each insert ...
> > >
> > > I don't like this approach. I'd rather enable vectorizeHorReduction to traverse through insertelement operands. BTW you did not explain why you suppressed that.
> > >
> > To save compile time, to avoid analysis of the same instructions several times.
> > Why you don't like it?
> > To save compile time, to avoid analysis of the same instructions several times.
> > Why you don't like it?
>
> You are probably talking about https://reviews.llvm.org/D114171 when saying "compile time". But I'm about the different change. Note that the patch that was last reviewed in https://reviews.llvm.org/D114171
> does not match what was actually committed (with the reference to differential revision). I'm specifically concerned about this piece of code:
>
> ```
> // Try to vectorize operands.
> // Continue analysis for the instruction from the same basic block only to
> // save compile time.
> if (++Level < RecursionMaxDepth)
> for (auto *Op : Inst->operand_values())
> if (VisitedInstrs.insert(Op).second)
> if (auto *I = dyn_cast<Instruction>(Op))
> // Do not try to vectorize CmpInst operands, this is done
> // separately.
> if (!isa<PHINode>(I) && !isa<CmpInst>(I) && !R.isDeleted(I) &&
> I->getParent() == BB)
> Stack.emplace(I, Level);
> ```
>
> where you changed to avoid traversing through insertelement operands:
> ```
> if (!isa<PHINode, CmpInst, InsertElementInst, InsertValueInst>(I) &&
> ```
> I'm not convinced that this specific change worth extra compile time.
>
> I can agree that sticking to buildvector is not the best approach, but not traversing the operands early does not save compile time. It only leads to postponing the action for another visit but with much lower chances for vectorizing it the right way.
>
>
It was changed after several comments regarding compile time.
There are 2 problems with the original code.
1. Compile time in case of unsuccessful attempts. We repeat analysis of same instructions at least twice, if not more.
2. It does not preseve the analysis order for postponable instructions, some of them gets analyzed too early. It breaks internal logic, makes it harder for the perf abd bug analysis. And may lead to a too early vectorization.
The problem that insertelement is also the operand and it maybe a part of the another build vector sequence. In case of too early vectorization attempt we may end up with 2 x vectorization of the operand of the insertelement ibstruction instead of possible wider buildvector sequence.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D132590/new/
https://reviews.llvm.org/D132590
More information about the llvm-commits
mailing list