[PATCH] D57779: [SLP] Add support for throttling.
Alexey Bataev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Oct 17 08:40:36 PDT 2019
ABataev added inline comments.
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:645
+ /// Save seed instructions to try partially vectorize later.
+ void recordSeeds(ArrayRef<Value *> Ops);
+
----------------
dtemirbulatov wrote:
> dtemirbulatov wrote:
> > dtemirbulatov wrote:
> > > vporpo wrote:
> > > > dtemirbulatov wrote:
> > > > > vporpo wrote:
> > > > > > Why are we collecting the seeds specifically for partial vectorization? Is this really needed? Why don't you just call tryPartialVectorization(Seeds) within vectorizeStorChain() etc. ?
> > > > > Ok, Imagine that we could do partial vectorization with vector size 4 for 30% of the tree, but for the same tree, we could have full vectorization with vector size 2. I think that it would be serious regression if we could do it with just 30%. Or for example, for the same tree, we could do reduction later for the whole tree.
> > > > Yes, but this problem is not specific to throttling. As it is now, SLP will greedily accept 4-wide vectorization if profitable, without comparing it against 2x 2-wide. I think this problem should be addressed separately.
> > > Let's change that behavior separately or this change grows even further. BTW, I measured the SLP run-time change on SPEC2k6 compilation and it is about ~10% on average.
> > and It is not clear to me how to compare the benefit of one vectorization against another, the same example we achieved 50% of tree 4 wide vectorization with still profitable cost, let's say, -1. But, for the same tree, we have 90% of vectorization with vector size 2 with for example the same cost. We could not say let's pick one with the highest score, we are interesting to vectorize while it is profitable.
> Maybe we could have a subjective score based upon a vectorized tree-hight and vector widening.
I agree with Vasileios here, we should follow the same approach as general SLP vectorization. Otherwise, the compile time increases significantly.
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1303
+ /// The index containing the use of this entry.
+ SmallVector<TreeEntry *, 1> UseTreeIndices;
+
----------------
`TinyPtrVector<TreeEntry *>`?
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1416
+ // Find nodes with more than one use.
+ bool isBranch() {
+ return llvm::count_if(UseTreeIndices, [this](TreeEntry *Next) {
----------------
const member function.
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1473-1474
Last->Scalars.insert(Last->Scalars.begin(), VL.begin(), VL.end());
Last->NeedToGather = !Vectorized;
+ Last->ProposedToGather = !Vectorized;
Last->ReuseShuffleIndices.append(ReuseShuffleIndices.begin(),
----------------
Maybe, it is better to merge these 2 flags into one and use enumeric instead of booleans? Seems to me, these flags are mutually exclusive.
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2920
+ SmallVector<TreeEntry *, 4> VecNodes;
+ for (auto &TEPtr : VectorizableTree) {
+ TreeEntry *Entry = TEPtr.get();
----------------
`auto`->real type
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2927-2933
+ auto It = llvm::find_if(VecNodes, [](TreeEntry *E) {
+ Instruction *Inst = E->getMainOp();
+ return (Inst && (isa<BinaryOperator>(Inst) || isa<FPMathOperator>(Inst) ||
+ isa<CmpInst>(Inst)));
+ });
+ if (It == VecNodes.end())
+ return false;
----------------
You can do this analysis inside the first loop, no?
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2974
+ << ".\n");
+ ExternalUses.emplace_back(ExternalUser(Scalar, U, Lane));
+ }
----------------
Remove call of `ExternalUser` constructor, just `ExternalUses.emplace_back(Scalar, U, Lane);`
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3004-3005
+ // decisions.
+ if (Cost.first < -SLPCostThreshold)
+ continue;
+ for (auto &TEPtr : VectorizableTree) {
----------------
I think it must be an assert, that `Cost.first >= -SLPCostThreshold` here.
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3006
+ continue;
+ for (auto &TEPtr : VectorizableTree) {
+ TreeEntry *TE = TEPtr.get();
----------------
`auto`->real type
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3018-3024
+ SmallString<256> Str;
+ raw_svector_ostream OS(Str);
+ OS << "SLP: Partial vectorization with Total Cost = "
+ << PartialCost.getValue() << ".\n";
+ LLVM_DEBUG(dbgs() << Str);
+ if (ViewSLPTree)
+ ViewGraph(this, "SLP" + F->getName(), false, Str);
----------------
All this code must be compiled only in the debug mode.
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3027
+
+ if (PartialCost.hasValue() && PartialCost.getValue() < -SLPCostThreshold) {
+ LLVM_DEBUG(dbgs() << "SLP: Decided to partially vectorize with cost="
----------------
Merge this `if` with the previous one.
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3696
+ for (TreeEntry *N : Path) {
+ CostsRecalculations++;
+
----------------
Missed the check for `CostsRecalculations >= MaxCostsRecalculations` here
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3710
+ if (PartialCost < -SLPCostThreshold && cutTree())
+ return PartialCost;
+ }
----------------
Maybe transform the function into return `bool` and return the cost in `Cost` parameter, passed by reference?
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3718
+ int NonGatherUse;
+ if (!is_contained(Path, C))
+ Path.push_back(C);
----------------
`is_contained` is used several times and not effective. Maybe you can use a set here instead of SmallVector?
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D57779/new/
https://reviews.llvm.org/D57779
More information about the llvm-commits
mailing list