[PATCH] D57779: [SLP] Add support for throttling.
Alexey Bataev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Jun 2 10:31:16 PDT 2019
ABataev added inline comments.
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:126
+static cl::opt<int>
+ MaxCostsRecalculations("slp-throttling-budget", cl::init(64), cl::Hidden,
+ cl::desc("Limit the total number of nodes for cost "
----------------
Why 64? Did you try to estimate effect of this change on the compilation time?
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:653
+ /// Estimate the subtree cut of the whole tree for partial vectorization
+ /// not just from a cost perspective.
----------------
What is criteria of the estimation?
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1409
+ /// \Returns the cost of extracting the vectorized elements.
+ int getExtractOperationCost(ExternalUser &EU);
----------------
`\returns`
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2059
assert(!UseEntry->NeedToGather && "Bad state");
+ InternalTreeUses[U].push_back(ExternalUser(Scalar, U, FoundLane));
continue;
----------------
Use `emplace_back` instead.
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2665
+bool BoUpSLP::isGoodTreeCut(ArrayRef<unsigned> VecNodes) {
+ // Find the first entry with worth operations to vectorize.
----------------
Function can be made `const`
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2671
+ InstructionsState S = getSameOpcode(VectorizableTree[N]->Scalars);
+ auto *Inst = S.MainOp;
+ if (Inst && (isa<BinaryOperator>(Inst) || isa<FPMathOperator>(Inst) ||
----------------
`auto`-> to real type.
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2699
+ std::remove_if(ExternalUses.begin(), ExternalUses.end(),
+ [&](ExternalUser &EU) { return EU.Scalar == V; }),
+ ExternalUses.end());
----------------
Explicitly specify list of captures
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2711
+ continue;
+ if (is_contained(VecNodes, I)) {
+ for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {
----------------
I think, the outer loop must iterate through elements of `VecNodes`. In this case, you won't need need this check.
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2716
+ LLVM_DEBUG(dbgs() << "SLP: Checking user:" << *U << ".\n");
+ Instruction *UserInst = dyn_cast<Instruction>(U);
+ if (!UserInst)
----------------
`Instruction *`->`auto *`
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2727
+ << " from lane " << Lane << " from " << *Scalar << ".\n");
+ ExternalUses.push_back(ExternalUser(Scalar, U, Lane));
+ }
----------------
`emplace_back`
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2737
+ bool Changed = false;
+ for (unsigned I = 0; I < Seeds.size(); ++I) {
+ auto &S = *Seeds[I].get();
----------------
Can you use range loop here?
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2738
+ for (unsigned I = 0; I < Seeds.size(); ++I) {
+ auto &S = *Seeds[I].get();
+ // Check those seed instructions are still alive.
----------------
Convert auto to real type.
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2740
+ // Check those seed instructions are still alive.
+ if (std::any_of(S.begin(), S.end(), [](Value *V) {
+ return (!(cast<Instruction>(V))->getOperand(0));
----------------
Use `llvm::any_of`
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2781
uint64_t VTSize = DL.getTypeStoreSizeInBits(VectorType::get(EltTy, N));
- if (VTSize < MinVecRegSize || VTSize > MaxVecRegSize || VTSize != DL.getTypeStoreSizeInBits(T))
+ if (VTSize < MinVecRegSize || VTSize > MaxVecRegSize ||
+ VTSize != DL.getTypeStoreSizeInBits(T))
----------------
Restore original code
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2805
- // We have to extract from a vector/aggregate with the same number of elements.
+ // We have to extract from a vector/aggregate with the same number of
+ // elements.
----------------
Restore original
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3284
- SmallPtrSet<Instruction*, 4> LiveValues;
+ SmallPtrSet<Instruction *, 4> LiveValues;
Instruction *PrevInst = nullptr;
----------------
Restore original
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3326
&*PrevInstIt != PrevInst) {
- SmallVector<Type*, 4> V;
+ SmallVector<Type *, 4> V;
for (auto *II : LiveValues)
----------------
Restore original
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3341
-int BoUpSLP::getTreeCost() {
- int Cost = 0;
+int BoUpSLP::getExtractOperationCost(ExternalUser &EU) {
+ unsigned BundleWidth = VectorizableTree.front()->Scalars.size();
----------------
`const` function?
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3354
+ auto *VecTy = VectorType::get(EU.Scalar->getType(), BundleWidth);
+ auto *ScalarRoot = VectorizableTree[0]->Scalars[0];
+
----------------
`auto`->to real type
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3358
+ auto *MinTy = IntegerType::get(F->getContext(), MinBWs[ScalarRoot].first);
+ auto Extend =
+ MinBWs[ScalarRoot].second ? Instruction::SExt : Instruction::ZExt;
----------------
`auto`->to real type
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3367
+
+int BoUpSLP::getExtractCost() {
+ int ExtractCost = 0;
----------------
`const` function?
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3370
+ SmallPtrSet<Value *, 16> ExtractCostCalculated;
+ for (ExternalUser &EU : ExternalUses) {
+ // We only add extract cost once for the same scalar.
----------------
`const ExternalUser &`
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3390
+
+int BoUpSLP::getInsertCost() {
+ int InsertCost = 0;
----------------
`const` function
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3408
+
+void BoUpSLP::treeTraversal(SmallVectorImpl<unsigned> &Leaves,
+ SmallVectorImpl<unsigned> &Branches) {
----------------
`const` function?
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3410
+ SmallVectorImpl<unsigned> &Branches) {
+ std::vector<unsigned> Worklist;
+ SmallSet<unsigned, 8> Visited;
----------------
Why `std::vector`?
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3429
+ if (!Visited.count(Next)) {
+ NonGatherUse++;
+ Worklist.push_back(Next);
----------------
Use preincrement
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3449
+ Path.pop_back();
+ auto I = std::find(VecNodes.begin(), VecNodes.end(), N);
+ assert(I != VecNodes.end() && "Wrong node");
----------------
Use `llvm::find`
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3478
+ // from a particular branch.
+ for (unsigned N = 0, E = VectorizableTree.size(); N < E; ++N) {
+ if (VectorizableTree[N]->NeedToGather)
----------------
Iterate through `Leaves`
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3514
+ continue;
+ auto I = std::find(Leaves.begin(), Leaves.end(), L1);
+ if (I == Leaves.end())
----------------
Use `llvm::find`
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3529
+ Path.push_back(Node);
+ if (!Node)
+ break;
----------------
`while (!Node);`?
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3533
+
+ std::reverse(Path.begin(), Path.end());
+ CostsRecalculations += Path.size();
----------------
Are you sure you need to do this on each iteration?
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3572
- int C = getEntryCost(&TE);
- LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C
+ for (auto &UserTreeIdx : TE.UserTreeIndices)
+ UseTreeIndices[UserTreeIdx.UserTE->Idx].push_back(I);
----------------
`auto` to real type
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3596
std::string Str;
+ if (CutTree && Cost >= -SLPCostThreshold) {
----------------
Why `std::string` and why declared here?
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3617
+ }
+ } else {
+ {
----------------
One group of braces can be removed
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:4481
+ // check every user.
+ if (!RemovedOperations.size())
+ for (User *U : Scalar->users()) {
----------------
Use `empty()`
================
Comment at: test/Transforms/SLPVectorizer/AArch64/horizontal.ll:58
+; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x i32> [[TMP11]], i32 1, i32 1
+; CHECK-NEXT: [[TMP13]] = add nsw <2 x i32> [[TMP12]], [[TMP0]]
+; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i32> [[TMP13]], i32 1
----------------
Does not seem to me we should vectorize this. 2 add ops have different attributes: one has just `nsw`, another one `nuw nsw`.
Plus, 2 inserts+vec add+2 extracts does not look more beneficial than 2 scalar adds
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D57779/new/
https://reviews.llvm.org/D57779
More information about the llvm-commits
mailing list