[PATCH] D57779: [SLP] Add support for throttling.

Alexey Bataev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Jun 2 10:31:16 PDT 2019


ABataev added inline comments.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:126
+static cl::opt<int>
+    MaxCostsRecalculations("slp-throttling-budget", cl::init(64), cl::Hidden,
+                           cl::desc("Limit the total number of nodes for cost "
----------------
Why 64? Did you try to estimate effect of this change on the compilation time?


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:653
 
+  /// Estimate the subtree cut of the whole tree for partial vectorization
+  /// not just from a cost perspective.
----------------
What is criteria of the estimation? 


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1409
 
+  /// \Returns the cost of extracting the vectorized elements.
+  int getExtractOperationCost(ExternalUser &EU);
----------------
`\returns`


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2059
             assert(!UseEntry->NeedToGather && "Bad state");
+            InternalTreeUses[U].push_back(ExternalUser(Scalar, U, FoundLane));
             continue;
----------------
Use `emplace_back` instead.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2665
 
+bool BoUpSLP::isGoodTreeCut(ArrayRef<unsigned> VecNodes) {
+  // Find the first entry with worth operations to vectorize.
----------------
Function can be made `const`


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2671
+    InstructionsState S = getSameOpcode(VectorizableTree[N]->Scalars);
+    auto *Inst = S.MainOp;
+    if (Inst && (isa<BinaryOperator>(Inst) || isa<FPMathOperator>(Inst) ||
----------------
`auto`-> to real type.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2699
+            std::remove_if(ExternalUses.begin(), ExternalUses.end(),
+                           [&](ExternalUser &EU) { return EU.Scalar == V; }),
+            ExternalUses.end());
----------------
Explicitly specify list of captures


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2711
+      continue;
+    if (is_contained(VecNodes, I)) {
+      for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {
----------------
I think, the outer loop must iterate through elements of `VecNodes`. In this case, you won't need need this check.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2716
+          LLVM_DEBUG(dbgs() << "SLP: Checking user:" << *U << ".\n");
+          Instruction *UserInst = dyn_cast<Instruction>(U);
+          if (!UserInst)
----------------
`Instruction *`->`auto *`


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2727
+                     << " from lane " << Lane << " from " << *Scalar << ".\n");
+          ExternalUses.push_back(ExternalUser(Scalar, U, Lane));
+        }
----------------
`emplace_back`


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2737
+  bool Changed = false;
+  for (unsigned I = 0; I < Seeds.size(); ++I) {
+    auto &S = *Seeds[I].get();
----------------
Can you use range loop here?


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2738
+  for (unsigned I = 0; I < Seeds.size(); ++I) {
+    auto &S = *Seeds[I].get();
+    // Check those seed instructions are still alive.
----------------
Convert auto to real type.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2740
+    // Check those seed instructions are still alive.
+    if (std::any_of(S.begin(), S.end(), [](Value *V) {
+          return (!(cast<Instruction>(V))->getOperand(0));
----------------
Use `llvm::any_of`


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2781
   uint64_t VTSize = DL.getTypeStoreSizeInBits(VectorType::get(EltTy, N));
-  if (VTSize < MinVecRegSize || VTSize > MaxVecRegSize || VTSize != DL.getTypeStoreSizeInBits(T))
+  if (VTSize < MinVecRegSize || VTSize > MaxVecRegSize ||
+      VTSize != DL.getTypeStoreSizeInBits(T))
----------------
Restore original code


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2805
 
-  // We have to extract from a vector/aggregate with the same number of elements.
+  // We have to extract from a vector/aggregate with the same number of
+  // elements.
----------------
Restore original


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3284
 
-  SmallPtrSet<Instruction*, 4> LiveValues;
+  SmallPtrSet<Instruction *, 4> LiveValues;
   Instruction *PrevInst = nullptr;
----------------
Restore original


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3326
           &*PrevInstIt != PrevInst) {
-        SmallVector<Type*, 4> V;
+        SmallVector<Type *, 4> V;
         for (auto *II : LiveValues)
----------------
Restore original


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3341
 
-int BoUpSLP::getTreeCost() {
-  int Cost = 0;
+int BoUpSLP::getExtractOperationCost(ExternalUser &EU) {
+  unsigned BundleWidth = VectorizableTree.front()->Scalars.size();
----------------
`const` function?


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3354
+  auto *VecTy = VectorType::get(EU.Scalar->getType(), BundleWidth);
+  auto *ScalarRoot = VectorizableTree[0]->Scalars[0];
+
----------------
`auto`->to real type


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3358
+    auto *MinTy = IntegerType::get(F->getContext(), MinBWs[ScalarRoot].first);
+    auto Extend =
+        MinBWs[ScalarRoot].second ? Instruction::SExt : Instruction::ZExt;
----------------
`auto`->to real type


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3367
+
+int BoUpSLP::getExtractCost() {
+  int ExtractCost = 0;
----------------
`const` function?


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3370
+  SmallPtrSet<Value *, 16> ExtractCostCalculated;
+  for (ExternalUser &EU : ExternalUses) {
+    // We only add extract cost once for the same scalar.
----------------
`const ExternalUser &`


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3390
+
+int BoUpSLP::getInsertCost() {
+  int InsertCost = 0;
----------------
`const` function


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3408
+
+void BoUpSLP::treeTraversal(SmallVectorImpl<unsigned> &Leaves,
+                            SmallVectorImpl<unsigned> &Branches) {
----------------
`const` function?


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3410
+                            SmallVectorImpl<unsigned> &Branches) {
+  std::vector<unsigned> Worklist;
+  SmallSet<unsigned, 8> Visited;
----------------
Why `std::vector`?


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3429
+      if (!Visited.count(Next)) {
+        NonGatherUse++;
+        Worklist.push_back(Next);
----------------
Use preincrement


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3449
+    Path.pop_back();
+    auto I = std::find(VecNodes.begin(), VecNodes.end(), N);
+    assert(I != VecNodes.end() && "Wrong node");
----------------
Use `llvm::find`


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3478
+  // from a particular branch.
+  for (unsigned N = 0, E = VectorizableTree.size(); N < E; ++N) {
+    if (VectorizableTree[N]->NeedToGather)
----------------
Iterate through `Leaves`


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3514
+            continue;
+          auto I = std::find(Leaves.begin(), Leaves.end(), L1);
+          if (I == Leaves.end())
----------------
Use `llvm::find`


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3529
+        Path.push_back(Node);
+      if (!Node)
+        break;
----------------
`while (!Node);`?


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3533
+
+    std::reverse(Path.begin(), Path.end());
+    CostsRecalculations += Path.size();
----------------
Are you sure you need to do this on each iteration?


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3572
 
-    int C = getEntryCost(&TE);
-    LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C
+    for (auto &UserTreeIdx : TE.UserTreeIndices)
+      UseTreeIndices[UserTreeIdx.UserTE->Idx].push_back(I);
----------------
`auto` to real type


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3596
 
   std::string Str;
+  if (CutTree && Cost >= -SLPCostThreshold) {
----------------
Why `std::string` and why declared here?


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3617
+    }
+  } else {
+    {
----------------
One group of braces can be removed


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:4481
+        // check every user.
+        if (!RemovedOperations.size())
+          for (User *U : Scalar->users()) {
----------------
Use `empty()`


================
Comment at: test/Transforms/SLPVectorizer/AArch64/horizontal.ll:58
+; CHECK-NEXT:    [[TMP12:%.*]] = insertelement <2 x i32> [[TMP11]], i32 1, i32 1
+; CHECK-NEXT:    [[TMP13]] = add nsw <2 x i32> [[TMP12]], [[TMP0]]
+; CHECK-NEXT:    [[TMP14:%.*]] = extractelement <2 x i32> [[TMP13]], i32 1
----------------
Does not seem to me we should vectorize this. 2 add ops have different attributes: one has just `nsw`, another one `nuw nsw`.
Plus, 2 inserts+vec add+2 extracts does not look more beneficial than 2 scalar adds


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57779/new/

https://reviews.llvm.org/D57779





More information about the llvm-commits mailing list