[llvm] df3b95b - [SLP][NFC] PR45269 getVectorElementSize() is slow

Dinar Temirbulatov via llvm-commits llvm-commits at lists.llvm.org
Thu May 21 08:27:26 PDT 2020


Author: Dinar Temirbulatov
Date: 2020-05-21T17:26:50+02:00
New Revision: df3b95bc0ae0f9de0a54db897ed37ffb6ba44226

URL: https://github.com/llvm/llvm-project/commit/df3b95bc0ae0f9de0a54db897ed37ffb6ba44226
DIFF: https://github.com/llvm/llvm-project/commit/df3b95bc0ae0f9de0a54db897ed37ffb6ba44226.diff

LOG: [SLP][NFC] PR45269 getVectorElementSize() is slow

The algorithm inside getVectorElementSize() is almost O(x^2) complexity and
when, for example, we compile MultiSource/Applications/ClamAV/shared_sha256.c
with 1k instructions inside sha256_transform() function that resulted in almost
~800k iterations. The following change improves the algorithm with the map to
a liner complexity.

Differential Revision: https://reviews.llvm.org/D80241

Added: 
    

Modified: 
    llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index d800e0b11d55..26d9705fade5 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -629,7 +629,7 @@ class BoUpSLP {
   /// the stored value. Otherwise, the size is the width of the largest loaded
   /// value reaching V. This method is used by the vectorizer to calculate
   /// vectorization factors.
-  unsigned getVectorElementSize(Value *V) const;
+  unsigned getVectorElementSize(Value *V);
 
   /// Compute the minimum type sizes required to represent the entries in a
   /// vectorizable tree.
@@ -1715,6 +1715,9 @@ class BoUpSLP {
   /// Maps a specific scalar to its tree entry.
   SmallDenseMap<Value*, TreeEntry *> ScalarToTreeEntry;
 
+  /// Maps a value to the proposed vectorizable size.
+  SmallDenseMap<Value *, unsigned> InstrElementSize;
+
   /// A list of scalars that we found that we need to keep as scalars.
   ValueSet MustGather;
 
@@ -4785,6 +4788,7 @@ BoUpSLP::vectorizeTree(ExtraValueToDebugLocsMap &ExternallyUsedValues) {
   }
 
   Builder.ClearInsertionPoint();
+  InstrElementSize.clear();
 
   return VectorizableTree[0]->VectorizedValue;
 }
@@ -5321,12 +5325,16 @@ void BoUpSLP::scheduleBlock(BlockScheduling *BS) {
   BS->ScheduleStart = nullptr;
 }
 
-unsigned BoUpSLP::getVectorElementSize(Value *V) const {
+unsigned BoUpSLP::getVectorElementSize(Value *V) {
   // If V is a store, just return the width of the stored value without
   // traversing the expression tree. This is the common case.
   if (auto *Store = dyn_cast<StoreInst>(V))
     return DL->getTypeSizeInBits(Store->getValueOperand()->getType());
 
+  auto E = InstrElementSize.find(V);
+  if (E != InstrElementSize.end())
+    return E->second;
+
   // If V is not a store, we can traverse the expression tree to find loads
   // that feed it. The type of the loaded value may indicate a more suitable
   // width than V's type. We want to base the vector element size on the width
@@ -5372,13 +5380,17 @@ unsigned BoUpSLP::getVectorElementSize(Value *V) const {
       FoundUnknownInst = true;
   }
 
+  int Width = MaxWidth;
   // If we didn't encounter a memory access in the expression tree, or if we
-  // gave up for some reason, just return the width of V.
+  // gave up for some reason, just return the width of V. Otherwise, return the
+  // maximum width we found.
   if (!MaxWidth || FoundUnknownInst)
-    return DL->getTypeSizeInBits(V->getType());
+    Width = DL->getTypeSizeInBits(V->getType());
+
+  for (Instruction *I : Visited)
+    InstrElementSize[I] = Width;
 
-  // Otherwise, return the maximum width we found.
-  return MaxWidth;
+  return Width;
 }
 
 // Determine if a value V in a vectorizable expression Expr can be demoted to a


        


More information about the llvm-commits mailing list