[PATCH] D14829: [SLP] Vectorize gather-like idioms ending at non-consecutive loads.

Adam Nemet via llvm-commits llvm-commits at lists.llvm.org
Fri Dec 11 10:58:54 PST 2015


anemet added a comment.

> @mssimpso, I don't think that this is the right approach. I think that this should be handled in SelectionDAG, and not in the SLP vectorizer. Your measurements show that there are no performance gains on Spec2000 (and the performance test suite?). Every new heuristic/scan that we add to the SLP vectorizer increases the compile times, and I don't think that adding code to scan loads and start a vec-tree at the load roots is a good heuristic. Many people are using LLVM as a JIT compiler and these people care deeply about compile times. I don't think that in the general case there are many opportunities for vectorization when you search from the address of loads, and I think that your measurements show that this is indeed the case. If you care about the specific pattern of load/store then you can optimize this in SelectionDAG.


@nadav, how would the SelectionDAG algorithm be different from what we do here (starting to combine the from load addresses bottom-up)?  There is also loop-awareness and being able to vectorize across multiple basic blocks that could be beneficial doing this in LLVM IR.

Also in order to optimize h264ref in SPEC2006int (there is gain in SPEC!) additional type shrinking is needed which is probably easier to do at this level.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:415
@@ +414,3 @@
+  /// \return The size in bits of the widest memory access in the expression
+  /// tree ending at \p V. This method is used to calculate vector factors.
+  unsigned getMemAccessWidth(Value *V);
----------------
I think that we typically use the term "vectorization factor"

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3149-3152
@@ +3148,6 @@
+
+  // If V is a store, just return the width of the stored value without
+  // traversing the expression tree. This is the common case.
+  if (auto *Store = dyn_cast<StoreInst>(V))
+    return DL.getTypeSizeInBits(Store->getValueOperand()->getType());
+
----------------
Hmm, that does not really match the comment for the function.   Just because the tree ends in a store we could have a wider load feeding it.  Something is not precise enough here.


http://reviews.llvm.org/D14829





More information about the llvm-commits mailing list