[PATCH] D36130: [SLP] Vectorize jumbled memory loads.

Alexey Bataev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 12 08:06:46 PST 2018


ABataev added inline comments.


================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1112
+// accesses are entered into the map.
+bool llvm::sortLoadAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
+                            ScalarEvolution &SE,
----------------
This function can be used for stores also, it is better to make it universal for stores/loads.


================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1156
+  // Sort the memory accesses and keep the order of their uses in UseOrder.
+  std::sort(UseOrder.begin(), UseOrder.end(),
+            [&OffValPairs](unsigned Left, unsigned Right) {
----------------
It is better to use `stable_sort` rather than `sort`


================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1169
+      Mask->emplace_back(i);
+    std::sort(Mask->begin(), Mask->end(),
+              [&UseOrder](unsigned Left, unsigned Right) {
----------------
`stable_sort`


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:736-742
+    /// Records optional shuffle mask for the uses of jumbled memory accesses.
+    /// For example, a non-empty ShuffleMask[1] represents the permutation of
+    /// lanes that operand #1 of this vectorized instruction should undergo
+    /// before feeding this vectorized instruction, whereas an empty
+    /// ShuffleMask[0] indicates that the lanes of operand #0 of this vectorized
+    /// instruction need not be permuted at all.
+    SmallVector<SmallVector<unsigned, 4>, 2> ShuffleMask;
----------------
Why you can't have just one shuffle here for all external uses?


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1661
+
+      if (VL.size() > 2) {
+        bool ShuffledLoads = true;
----------------
Is it possible at all that `VL` has less than 4 elements here?


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1666
+        if (sortLoadAccesses(VL, *DL, *SE, Sorted, &Mask)) {
+          for (unsigned i = 0, e = Sorted.size() - 1; i < e; ++i) {
+            if (!isConsecutiveAccess(Sorted[i], Sorted[i + 1], *DL, *SE)) {
----------------
`i`->`I`, `e`->`E`. Variables must have Camel-like names.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1677-1678
+                            "permutation of loaded lanes.\n");
+            newTreeEntry(Sorted, true, UserTreeIdx,
+                         makeArrayRef(Mask.begin(), Mask.end()), OpdNum);
+            return;
----------------
Bad decision. It is better to use original `VL` here, rather than `Sorted` and add an additional array of sorted indieces. In this case you don't need all these additional numbers and all that complex logic to find the correct tree entry for the list of values.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2229-2235
+        for (const SmallVector<unsigned, 4> &Mask :
+             UserTreeEntry->ShuffleMask) {
+          if (!Mask.empty()) {
+            VecLdCost += TTI->getShuffleCost(
+                TargetTransformInfo::SK_PermuteSingleSrc, VecTy, 0);
+          }
+        }
----------------
You don't need so many shuffles, it is enough just to have just one. 


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3324
 
-    Value *Vec = E->VectorizedValue;
+    Value *Vec = [&]() {
+      if (auto *SVI = dyn_cast<ShuffleVectorInst>(E->VectorizedValue))
----------------
I think you can have default capture by value here rather than by reference.


Repository:
  rL LLVM

https://reviews.llvm.org/D36130





More information about the llvm-commits mailing list