[PATCH] D36130: [SLP] Vectorize jumbled memory loads.

Wed Feb 28 09:49:41 PST 2018

ABataev added inline comments.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1125
+  Value *Obj0 = GetUnderlyingObject(Ptr0, DL);
+  PointerType *PtrTy = cast<PointerType>(Ptr0->getType());
+  uint64_t Size = DL.getTypeAllocSize(PtrTy->getElementType());
----------------
`PointerType *`->`auto *`

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1129-1131
+    // The only kind of access we care about here is load.
+    if (!isa<LoadInst>(Val))
+      return false;
----------------
I think there must be an assertion instead of this check.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1141
+
+    const SCEVConstant *Diff =
+        dyn_cast<SCEVConstant>(SE.getMinusSCEV(SE.getSCEV(Ptr), Scev0));
----------------
`const SCEVConstant *`->`const auto *`

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1146-1148
+    if (!Diff || static_cast<unsigned>(Diff->getAPInt().abs().getSExtValue()) >
+                     (VL.size() - 1) * Size)
+      return false;
----------------
This check better to move to SLPVectorizer.cpp, because the function can be used for masked load/store.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1161
+
+  for (unsigned i = 0; i < VL.size(); i++)
+    Sorted.emplace_back(OffValPairs[UseOrder[i]].second);
----------------
`for (unsigned I = 0, E = VL.size(); I < E; ++I)`

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1166
+  if (Mask) {
+    Mask->reserve(VL.size());
+    for (unsigned i = 0; i < VL.size(); i++)
----------------
Actually `Mask` is a full copy of `UseOrder`, you don't need all that complex stuff here

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1112
+// accesses are entered into the map.
+bool llvm::sortLoadAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
+                            ScalarEvolution &SE,
----------------
ABataev wrote:
> ashahid wrote:
> > ABataev wrote:
> > > This function can be used for stores also, it is better to make it universal for stores/loads.
> > I plan to do such improvement in separate patches.
> I just suggest to make universal at the very beginning, that's it
What about this comment? Do you really need Sorted argument?

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1644

+      if (ReuseShuffleIndicies.empty()) {
+        bool ShuffledLoads = true;
----------------
ashahid wrote:
> ABataev wrote:
> > Why you can do this only if `ReuseShuffleIndicies.empty()`?
> This is to avoid the overlapping the UniqueValues reuse logic of your changes.
Why you can't handle it? What's the problem?

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1649-1654
+          for (unsigned I = 0, E = Sorted.size() - 1; I < E; ++I) {
+            if (!isConsecutiveAccess(Sorted[I], Sorted[I + 1], *DL, *SE)) {
+              ShuffledLoads = false;
+              break;
+            }
+          }
----------------
ashahid wrote:
> ABataev wrote:
> > It is enough just to compare `VL` and `Sorted`. If they are the same, the loads are not shuffled
> Sure it is, but this avoids the compare. So I thought having a boolean is preferable.
Why do we need the compare?

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2899
     if (TreeEntry *E = getTreeEntry(S.OpValue)) {
+      if (!E->VectorizedValue && !E->JumbleShuffleIndices.empty())
+        return vectorizeTree(E);
----------------
ashahid wrote:
> ABataev wrote:
> > Why do you need this condition?
> In the 2nd test of jumbled-load.ll the two operands of MUL is fed from the same loaded vector. The 1st operand is SHUFFLE of LOAD and the 2nd operand is the gather of the same scalar loads.
> Query to getTreeEntry() will always return the node with the same vectorized value and hence both the operand of MUL will be fed the shuffled load.
> This check is to avoid this scenario.
This scenario should happen in your patch, the instruction either vectorized, or gathered, but not both.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3528-3533
+    Value *Vec = [&]() {
+      if (auto *SVI = dyn_cast<ShuffleVectorInst>(E->VectorizedValue))
+        if (!E->JumbleShuffleIndices.empty() && isa<LoadInst>(SVI->getOperand(0)))
+          return SVI->getOperand(0);
+      return E->VectorizedValue;
+    }();
----------------
ashahid wrote:
> ABataev wrote:
> > I rather doubt you need all that stuff. You can use original code
> This is required otherwise *multiuse*.ll test as well as PR32086.ll will fail because the lanes were recorded according to the order of scalar loads.
Again, it just may not happen in this patch

Repository:
  rL LLVM

https://reviews.llvm.org/D36130