[PATCH] D43776: [SLP] Fix PR36481: vectorize reassociated instructions.

Alexey Bataev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 12 09:35:26 PDT 2018


ABataev added inline comments.


================
Comment at: include/llvm/Analysis/LoopAccessAnalysis.h:679
+/// saves the mask for actual memory accesses in program order in
+/// \p SortedIndices as <2,0,1,3>
+bool sortPtrAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
----------------
Ayal wrote:
> ABataev wrote:
> > Ayal wrote:
> > > The usage later and documentation above look for a set of references that are consecutive under some permutation. I.e., w/o gaps. The implementation below allows arbitrary gaps, and does not check for zero gaps, i.e., replicated references.
> > > 
> > > Would it be better to simply do a Pigeonhole sort, and perhaps call it isPermutedConsecutiveAccess():
> > > 1. Scan all references to find the one with minimal address. Bail out if any reference is incomparable with the running min.
> > > 2. Scan all references and set SortedIndices[i] = difference(VL[i], Minimum). Bail out if this entry is to be set more than once.
> > > 
> > > Note: it may be good to support replicated references, say when the gaps are internal to the vector to avoid loading out of bounds. Perhaps add a TODO.
> > > 
> > > Note: it may be good to have LAA provide some common support for both SLP and LV's InterleaveGroup construction, which share some aspects. Perhaps add a TODO.
> > The documentation above does not say anything about consecutive access. It just states, that the pointers are sorted and that's it. I did it on purpose, so later we could reuse this function for masked loads|stores.
> > Masked loads are not supported yet, that's why in the SLPVectorizer I added an additional check for the consecutive access.
> By "The documentation above" I meant the one example of a[i+2], a[i+0], a[i+1] and a[i+3] which is a permutation of consecutive addresses.
> 
> Masked loads and stores (will) also require that all pointers be within range of a single vector, right?
1. I see.
2. Right


================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1113
+    const auto *Diff =
+        dyn_cast<SCEVConstant>(SE.getMinusSCEV(SE.getSCEV(Ptr), Scev0));
+    // The pointers may not have a constant offset from each other, or SCEV
----------------
Ayal wrote:
> ABataev wrote:
> > Ayal wrote:
> > > Use computeConstantDifference() instead of computing it explicitly? It should compare GetUnderlyingObject()'s, if worthwhile, rather than doing so here.
> > Tried, it does not work in many cases.
> Then that should be fixed, worth opening a PR?
I'm not sure that this is a bug, it is a feature :) . computeConstantDifference() is used to get the difference between the pointer returned by the `GetUnderlyingObject()` and  kind of a `GEP %ptr, 0, n`, where `n` is constant. But this function does not work if the first load element is from `GEP %ptr, 0, m`, not from `%ptr`, and others are from `GEP %ptr, 0, m+1`, `GEP %ptr, 0, m+2` etc.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1229
+  SmallVector<OrdersType, 4> OpOrders;
+  unsigned DirectOrderNum = 0;
 
----------------
Ayal wrote:
> ABataev wrote:
> > Ayal wrote:
> > > Document what DirectOrderNum counts and/or have a more self-explanatory name similar to the original one, e.g., `NumOpsWantToKeepOriginalOrder`
> > > 
> > > Can add that `NumOpsWantToKeepOriginalOrder` holds the count for the identity permutation, instead of holding this count inside `NumOpsWantsToKeepOrder` along with all other permutations.
> > I don't want to add the new entry for operations, that do not require reordering. I'd better the code in another way.
> Understood; suggested before to simply **document** that one could alternatively hold the count for the identity permutation inside `NumOpsWantsToKeepOrder` map, but we're instead keeping it outside using `NumOpsWantToKeepOriginalOrder` counter.
Added a comment to the declaration of `NumOpsWantToKeepOrder`


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1581
     case Instruction::ExtractElement: {
-      bool Reuse = canReuseExtract(VL, VL0);
+      OrdersType BestOrder;
+      bool Reuse = canReuseExtract(VL, VL0, BestOrder);
----------------
Ayal wrote:
> ABataev wrote:
> > Ayal wrote:
> > > BestOrder >> CurrentOrder?
> > I think it does not matter because the current order is the best order.
> The current order may not be the best order, the first time the tree is built, right?
Yes, but this is the best order for this particular bundle. Anyway, I renamed it.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1621
+        newTreeEntry(VL, /*Vectorized=*/true, UserTreeIdx, ReuseShuffleIndicies,
+                     I->getFirst());
+        return;
----------------
Ayal wrote:
> Very good, thanks.
> Can `CurrentOrder` be used instead of `I->getFirst()`? 
> Can `++NumOpsWantToKeepOrder[CurrentOrder]` work? After all the trouble...
No, we cannot use `CurrentOrder`. I tried to reduce the memory usage and instead of copying the `CurrentOrder` in the `newTreEntry()` function I keep the ArrayRef for this order. But `CurrentOrder` is the local variable and it will be destroyed when we exit out of its declaration scope. And, thus, we will keep the reference to incorrect memory. Instead, I need to use the reference that will exist until the end of the vectorization process (as the key of the map)


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3027
+  const unsigned E = Indices.size();
+  Mask.resize(Indices.size());
+  for (unsigned I = 0; I < E; ++I)
----------------
Ayal wrote:
> May as well use `E` when resizing `Mask`.
Oh, yes, missed it, thanks.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3358
+        for (unsigned I = 0, End = E->Scalars.size(); I < End; ++I)
+          Mask[E->ReorderIndices[I]] = I;
+        V = Builder.CreateShuffleVector(V, UndefValue::get(V->getType()),
----------------
Ayal wrote:
> Another inversePermutation?
Yup, thanks.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:5693
+        SmallVector<Value *, 4> ReorderedOps;
+        ReorderedOps.reserve(VL.size());
+        for (const unsigned Idx : *Order)
----------------
Ayal wrote:
> Fold by feeding the constructor with the size.
Reworked it, thanks.


Repository:
  rL LLVM

https://reviews.llvm.org/D43776





More information about the llvm-commits mailing list