[PATCH] D43776: [SLP] Fix PR36481: vectorize reassociated instructions.
Ayal Zaks via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Mar 9 14:44:09 PST 2018
Ayal added inline comments.
================
Comment at: include/llvm/Analysis/LoopAccessAnalysis.h:679
+/// saves the mask for actual memory accesses in program order in
+/// \p SortedIndices as <2,0,1,3>
+bool sortPtrAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
----------------
ABataev wrote:
> Ayal wrote:
> > The usage later and documentation above look for a set of references that are consecutive under some permutation. I.e., w/o gaps. The implementation below allows arbitrary gaps, and does not check for zero gaps, i.e., replicated references.
> >
> > Would it be better to simply do a Pigeonhole sort, and perhaps call it isPermutedConsecutiveAccess():
> > 1. Scan all references to find the one with minimal address. Bail out if any reference is incomparable with the running min.
> > 2. Scan all references and set SortedIndices[i] = difference(VL[i], Minimum). Bail out if this entry is to be set more than once.
> >
> > Note: it may be good to support replicated references, say when the gaps are internal to the vector to avoid loading out of bounds. Perhaps add a TODO.
> >
> > Note: it may be good to have LAA provide some common support for both SLP and LV's InterleaveGroup construction, which share some aspects. Perhaps add a TODO.
> The documentation above does not say anything about consecutive access. It just states, that the pointers are sorted and that's it. I did it on purpose, so later we could reuse this function for masked loads|stores.
> Masked loads are not supported yet, that's why in the SLPVectorizer I added an additional check for the consecutive access.
By "The documentation above" I meant the one example of a[i+2], a[i+0], a[i+1] and a[i+3] which is a permutation of consecutive addresses.
Masked loads and stores (will) also require that all pointers be within range of a single vector, right?
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1113
+ const auto *Diff =
+ dyn_cast<SCEVConstant>(SE.getMinusSCEV(SE.getSCEV(Ptr), Scev0));
+ // The pointers may not have a constant offset from each other, or SCEV
----------------
ABataev wrote:
> Ayal wrote:
> > Use computeConstantDifference() instead of computing it explicitly? It should compare GetUnderlyingObject()'s, if worthwhile, rather than doing so here.
> Tried, it does not work in many cases.
Then that should be fixed, worth opening a PR?
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1229
+ SmallVector<OrdersType, 4> OpOrders;
+ unsigned DirectOrderNum = 0;
----------------
ABataev wrote:
> Ayal wrote:
> > Document what DirectOrderNum counts and/or have a more self-explanatory name similar to the original one, e.g., `NumOpsWantToKeepOriginalOrder`
> >
> > Can add that `NumOpsWantToKeepOriginalOrder` holds the count for the identity permutation, instead of holding this count inside `NumOpsWantsToKeepOrder` along with all other permutations.
> I don't want to add the new entry for operations, that do not require reordering. I'd better the code in another way.
Understood; suggested before to simply **document** that one could alternatively hold the count for the identity permutation inside `NumOpsWantsToKeepOrder` map, but we're instead keeping it outside using `NumOpsWantToKeepOriginalOrder` counter.
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1581
case Instruction::ExtractElement: {
- bool Reuse = canReuseExtract(VL, VL0);
+ OrdersType BestOrder;
+ bool Reuse = canReuseExtract(VL, VL0, BestOrder);
----------------
ABataev wrote:
> Ayal wrote:
> > BestOrder >> CurrentOrder?
> I think it does not matter because the current order is the best order.
The current order may not be the best order, the first time the tree is built, right?
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2029
+ }
+ Optional<unsigned> Idx = matchExtractIndex(Inst, I, Inst->getOpcode());
+ if (Idx.hasValue()) {
----------------
ABataev wrote:
> Ayal wrote:
> > Can simplify by checking if BestOrder is the identify permutation at the end, as done at the end of sortPtrAccesses(); using `getExtractIndex(Inst)` which returns Idx even if it's equal to I. Better rename BestOrder here too.
> Check the boolean flag is faster than to perform N comparisons. That's why I'd prefer to leave it as is.
Right; it could simplify the code though, and N is usually very small.
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1621
+ newTreeEntry(VL, /*Vectorized=*/true, UserTreeIdx, ReuseShuffleIndicies,
+ I->getFirst());
+ return;
----------------
Very good, thanks.
Can `CurrentOrder` be used instead of `I->getFirst()`?
Can `++NumOpsWantToKeepOrder[CurrentOrder]` work? After all the trouble...
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3027
+ const unsigned E = Indices.size();
+ Mask.resize(Indices.size());
+ for (unsigned I = 0; I < E; ++I)
----------------
May as well use `E` when resizing `Mask`.
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3358
+ for (unsigned I = 0, End = E->Scalars.size(); I < End; ++I)
+ Mask[E->ReorderIndices[I]] = I;
+ V = Builder.CreateShuffleVector(V, UndefValue::get(V->getType()),
----------------
Another inversePermutation?
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:5693
+ SmallVector<Value *, 4> ReorderedOps;
+ ReorderedOps.reserve(VL.size());
+ for (const unsigned Idx : *Order)
----------------
Fold by feeding the constructor with the size.
Repository:
rL LLVM
https://reviews.llvm.org/D43776
More information about the llvm-commits
mailing list