[PATCH] D43776: [SLP] Fix PR36481: vectorize reassociated instructions.

Fri Mar 30 12:34:29 PDT 2018

ABataev marked 9 inline comments as done.
ABataev added a comment.

In https://reviews.llvm.org/D43776#1047322, @Ayal wrote:

> Have test(s) for extractvalue's, for completeness.
>  Make sure tests cover best-order selection: cases where original order is just as frequent as other orders (tie-break), less frequent, more frequent.

We already have couple tests for extractvalue:  ARM/sroa.ll and X86/insertvalue.ll. They already have tests with the different order of the extractvalue instructions.

================
Comment at: include/llvm/Analysis/LoopAccessAnalysis.h:679
+/// saves the mask for actual memory accesses in program order in
+/// \p SortedIndices as <2,0,1,3>
+bool sortPtrAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
----------------
Ayal wrote:
> ABataev wrote:
> > Ayal wrote:
> > > ABataev wrote:
> > > > Ayal wrote:
> > > > > The usage later and documentation above look for a set of references that are consecutive under some permutation. I.e., w/o gaps. The implementation below allows arbitrary gaps, and does not check for zero gaps, i.e., replicated references.
> > > > > 
> > > > > Would it be better to simply do a Pigeonhole sort, and perhaps call it isPermutedConsecutiveAccess():
> > > > > 1. Scan all references to find the one with minimal address. Bail out if any reference is incomparable with the running min.
> > > > > 2. Scan all references and set SortedIndices[i] = difference(VL[i], Minimum). Bail out if this entry is to be set more than once.
> > > > > 
> > > > > Note: it may be good to support replicated references, say when the gaps are internal to the vector to avoid loading out of bounds. Perhaps add a TODO.
> > > > > 
> > > > > Note: it may be good to have LAA provide some common support for both SLP and LV's InterleaveGroup construction, which share some aspects. Perhaps add a TODO.
> > > > The documentation above does not say anything about consecutive access. It just states, that the pointers are sorted and that's it. I did it on purpose, so later we could reuse this function for masked loads|stores.
> > > > Masked loads are not supported yet, that's why in the SLPVectorizer I added an additional check for the consecutive access.
> > > By "The documentation above" I meant the one example of a[i+2], a[i+0], a[i+1] and a[i+3] which is a permutation of consecutive addresses.
> > > 
> > > Masked loads and stores (will) also require that all pointers be within range of a single vector, right?
> > 1. I see.
> > 2. Right
> Check for zero gaps, i.e., replicated references?
> These are currently not supported, when checking `canReuseExtract()`.
Agree, will add checks

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1113
+    const auto *Diff =
+        dyn_cast<SCEVConstant>(SE.getMinusSCEV(SE.getSCEV(Ptr), Scev0));
+    // The pointers may not have a constant offset from each other, or SCEV
----------------
Ayal wrote:
> ABataev wrote:
> > Ayal wrote:
> > > ABataev wrote:
> > > > Ayal wrote:
> > > > > Use computeConstantDifference() instead of computing it explicitly? It should compare GetUnderlyingObject()'s, if worthwhile, rather than doing so here.
> > > > Tried, it does not work in many cases.
> > > Then that should be fixed, worth opening a PR?
> > I'm not sure that this is a bug, it is a feature :) . computeConstantDifference() is used to get the difference between the pointer returned by the `GetUnderlyingObject()` and  kind of a `GEP %ptr, 0, n`, where `n` is constant. But this function does not work if the first load element is from `GEP %ptr, 0, m`, not from `%ptr`, and others are from `GEP %ptr, 0, m+1`, `GEP %ptr, 0, m+2` etc.
> ok, right; `computeConstantDifference()` is lightweight. But it would be good to refactor a more time consuming variant which checks if `getMinusSCEV` returns a constant, and first compares UnderlyingObjects; and also AddressSpaces, following LoopVectorize.
Added checks for addressspace, comparison for underlying objects already were there.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:458
   assert(Opcode == Instruction::ExtractElement ||
          Opcode == Instruction::ExtractValue);
   if (Opcode == Instruction::ExtractElement) {
----------------
Ayal wrote:
> Add a message to the assert.
Ok

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2044
+  // Also, later we can check that all the indices are used and we have a
+  // consecutive access in the extract instructions (if at least one element of
+  // the BestOrder is still E + 1, we don't have consecutive extract
----------------
Ayal wrote:
> > (if at least one element of ... ).
> 
> ", by checking that no element of `CurrentOrder` still has value `E + 1`."
> Note that there is no such check later.
This check is not required, it is checked automatically. We check that number of the extract elements is the same as the vector length at first. If later we try to write the index to element that does not equals `E+1` it means that at least one element will still have `E+1` as the value and we're have at least 2 elements with the same index.

Repository:
  rL LLVM

https://reviews.llvm.org/D43776