[PATCH] D57059: [SLP] Initial support for the vectorization of the non-power-of-2 vectors.

Wed Jan 15 17:20:13 PST 2020

vdmitrie added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:2761
+        if (Diff && ((NumberOfInstructions < VL.size() &&
+                      Diff->getAPInt().ule((VL.size() - 1) * Size)) ||
+                     (NumberOfInstructions == VL.size() &&
----------------
ABataev wrote:
> vdmitrie wrote:
> > This check is not quite complete.
> > If, for example, we have the following set of scalars (VL):
> > 0: load i32 from p[0]
> > 1:  load i32 from p[2]
> > 2: undef i32
> > 3: undef i32
> > (note that p[1] is not loaded)
> > 
> > The pointer difference is 8, the number of instructions is 2, and the VL size is 4:
> > thus 8 <= (4 - 1) * 4 is true, but the pointers are not actually loaded consecutively (although this is vectorizable via masked load + shuffle, support for that does not seem to be implemented yet). A similar issue exists for stores.
> > 
> Hmm, see lines 4574-4600 (masked load + shuffle) and 4643-4678 (shuffle + masked store)
Note that two is a power of two. Thus at line 4569 it takes the path that creates a plain load and ends up loading p[0] and p[1].
Even if we took the masked load + shuffle path, that would not be correct either. The mask and shuffle there are built from the undefs rather than from pointer analysis of the scalar loads. To end up loading p[0] and p[2], the VL should look like:
0: load p[0]
1: undef
2: load p[2]
3: undef
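
To make the distinction concrete, here is a minimal standalone sketch, not code from the patch. It models each lane of VL as either a byte offset from p or undef, and compares a span-only test (modeled on the `Diff <= (VL.size() - 1) * Size` condition above) with a per-lane test that requires lane I to address Base + I * Size. The names `Lane`, `spanCheck` and `laneCheck` are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of one VL entry: a load at a byte offset from the
// base pointer p, or std::nullopt for undef.
using Lane = std::optional<std::int64_t>;

// Span-only test: the distance between the first and last load must fit
// in (VL.size() - 1) * Size bytes. Lane positions are ignored.
bool spanCheck(const std::vector<Lane> &VL, std::int64_t Size) {
  std::optional<std::int64_t> First, Last;
  for (const Lane &L : VL) {
    if (!L)
      continue;
    if (!First)
      First = *L;
    Last = *L;
  }
  if (!First)
    return false; // No loads at all.
  std::int64_t Diff = *Last - *First;
  std::int64_t MaxSpan = (static_cast<std::int64_t>(VL.size()) - 1) * Size;
  return Diff >= 0 && Diff <= MaxSpan;
}

// Per-lane test: every load in lane I must address Base + I * Size, where
// Base is derived from the first load seen. Undef lanes are skipped.
bool laneCheck(const std::vector<Lane> &VL, std::int64_t Size) {
  std::optional<std::int64_t> Base;
  for (std::size_t I = 0; I < VL.size(); ++I) {
    if (!VL[I])
      continue;
    std::int64_t Expected = static_cast<std::int64_t>(I) * Size;
    if (!Base)
      Base = *VL[I] - Expected;
    else if (*VL[I] != *Base + Expected)
      return false; // Load sits in the wrong lane for its address.
  }
  return Base.has_value();
}
```

With Size = 4, the problematic set {p[0], p[2], undef, undef} is `{0, 8, nullopt, nullopt}`: it passes `spanCheck` but fails `laneCheck`. The layout {p[0], undef, p[2], undef}, i.e. `{0, nullopt, 8, nullopt}`, passes both.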

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57059/new/

https://reviews.llvm.org/D57059