[PATCH] D36130: [SLP] Vectorize jumbled memory loads.

Tue Mar 6 21:09:42 PST 2018

ashahid added inline comments.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1660
+                            "permutation of loaded lanes.\n");
+            newTreeEntry(Sorted, /*Vectorized*/ true, UserTreeIdx,
+                         ReuseShuffleIndicies, Mask);
----------------
ABataev wrote:
> ashahid wrote:
> > ABataev wrote:
> > > ashahid wrote:
> > > > ABataev wrote:
> > > > > No, use original `VL` here, do not use `Sorted`. In this case you won't need an additional argument in `sortLoadAccesses` and you don't need all that complex stuff with the lambda on line 3528
> > > > If I am not wrong, for LOADs, VL0 must be the 1st element of the buffer whose base address will be used for vector load. 
> > > > So using VL will break this assumption.
> > > Why? And why you can't choose the right VL0 during vectorization?
> > For example, if we have two arrays A[4] and B[1] laying one after another in memory and the selected VF is 4 for the scalar loads of A[1], A[2], A[0], A[3] in order of use, the generated vector load will load the elements A[1], A[2], A[3], B[1] which is not desired.
> > 
> > Of-course we can choose the right VL0 during vectorization but we have to compute it again here using the mask which can be avoided if we use Sorted VL.
> > 
> > If I am missing something? 
> You already store the mask in the tree entry and you can choose the right VL0 using this mask. Using Sorted VL complicates the whole vectorization process and, thus, adds some extra points for the incorrect vectorization. That's why I insist to use original VL and choose the correct VL0 during codegen.
Got it. Since you already have these improvements in this patch https://reviews.llvm.org/D43776 , I think it is better to get that through.

Repository:
  rL LLVM

https://reviews.llvm.org/D36130