[PATCH] D36130: [SLP] Vectorize jumbled memory loads.

Wed Feb 14 06:52:56 PST 2018

ABataev added inline comments.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1677-1678
+                            "permutation of loaded lanes.\n");
+            newTreeEntry(Sorted, true, UserTreeIdx,
+                         makeArrayRef(Mask.begin(), Mask.end()), OpdNum);
+            return;
----------------
ashahid wrote:
> ABataev wrote:
> > ashahid wrote:
> > > ABataev wrote:
> > > > Bad decision. It is better to use the original `VL` here, rather than `Sorted`, and to add an additional array of sorted indices. In this case you don't need all these additional numbers and all that complex logic to find the correct tree entry for the list of values.
> > > In fact, the earlier design in the patch (https://reviews.llvm.org/D26905) was to use the original VL; however, there was a counterargument to that, which I don't remember exactly.
> > It is better to use the original `VL` here; otherwise it will end in a lot of trouble and will require a whole bunch of changes in the vectorization process to find the perfect match for the vector of vectorized values. I don't think it is a good idea to make a lot of changes across the whole module to handle jumbled loads.
> In a context where the loaded vector can have multiple users with different shuffle masks, the design is to represent each user's shuffle mask according to that user's operand number. A single array of sorted indices is not sufficient for this.
> Given the objective of handling multiple out-of-order uses, I feel the changes are not that big.
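To make the "original VL plus sorted indices" suggestion concrete, here is a minimal C++ sketch. It assumes each scalar load is identified by its constant element offset from a common base pointer; the helpers `sortedIndices`, `shuffleMaskFor` and `applyShuffle` are hypothetical and are not SLPVectorizer APIs. The bundle keeps its original VL order and only records the address order next to it; the shuffle mask that restores VL order from one wide load is the inverse of that order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: given the element offsets of a bundle of scalar
// loads in their original (VL) order, return the VL indices visited in
// ascending address order ("sorted indices").
std::vector<unsigned> sortedIndices(const std::vector<int64_t> &Offsets) {
  std::vector<unsigned> Order(Offsets.size());
  std::iota(Order.begin(), Order.end(), 0u);
  std::stable_sort(Order.begin(), Order.end(), [&](unsigned A, unsigned B) {
    return Offsets[A] < Offsets[B];
  });
  return Order;
}

// Hypothetical helper: Mask[I] is the lane of the wide (address-ordered)
// load that holds VL[I], i.e. the inverse permutation of Order.
std::vector<int> shuffleMaskFor(const std::vector<unsigned> &Order) {
  std::vector<int> Mask(Order.size());
  for (unsigned Lane = 0; Lane < Order.size(); ++Lane)
    Mask[Order[Lane]] = static_cast<int>(Lane);
  return Mask;
}

// Models a single-source shuffle: Out[I] = Wide[Mask[I]].
template <typename T>
std::vector<T> applyShuffle(const std::vector<T> &Wide,
                            const std::vector<int> &Mask) {
  std::vector<T> Out;
  Out.reserve(Mask.size());
  for (int M : Mask) {
    assert(M >= 0 && static_cast<std::size_t>(M) < Wide.size());
    Out.push_back(Wide[static_cast<std::size_t>(M)]);
  }
  return Out;
}
```

For loads at offsets {5, 3, 6, 4}, the sorted indices are {1, 3, 0, 2} and the mask is {2, 0, 3, 1}; shuffling the wide load {3, 4, 5, 6} with that mask gives back {5, 3, 6, 4}. Because the entry is still keyed by the unsorted scalars, a search for the tree entry of a given list of values can match it directly.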
Now I see what you want to do, but I don't think this is the correct way to implement it: it complicates the whole vectorization process. I'd suggest creating a separate tree entry for each particular order of the loads, and excluding loads from the check that the same instruction is used several times in different tree entries.
If you are worried about several different loads of the same values, I think they will be optimized by the instruction combiner.
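The alternative proposed in the reply can be modelled with a deliberately simplified C++ sketch. Everything here is hypothetical (`BundleRegistry`, `Entry`, scalars represented as plain integer IDs) and only mirrors the rule described in the reply: each distinct order of the same loads gets its own entry, and loads are exempt from the "scalar already used in another entry" check, while other scalars still trigger it. Returning -1 stands in for giving up on vectorizing the bundle.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical, simplified tree entry: scalars are integer IDs.
struct Entry {
  std::vector<int> Scalars; // bundle in its original (VL) order
  bool IsLoad;
};

class BundleRegistry {
  std::vector<Entry> Entries;
  std::map<int, int> ScalarToEntry; // tracked for non-load scalars only

public:
  // Returns the index of the entry for VL, or -1 if a non-load scalar is
  // already owned by an entry with a different order.
  int addBundle(const std::vector<int> &VL, bool IsLoad) {
    // The exact same ordered bundle reuses its existing entry.
    for (int I = 0; I < static_cast<int>(Entries.size()); ++I)
      if (Entries[I].Scalars == VL)
        return I;
    // Loads skip the "used in several entries" check, so a new order of
    // the same loads simply becomes a new entry.
    if (!IsLoad)
      for (int S : VL)
        if (ScalarToEntry.count(S))
          return -1;
    Entries.push_back({VL, IsLoad});
    int Idx = static_cast<int>(Entries.size()) - 1;
    if (!IsLoad)
      for (int S : VL)
        ScalarToEntry[S] = Idx;
    return Idx;
  }

  int size() const { return static_cast<int>(Entries.size()); }
};
```

In this model, loads {10, 11, 12, 13} and {11, 10, 13, 12} end up in two separate entries, each of which would get its own wide load and shuffle; the expectation in the comment is that the instruction combiner cleans up the resulting duplicate loads. The same reordering of non-load scalars is still rejected.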

Repository:
  rL LLVM

https://reviews.llvm.org/D36130