[PATCH] D36130: [SLP] Vectorize jumbled memory loads.

Fri Feb 16 09:46:31 PST 2018

ashahid added inline comments.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1677-1678
+                            "permutation of loaded lanes.\n");
+            newTreeEntry(Sorted, true, UserTreeIdx,
+                         makeArrayRef(Mask.begin(), Mask.end()), OpdNum);
+            return;
----------------
ABataev wrote:
> ashahid wrote:
> > ABataev wrote:
> > > ashahid wrote:
> > > > ABataev wrote:
> > > > > Bad decision. It is better to use original `VL` here, rather than `Sorted` and add an additional array of sorted indieces. In this case you don't need all these additional numbers and all that complex logic to find the correct tree entry for the list of values.
> > > > In fact earlier design in patch (https://reviews.llvm.org/D26905) was to use original VL, however there was counter argument to that which I don't remember exactly.
> > > It is better to use original `VL` here, otherwise it will end with a lot of troubles and will require the whole bunch of changes in the vectorization process to find the perfect match for the vector of vectorized values. I don't think it is a good idea to have a lot of changes accross the whole module to handle jumbled loads.
> > In the context where we can have multiple user of loaded vector with different shuffle mask, the design is to represent these different shuffle mask for each user corresponding to the user's operand number. Having single sorted indices will not be sufficient for this.
> > Given the objective of handling multiple out of order uses changes are not that big I feel.
> Now I see what do you want to do. But I don't think that this the correct way to implement it. It complicates the whole vectorization process. I'd suggest to create different tree entries for each particular order of the loads and exclude loads from the check that the same instruction is used several times in different tree entries.
> If you worry about several different loads of the same values, I think they will be optimized by instruction combiner.
Off course this could have been a better solution but I was not sure of the impact it may have by breaking the single tree entry assumption. One problem I see is the TreeEntry lookup if multiple node with same scalar values are present.
I can use isSame() check to make sure correct tree entry is found, however it may become costly in case of PHI instruction fed by same vector Load.

Repository:
  rL LLVM

https://reviews.llvm.org/D36130