[PATCH] D36130: [SLP] Vectorize jumbled memory loads.

Tue Feb 13 07:34:05 PST 2018

ABataev added a comment.

In https://reviews.llvm.org/D36130#1006202, @ashahid wrote:

> Hi Alexey,
>
> Thanks for looking into it. I will update it accordingly.
>  BTW this patch is failing with its tests after the re-base on top of your patch. Do you foresee any conflicting code?

Probably, but it is hard to say exactly without looking at the result.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1112
+// accesses are entered into the map.
+bool llvm::sortLoadAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
+                            ScalarEvolution &SE,
----------------
ashahid wrote:
> ABataev wrote:
> > This function can be used for stores also, it is better to make it universal for stores/loads.
> I plan to do such improvement in separate patches.
I just suggest making it universal from the very beginning, that's it.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:736-742
+    /// Records optional shuffle mask for the uses of jumbled memory accesses.
+    /// For example, a non-empty ShuffleMask[1] represents the permutation of
+    /// lanes that operand #1 of this vectorized instruction should undergo
+    /// before feeding this vectorized instruction, whereas an empty
+    /// ShuffleMask[0] indicates that the lanes of operand #0 of this vectorized
+    /// instruction need not be permuted at all.
+    SmallVector<SmallVector<unsigned, 4>, 2> ShuffleMask;
----------------
ashahid wrote:
> ABataev wrote:
> > Why you can't have just one shuffle here for all external uses?
> This is for in-tree multiple uses of a single vector load where the uses have different masks/permutations.
> This section of comment https://reviews.llvm.org/D36130#inline-326711
> discussed it earlier. Also there is figure attached.
I still don't understand what the problem is here.
1. You need to perform the loads in some order.
2. You sort the loads into sequential (ascending address) order and perform the vector load starting from the lowest address.
3. You reshuffle the loaded vector value back to the original order.
That's it, you have your loads in the required order. Just one shuffle is required. Why do you need more? Also, I don't understand why you need so many changes, why you need additional indices, etc.
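
To make the three steps concrete, here is a minimal standalone sketch in plain C++ (no LLVM types). The helper names `computeRestoreMask` and `applyShuffle` are hypothetical and are not part of the patch; the sketch assumes each load's offset from a common base is already known and that the offsets are consecutive once sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the patch): Offsets[i] is the offset of the
// i-th scalar load in its original (jumbled) order. The returned mask maps
// each original lane to the position of its element in the address-sorted
// wide load, i.e. it is the single shuffle from step 3.
std::vector<unsigned> computeRestoreMask(const std::vector<int64_t> &Offsets) {
  // Step 2: order the original lane numbers by ascending address.
  std::vector<unsigned> SortedToOrig(Offsets.size());
  std::iota(SortedToOrig.begin(), SortedToOrig.end(), 0u);
  std::stable_sort(
      SortedToOrig.begin(), SortedToOrig.end(),
      [&](unsigned A, unsigned B) { return Offsets[A] < Offsets[B]; });

  // Step 3: invert that ordering to get the restoring permutation.
  std::vector<unsigned> Mask(Offsets.size());
  for (unsigned SortedPos = 0; SortedPos < SortedToOrig.size(); ++SortedPos)
    Mask[SortedToOrig[SortedPos]] = SortedPos;
  return Mask;
}

// Emulates a single-source shufflevector: Result[i] = Loaded[Mask[i]].
template <typename T>
std::vector<T> applyShuffle(const std::vector<T> &Loaded,
                            const std::vector<unsigned> &Mask) {
  std::vector<T> Result;
  Result.reserve(Mask.size());
  for (unsigned Lane : Mask) {
    assert(Lane < Loaded.size() && "mask index out of range");
    Result.push_back(Loaded[Lane]);
  }
  return Result;
}
```

For loads of `a[1], a[3], a[2], a[0]` the offsets are `{1, 3, 2, 0}`, the wide load reads `a[0..3]`, and the mask `{1, 3, 2, 0}` restores the original lane order with one shuffle.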

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1661
+
+      if (VL.size() > 2) {
+        bool ShuffledLoads = true;
----------------
ashahid wrote:
> ABataev wrote:
> > Is it possible at all that `VL` has less than 4 elements here?
> I think yes, for example a couple of i64 loads considering minimum register width as 128-bit.
> 
> However, this check here was basically meant to indicate jumbled loads of size 2 is essentially a reversed load.
It is going to be handled by the reverse loads patch.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1677-1678
+                            "permutation of loaded lanes.\n");
+            newTreeEntry(Sorted, true, UserTreeIdx,
+                         makeArrayRef(Mask.begin(), Mask.end()), OpdNum);
+            return;
----------------
ashahid wrote:
> ABataev wrote:
> > Bad decision. It is better to use original `VL` here, rather than `Sorted` and add an additional array of sorted indices. In this case you don't need all these additional numbers and all that complex logic to find the correct tree entry for the list of values.
> In fact earlier design in patch (https://reviews.llvm.org/D26905) was to use original VL, however there was counter argument to that which I don't remember exactly.
It is better to use the original `VL` here; otherwise it will end up causing a lot of trouble and will require a whole bunch of changes in the vectorization process to find the perfect match for the vector of vectorized values. I don't think it is a good idea to have a lot of changes across the whole module to handle jumbled loads.
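
A rough illustration of the layout being suggested, as a simplified standalone sketch. `SketchTreeEntry`, `ReorderIndices` and `findEntry` are hypothetical names rather than the actual SLPVectorizer data structures, and strings stand in for `Value *`. The entry keeps the scalars in their original `VL` order, so lookup stays a plain equality check, and the sorted order lives in a separate, optional index array.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a vectorizer tree entry.
struct SketchTreeEntry {
  // Scalars in the original VL order, exactly as the users see them.
  std::vector<std::string> Scalars;
  // Optional reorder indices for the wide load; empty means the loads were
  // already consecutive and no shuffle is needed.
  std::vector<unsigned> ReorderIndices;

  // Lookup compares against the original VL; no sorted copy is involved.
  bool isSame(const std::vector<std::string> &VL) const {
    return Scalars == VL;
  }
  bool needsShuffle() const { return !ReorderIndices.empty(); }
};

// Finds the entry built for VL, or nullptr if there is none.
const SketchTreeEntry *findEntry(const std::vector<SketchTreeEntry> &Tree,
                                 const std::vector<std::string> &VL) {
  for (const SketchTreeEntry &E : Tree)
    if (E.isSame(VL))
      return &E;
  return nullptr;
}
```

With this shape, the users of the entry keep matching on the list of values they already have, and only the code that emits the load needs to consult `ReorderIndices`.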

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3063
+        isJumbled = true;
+        LI = cast<LoadInst>(E->Scalars[0]);
+      } else {
----------------
Is this correct? `E->Scalars[0]` is exactly `VL0`

Repository:
  rL LLVM

https://reviews.llvm.org/D36130