[PATCH] D26905: [SLP] Vectorize loads of consecutive memory accesses, accessed in non-consecutive (jumbled) way.

Mon Nov 21 18:12:02 PST 2016

mkuper added a comment.

This basically fixes PR28474, right? Does it work correctly on the test-cases there?

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1022
+                                        ScalarEvolution &SE) {
+  std::multimap<int, Value *> offValPair;
+  for (unsigned i = 0; i < VL.size(); i++) {
----------------
Why is this a multimap?

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1024
+  for (unsigned i = 0; i < VL.size(); i++) {
+    Value *Ptr = getPointerOperand(VL[i]);
+    unsigned AS = getAddressSpaceOperand(VL[i]);
----------------
Could you add some documentation to explain what exactly this does?

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:464
+    /// \returns true if the scalars in VL are found in this entry.
+    bool isJumbled(ArrayRef<Value *> VL) const {
+      assert(VL.size() == Scalars.size() && "Invalid size");
----------------
I think you may want a different name - this doesn't actually check whether the scalars are jumbled, it checks whether they're all present. That is, it'll return true even if they're all in-order.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:466
+      assert(VL.size() == Scalars.size() && "Invalid size");
+      for (unsigned i = 0; i < VL.size(); i++) {
+        bool Found = false;
----------------
RKSimon wrote:
> This might be simplified with a for range loop and use of llvm:none_of / any_of?
Would it make sense to pre-sort both arrays, and then check the two sorted arrays are equal? This would make it O(nlogn) instead of O(n^2)
(I'm not sure sort based on what, though - as well as the actual gain, since I guess VL.size() is small in practice.)

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1197
       // Check if the loads are consecutive, reversed, or neither.
       // TODO: What we really want is to sort the loads, but for now, check
       // the two likely directions.
----------------
This TODO gets done.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1217

+      if (VL.size() > 2) {
+        SmallVector<Value *, 8> list;
----------------
Do we still need the ReverseConsecutive case at all? That was introduced in r276477 as a patch for the common case of PR28474, where we find the loads in reverse order. But this should completely supersede it, right?

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1217

+      if (VL.size() > 2) {
+        SmallVector<Value *, 8> list;
----------------
mkuper wrote:
> Do we still need the ReverseConsecutive case at all? That was introduced in r276477 as a patch for the common case of PR28474, where we find the loads in reverse order. But this should completely supersede it, right?
Why not for VL.size() == 2?

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1219
+        SmallVector<Value *, 8> list;
+        auto newVL = (ArrayRef<Value *>)0;
+        bool shuffledLoads = true;
----------------
This looks rather weird. Can you make it more idiomatic?

================
Comment at: test/Transforms/SLPVectorizer/X86/jumbled-load.ll:10
+
+define i32 @jumbled-load(i32* noalias nocapture %in, i32* noalias nocapture %out) {
+
----------------
Please add a test that has several load packets (e.g. multiplies one load sequence by another load sequence).

================
Comment at: test/Transforms/SLPVectorizer/X86/reduction_loads.ll:8
+; CHECK: mul <8 x i32> <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>, [[SHUF]]

 define i32 @test(i32* nocapture readonly %p) {
----------------
RKSimon wrote:
> It'd be better if there was more context to this additional shuffle - regenerate + commit the current output with utils/update_test_checks.py ? For an IR loop it's not that large an output.
Yes, I'd be interested to know if we added a shuffle here, or just moved a shuffle from the store side to the load side (which makes sense). 

https://reviews.llvm.org/D26905