[PATCH] D26905: [SLP] Vectorize loads of consecutive memory accesses, accessed in non-consecutive (jumbled) way.

Fri Jan 27 10:53:38 PST 2017

mkuper accepted this revision.
mkuper added a comment.
This revision is now accepted and ready to land.

LGTM, Thanks!

================
Comment at: test/Transforms/SLPVectorizer/X86/store-jumbled.ll:14
+; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4
+; CHECK-NEXT:    [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 0, i32 2>
 ; CHECK-NEXT:    [[INN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[INN:%.*]], i64 0
----------------
OK, so this is pretty much what I thought would happen: we shuffle both loads the same way and then multiply, instead of multiplying and then shuffling. This is probably fine, though. I hope InstCombine will pick up on this and turn it into a mul followed by a single shuffle when the masks match.
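
To spell out why the two orders are interchangeable, here is a minimal standalone C++ sketch. It is not LLVM code and says nothing about what InstCombine actually does. Vec4, Mask4, shuffle, mul, and ordersAgree are hypothetical names made up for this illustration. It only models a single-source shufflevector and a lane-wise mul, and checks that shuffling both operands with the same mask before multiplying gives the same lanes as multiplying first and shuffling once.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for <4 x i32> and a 4-lane shuffle mask.
using Vec4 = std::array<std::int32_t, 4>;
using Mask4 = std::array<std::size_t, 4>;

// Models a single-source shufflevector: result lane i is src[mask[i]].
Vec4 shuffle(const Vec4 &src, const Mask4 &mask) {
  Vec4 out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = src[mask[i]];
  return out;
}

// Lane-wise multiply. The product is computed in unsigned arithmetic so
// that overflow wraps instead of being undefined behavior in C++.
Vec4 mul(const Vec4 &a, const Vec4 &b) {
  Vec4 out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = static_cast<std::int32_t>(static_cast<std::uint32_t>(a[i]) *
                                       static_cast<std::uint32_t>(b[i]));
  return out;
}

// True when "shuffle both, then mul" matches "mul, then shuffle once".
bool ordersAgree(const Vec4 &a, const Vec4 &b, const Mask4 &mask) {
  return mul(shuffle(a, mask), shuffle(b, mask)) == shuffle(mul(a, b), mask);
}
```

With the mask from the CHECK line above, <1, 3, 0, 2>, both orders produce the same vector. The second form needs one shuffle instead of two, which is the saving hoped for here. The identity relies on both shuffles using the same mask; with different masks the lanes no longer line up.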

https://reviews.llvm.org/D26905