[PATCH] D26905: [SLP] Vectorize loads of consecutive memory accesses, accessed in non-consecutive (jumbled) way.

Tue Jan 31 01:35:23 PST 2017

ashahid added inline comments.

================
Comment at: test/Transforms/SLPVectorizer/X86/store-jumbled.ll:14
+; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x i32>, <4 x i32>* [[TMP1]], align 4
+; CHECK-NEXT:    [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 0, i32 2>
 ; CHECK-NEXT:    [[INN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[INN:%.*]], i64 0
----------------
mkuper wrote:
> Ok, so this is pretty much what I thought will happen. We shuffle both loads the same way, and then multiply, instead of multiplying and then shuffling. But this is probably fine - I hope InstCombine will pick up on this and combine it to a mul followed by a shuffle, if the masks match.
Yes you are right. I verified, its happening exactly as you explained

Repository:
  rL LLVM

https://reviews.llvm.org/D26905