[PATCH] D26905: [SLP] Vectorize loads of consecutive memory accesses, accessed in non-consecutive (jumbled) way.

Thu Dec 15 08:26:43 PST 2016

RKSimon added inline comments.

================
Comment at: test/Transforms/SLPVectorizer/X86/reduction_loads.ll:20
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
-; CHECK-NEXT:    [[TMP2:%.*]] = mul <8 x i32> <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>, [[TMP1]]
+; CHECK-NEXT:    [[TMP2:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT:    [[TMP3:%.*]] = mul <8 x i32> <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>, [[TMP2]]
----------------
mkuper wrote:
> RKSimon wrote:
> > mkuper wrote:
> > > RKSimon wrote:
> > > > What can be done to avoid this regression?
> > > Ohh, right, wanted to ask about this as well.
> > > My guess is that this wasn't actually a regression, but we moved the shuffle from store side to the load side.  Is that right?
> > If the update_test_checks script has done its job and generated checks for all the IR then this is an additional shuffle, I can't see an equivalent shuffle or set of extracts in the codegen on the left.
> Argh, I didn't even look at the new version of the test, my assumption was from looking at the non-generated one (which is even more embarrassing, since I originally wrote that test, and didn't remember it doesn't have a shuffle...)
> We really should not be regressing this.
Any luck with working out what is causing this regression? Cross lane shuffles can be quite expensive.

https://reviews.llvm.org/D26905