[PATCH] D26905: [SLP] Vectorize loads of consecutive memory accesses, accessed in non-consecutive (jumbled) way.

Wed Dec 21 03:06:04 PST 2016

ashahid added inline comments.

================
Comment at: include/llvm/Analysis/LoopAccessAnalysis.h:695
+/// access.
+SmallVector<Value *, 8> sortMemAccess(ArrayRef<Value *> VL, const DataLayout &DL,
+                                      ScalarEvolution &SE);
----------------
mssimpso wrote:
> Should this be renamed to sortMemAccesses? If so, the comment above should also be updated: "jumbled memory accesses".
> 
> Also, should we be returning a SmallVector here? We could also pass a SmallVectorImpl<Value *> &Sorted to the function and place the sorted values there.
Ok, I will do that.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1068
+    // Compute the constant offset from the base pointer of each memory access
+    // and insert into the multimap to ensure it is sorted.
+    Value *Ptr = getPointerOperand(Val);
----------------
mssimpso wrote:
> Please update comment since you're no longer using a multimap.
Oh, sure.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2565
+      // must be followed by a 'shuffle' with the required jumbled mask.
+      if (E->NeedToShuffle && (VL.size() == E->Scalars.size())) {
+        SmallVector<Constant *, 8> Mask;
----------------
mssimpso wrote:
> ashahid wrote:
> > mssimpso wrote:
> > > I probably missed this, but why are we checking the sizes? Does this mean there will be cases where E->NeedToShuffle is true but we don't generate the shuffle?
> > No, I want to ensure that resulting vector type is not differing due to the length of the vector value.
> I don't think I fully understand this yet. Can you please make the comment more detailed. In particular, when does VL.size() not equal Scalars.size()? Is this the case when a bundle gets split up into smaller chunks? And then if this is true, what does it imply for the jumbled accesses. It looks like we will end up with a vector load still, but then when are they placed in the right order? Sorry if this should all be obvious!
As such I don't expect VL.size() not equal to Scalars.size(), but if it is so, the compiler may throw assertion for incorrect vector types. I just wanted to avoid that. May be I am presuming it. I will check by avoiding this specific check.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:413

-  /// Vectorize a single entry in the tree.
-  Value *vectorizeTree(TreeEntry *E);
+  /// Vectorize a single entry in the tree.VL is the scalars in program order.
+  Value *vectorizeTree(ArrayRef<Value *> VL, TreeEntry *E);
----------------
mssimpso wrote:
> Can we be a bit more explicit about VL. Are VL the scalar roots of the vectorizable tree?
No, it is not scalar roots of vectorizable tree. VL is all isomorphic scalars , for example ADD1, ADD2 and so on or LOAD1 , LOAD2 etc

================
Comment at: test/Transforms/SLPVectorizer/X86/reduction_loads.ll:20
 ; CHECK-NEXT:    [[TMP1:%.*]] = load <8 x i32>, <8 x i32>* [[TMP0]], align 4
-; CHECK-NEXT:    [[TMP2:%.*]] = mul <8 x i32> <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>, [[TMP1]]
+; CHECK-NEXT:    [[TMP2:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT:    [[TMP3:%.*]] = mul <8 x i32> <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>, [[TMP2]]
----------------
RKSimon wrote:
> mkuper wrote:
> > RKSimon wrote:
> > > mkuper wrote:
> > > > RKSimon wrote:
> > > > > What can be done to avoid this regression?
> > > > Ohh, right, wanted to ask about this as well.
> > > > My guess is that this wasn't actually a regression, but we moved the shuffle from store side to the load side.  Is that right?
> > > If the update_test_checks script has done its job and generated checks for all the IR then this is an additional shuffle, I can't see an equivalent shuffle or set of extracts in the codegen on the left.
> > Argh, I didn't even look at the new version of the test, my assumption was from looking at the non-generated one (which is even more embarrassing, since I originally wrote that test, and didn't remember it doesn't have a shuffle...)
> > We really should not be regressing this.
> Any luck with working out what is causing this regression? Cross lane shuffles can be quite expensive.
The regression is because here the order of scalar loads are reverse consecutive initially. I will update the patch to resolve it.

https://reviews.llvm.org/D26905