[PATCH] D26905: [SLP] Vectorize loads of consecutive memory accesses, accessed in non-consecutive (jumbled) way.

Wed Nov 30 13:03:00 PST 2016

mkuper added a comment.

Sorry for the delay, I was on vacation.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1022
+                                        ScalarEvolution &SE) {
+  std::multimap<int, Value *> offValPair;
+  for (unsigned i = 0; i < VL.size(); i++) {
----------------
ashahid wrote:
> mkuper wrote:
> > Why is this a multimap?
> This is because the elements in the multimap follow a certain order, so using this will ensure that the values are sorted accordingly.
It doesn't seem right to use a multimap just for the sorting behavior.
I think you can find a more appropriate container. See http://llvm.org/docs/ProgrammersManual.html#picking-the-right-data-structure-for-a-task

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1019
+/// Returns the sorted memory accesses after sorting it.
+SmallVector<Value *, 8> llvm::SortMemAccess(ArrayRef<Value *> VL,
+                                            const DataLayout &DL,
----------------
The LLVM coding standard is that function names start with a non-capital, and variable names start with a capital.
(There are some exceptions for functions, but this is mostly in old code.)

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1022
+                                            ScalarEvolution &SE) {
+  std::multimap<int, Value *> offValPair;
+  for (auto *Val : VL) {
----------------
Also, capitalization.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1033
+    Ptr = Ptr->stripAndAccumulateInBoundsConstantOffsets(DL, Offset);
+    const SCEV *OffsetSCEV = SE.getConstant(Offset);
+    auto *OffsetC = dyn_cast<SCEVConstant>(OffsetSCEV);
----------------
Why are you turning a constant into a SCEV and back into a constant?

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1034
+    const SCEV *OffsetSCEV = SE.getConstant(Offset);
+    auto *OffsetC = dyn_cast<SCEVConstant>(OffsetSCEV);
+    const APInt &constOffset = OffsetC->getAPInt();
----------------
If you know this is a SCEVConstant, this should be a cast<>. Otherwise, you need to check the dyn_cast<> actually succeeded.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:414
   /// Vectorize a single entry in the tree.
-  Value *vectorizeTree(TreeEntry *E);
+  Value *vectorizeTree(ArrayRef<Value *> VL, TreeEntry *E);

----------------
Please add an explanation for what the VL parameters means here.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:466
+      assert(VL.size() == Scalars.size() && "Invalid size");
+      for (unsigned i = 0; i < VL.size(); i++) {
+        bool Found = false;
----------------
ashahid wrote:
> mkuper wrote:
> > RKSimon wrote:
> > > This might be simplified with a for range loop and use of llvm:none_of / any_of?
> > Would it make sense to pre-sort both arrays, and then check the two sorted arrays are equal? This would make it O(nlogn) instead of O(n^2)
> > (I'm not sure sort based on what, though - as well as the actual gain, since I guess VL.size() is small in practice.)
> Pre-sorting would require two calls for sort and then compare, IMO, for the given small VL.size it would not make much difference. However I am open to other views.
To be honest, I'm not sure - so I'd appreciate another opinion about pre-sorting.
Matt/Hal/Simon?

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1217

+      if (VL.size() > 2) {
+        SmallVector<Value *, 8> list;
----------------
ashahid wrote:
> mkuper wrote:
> > mkuper wrote:
> > > Do we still need the ReverseConsecutive case at all? That was introduced in r276477 as a patch for the common case of PR28474, where we find the loads in reverse order. But this should completely supersede it, right?
> > Why not for VL.size() == 2?
> A jumbled VL of VL.size() == 2 is essentially a case of reversed VL. Considering the tradeoff between compile time of extra buildTree() for VL.size==2  vs additional runtime for shufflevector, I opted for extra compile time over extra runtime.
The trade off here is more one of code complexity - is the gain in compile time worth having all the additional logic present for both the "fully unsorted" case and the "reversed" case.

https://reviews.llvm.org/D26905