[PATCH] D43776: [SLP] Fix PR36481: vectorize reassociated instructions.

Ayal Zaks via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 7 23:39:21 PST 2018


Ayal added a comment.

This patch addresses the following TODO, plus handles extracts:

  // Check if the loads are consecutive, reversed, or neither.
  // TODO: What we really want is to sort the loads, but for now, check
  // the two likely directions.

At some point it's worth documenting that the best order is set once, at the root of the tree, and then gets propagated to its leaves. Would be good to do so w/o having to rebuild the tree (introduced before this patch)?



================
Comment at: include/llvm/Analysis/LoopAccessAnalysis.h:679
+/// saves the mask for actual memory accesses in program order in
+/// \p SortedIndices as <2,0,1,3>
+bool sortPtrAccesses(ArrayRef<Value *> VL, const DataLayout &DL,
----------------
The usage later and documentation above look for a set of references that are consecutive under some permutation. I.e., w/o gaps. The implementation below allows arbitrary gaps, and does not check for zero gaps, i.e., replicated references.

Would it be better to simply do a Pigeonhole sort, and perhaps call it isPermutedConsecutiveAccess():
1. Scan all references to find the one with minimal address. Bail out if any reference is incomparable with the running min.
2. Scan all references and set SortedIndices[i] = difference(VL[i], Minimum). Bail out if this entry is to be set more than once.

Note: it may be good to support replicated references, say when the gaps are internal to the vector to avoid loading out of bounds. Perhaps add a TODO.

Note: it may be good to have LAA provide some common support for both SLP and LV's InterleaveGroup construction, which share some aspects. Perhaps add a TODO.


================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1113
+    const auto *Diff =
+        dyn_cast<SCEVConstant>(SE.getMinusSCEV(SE.getSCEV(Ptr), Scev0));
+    // The pointers may not have a constant offset from each other, or SCEV
----------------
Use computeConstantDifference() instead of computing it explicitly? It should compare GetUnderlyingObject()'s, if worthwhile, rather than doing so here.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:454
 
 /// \returns True if Extract{Value,Element} instruction extracts element Idx.
+static Optional<unsigned> matchExtractIndex(Instruction *E, unsigned Idx,
----------------
Update above documentation accordingly.

Instead of returning the index when it's not Idx, may as well have `getExtractIndex()` return it always, and have the caller compare it to Idx?
While we're at it, may as well pass only `E` and have the callee get its Opcode.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:608
+  /// \returns The best order of instructions for vectorization.
+  ArrayRef<unsigned> bestOrder() const {
+    auto I = std::max_element(
----------------
The order which bestOrder() provides is then used to form a vector of instructions. Suggest to have this method supply the desired vector, given the instructions to permute.



================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:667
+  /// vector) and returns the mask for reordering operations, if it allows to
+  /// reuse extract instructions.
+  bool canReuseExtract(ArrayRef<Value *> VL, Value *OpValue,
----------------
```
and returns the mask for reordering operations, if it allows
```
should specify more accurately something like:
...and sets \p BestOrder to the identity permutation; otherwise returns False, setting \p BestOrder to either an empty vector or a non-identity permutation that allows...


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1224
   /// of operation bundles that contain consecutive operations in reversed
   /// order.
+  DenseMap<ArrayRef<unsigned>, unsigned> NumOpsWantToKeepOrder;
----------------
Update above documentation.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1228
+  /// All orders of operations found during the vectorization.
+  SmallVector<OrdersType, 4> OpOrders;
+  unsigned DirectOrderNum = 0;
----------------
Can the permutations be kept inside NumOpsWantsToKeepOrder, using OrdersType as its key, instead of holding them in OpOrders? So that one could later simply do 
```
++NumOpsWantsToKeepOrder[CurrentOrder];
```
See, e.g., `UniquifierDenseMapInfo` in LSR.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1229
+  SmallVector<OrdersType, 4> OpOrders;
+  unsigned DirectOrderNum = 0;
 
----------------
Document what DirectOrderNum counts and/or have a more self-explanatory name similar to the original one, e.g., `NumOpsWantToKeepOriginalOrder`

Can add that `NumOpsWantToKeepOriginalOrder` holds the count for the identity permutation, instead of holding this count inside `NumOpsWantsToKeepOrder` along with all other permutations.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1581
     case Instruction::ExtractElement: {
-      bool Reuse = canReuseExtract(VL, VL0);
+      OrdersType BestOrder;
+      bool Reuse = canReuseExtract(VL, VL0, BestOrder);
----------------
BestOrder >> CurrentOrder?


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1587
+        newTreeEntry(VL, /*Vectorized=*/true, UserTreeIdx,
+                     ReuseShuffleIndicies);
       } else {
----------------
Better early exit by returning here.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1593
+          auto I = NumOpsWantToKeepOrder.find(BestOrder);
+          if (I == NumOpsWantToKeepOrder.end()) {
+            OpOrders.emplace_back(BestOrder);
----------------
This is pretty hard to follow, and deserves an explanation. Would be better to simply do something like `++NumOpsWantToKeepOrder[BestOrder]`.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1631
+        LoadInst *L = cast<LoadInst>(V);
+        PointerOps.emplace_back(L->getPointerOperand());
         if (!L->isSimple()) {
----------------
Sink the emplace_back to after the handling of non-simple loads?


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1640
 
-      // Check if the loads are consecutive, reversed, or neither.
-      // TODO: What we really want is to sort the loads, but for now, check
-      // the two likely directions.
-      bool Consecutive = true;
-      bool ReverseConsecutive = true;
-      for (unsigned i = 0, e = VL.size() - 1; i < e; ++i) {
-        if (!isConsecutiveAccess(VL[i], VL[i + 1], *DL, *SE)) {
-          Consecutive = false;
-          break;
-        } else {
-          ReverseConsecutive = false;
+      OrdersType BestOrder;
+      // Check the order of pointer operands.
----------------
"BestOrder" >> "CurrentOrder", or "VLOrder"?


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1644
+        LoadInst *L0;
+        LoadInst *L1;
+        if (BestOrder.empty()) {
----------------
Reuse PointerOps and have Value *P0,1 = PointerOps.front,back() instead of LoadInst *L0,1 just to get their PointerOperand (and SCEV) later?


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1659
+        // Check that the sorted loads are consecutive.
+        if (Diff && Diff->getAPInt().getZExtValue() == (VL.size() - 1) * Size) {
+          if (BestOrder.empty()) {
----------------
Better have sortPtrAccess() set "BestOrder" only if the given pointers are indeed consecutive once permuted, instead of checking here the Diff of max - min.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2022
+  bool ShouldKeepOrder = true;
+  BestOrder.assign(VL.size(), VL.size() + 1);
   for (unsigned I = 0, E = VL.size(); I < E; ++I) {
----------------
Comment that BestOrder is initialize to invalid values.
Perhaps set `E = VL.size()` here and assign `E + 1`, to match the later checks for initialized/unset values.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:2029
+    }
+    Optional<unsigned> Idx = matchExtractIndex(Inst, I, Inst->getOpcode());
+    if (Idx.hasValue()) {
----------------
Can simplify by checking if BestOrder is the identify permutation at the end, as done at the end of sortPtrAccesses(); using `getExtractIndex(Inst)` which returns Idx even if it's equal to I. Better rename BestOrder here too. 


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3077
+          for (unsigned I = 0, End = E->Scalars.size(); I < End; ++I)
+            Mask[E->ReorderIndices[I]] = I;
+          Builder.SetInsertPoint(VL0);
----------------
Capture this "inversePermutation" in a method, to be called again below?


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3086
+            Builder.SetInsertPoint(VL0);
           Builder.SetInsertPoint(VL0);
           V = Builder.CreateShuffleVector(V, UndefValue::get(VecTy),
----------------
InsertPoint may be set twice to VL0?



================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3298
+      if (IsReorder)
+        VL0 = cast<Instruction>(E->Scalars[E->ReorderIndices.front()]);
       setInsertPointAfterBundle(E->Scalars, VL0);
----------------
Should we first inverse the permutation and then take its front()?

Would be good to have a testcase where this makes a difference and check it (one way or the other), if there isn't one already.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:4915
         // there are exactly two operations.
         assert(Ops.size() == 2);
+        SmallVector<Value *, 4> ReorderedOps;
----------------
If only two operations are still allowed, ReorderedOps may as well stay Ops[1], Ops[0].
Provide the generalization below to any permutation when the assert can be dropped, i.e., in a separate patch which handles this TODO?


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:4917
+        SmallVector<Value *, 4> ReorderedOps;
+        ReorderedOps.reserve(Ops.size());
+        for (const unsigned Idx : Order)
----------------
Provide Ops.size() as operand to constructor of ReorderedOps (multiple similar occurrences).


================
Comment at: test/Transforms/SLPVectorizer/X86/external_user_jumbled_load.ll:24
 ; CHECK-NEXT:    [[TMP13:%.*]] = insertelement <4 x i32> [[TMP11]], i32 [[TMP12]], i32 3
 ; CHECK-NEXT:    store <4 x i32> [[TMP13]], <4 x i32>* [[SINK:%.*]]
 ; CHECK-NEXT:    ret void
----------------
This redundant sequence of extractions from REORDER_SHUFFLE and insertions into TMP13 is hopefully eliminated later. Is the cost model ignoring it, or can we avoid generating it? Would be good to have the test CHECK the cost.


Repository:
  rL LLVM

https://reviews.llvm.org/D43776





More information about the llvm-commits mailing list