[PATCH] D38785: [LV/LAA] Avoid specializing a loop for stride=1 when this predicate implies a single-iteration loop

Wed Nov 1 17:47:05 PDT 2017

Ayal added a comment.

Indeed, a loop with an iteration count smaller than VF is definitely not worth vectorizing. An interesting profitability question is how many iterations beyond VF suffice to amortize the vectorization overheads. In any case, this single- or zero-iteration case looks like a no-brainer, and a realistic one: traversing a column of an NxN matrix.
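For concreteness, the column traversal mentioned above can be sketched as a standalone function. The name `sumColumn` and its signature are assumptions made for illustration; they are not taken from the patch or its test:

```cpp
#include <cstddef>
#include <vector>

// Illustrative only: sums column j of an N x N row-major matrix B.
// The access B[k * N + j] has symbolic stride N, and the loop also runs
// N times, so versioning the loop on "N == 1" would specialize a loop
// that executes a single iteration.
int sumColumn(const std::vector<int> &B, std::size_t N, std::size_t j) {
  int Sum = 0;
  for (std::size_t k = 0; k < N; ++k)
    Sum += B[k * N + j];
  return Sum;
}
```

Here the stride and the trip count are the same value, which is exactly the situation in which a "Stride == 1" runtime check buys nothing.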

================
Comment at: llvm/lib/Analysis/LoopAccessAnalysis.cpp:2153
+  // If (1) and (2) coexist, it means that 1>=TC, in which case we avoid
+  // adding the predicate and bail out.
+  //
----------------
Simplify the above explanation. Suffice to say something like the following:

//Avoid adding the "Stride == 1" predicate when we know that Stride >= Trip-Count. Such a predicate will effectively optimize a single or no iteration loop, as Trip-Count <= Stride == 1.//
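To spell out the arithmetic behind this wording, here is a minimal sketch on plain integers rather than SCEV expressions. The helper names are hypothetical and not part of LoopAccessAnalysis; casts and overflow are ignored, and the loop is assumed to be entered at least once:

```cpp
#include <cstdint>

// Hypothetical model of the check: true when Stride - BackedgeTakenCount > 0,
// i.e. Stride > BackedgeTakenCount, which is the same as Stride >= TripCount.
bool strideCoversTripCount(std::uint64_t Stride,
                           std::uint64_t BackedgeTakenCount) {
  return Stride > BackedgeTakenCount;
}

// Trip count of a loop whose backedge is taken BackedgeTakenCount times
// (assumes the loop body executes at least once).
std::uint64_t tripCount(std::uint64_t BackedgeTakenCount) {
  return BackedgeTakenCount + 1;
}
```

With Stride == 1, `strideCoversTripCount` can only hold when the backedge-taken count is 0, so the specialized loop would run its body just once.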

================
Comment at: llvm/lib/Analysis/LoopAccessAnalysis.cpp:2177
+    CastedBECount = SE->getZeroExtendExpr(BETakenCount, StrideExpr->getType());
+  const SCEV *Minus = SE->getMinusSCEV(CastedStride, CastedBECount);
+  if (SE->isKnownPositive(Minus)) {
----------------
`Minus` >> `StrideMinusBETaken`?

================
Comment at: llvm/lib/Analysis/LoopAccessAnalysis.cpp:2184
+  }  
+
   SymbolicStrides[Ptr] = Stride;
----------------
Can report success here, as in the original message above:

```
DEBUG(dbgs() << "LAA: Found a strided access that we can version");
```

================
Comment at: llvm/test/Transforms/LoopVectorize/pr34681.ll:13
+;       for(unsigned int k=0;k<N;k++) {
+;         tmp+=(int)B[k*N+j];
+;       }
----------------
Would

```
  tmp += B[k*N];
```

suffice? I.e., the cast to `int` and the offset of `j` seem redundant, albeit harmless.

================
Comment at: llvm/test/Transforms/LoopVectorize/pr34681.ll:16
+;
+; We check here that the following runtine scev guard for Stride==1 is not generated:
+; vector.scevcheck:
----------------
"runtine" >> "runtime"

Suggest emphasizing: "is *not* generated"

================
Comment at: llvm/test/Transforms/LoopVectorize/version-mem-access.ll:63
 ; CHECK-LABEL: fn1
 ; CHECK: load <2 x double>

----------------
Just noting for completeness that this test case originally used the symbolic stride also as the trip count. The two are separated below so that the loop continues to be vectorized.

https://reviews.llvm.org/D38785