[PATCH] D37507: Fix maximum legal VF calculation

Sun Sep 10 00:40:58 PDT 2017

Ayal added inline comments.

================
Comment at: include/llvm/Analysis/LoopAccessAnalysis.h:205
+  uint64_t getMaxSafeRegisterWidth() const { return MaxSafeRegisterWidth; }
+
   /// \brief In same cases when the dependency check fails we can still
----------------
Should MaxSafeRegisterWidth replace MaxSafeDepDistBytes, or are both needed?

================
Comment at: include/llvm/Analysis/LoopAccessAnalysis.h:265
+  /// The size of the element is taken from the memory access that causes the
+  /// worse restriction.
+  uint64_t MaxSafeRegisterWidth;
----------------
> that causes the worse[worst] restriction
 >> that is most restrictive.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6226
+      DEBUG(dbgs() << "LV: Invalidate candidate interleaved group due to "
+                      "interleaved store group with gaps.\n");
       releaseGroup(Group);
----------------
Can simplify: "LV: Invalidate candidate interleaved store group due to gaps.\n"

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6363
-  // computation because the maximum distance computed by LAA may not involve
-  // any of the interleaved accesses.
   if (Legal->getMaxSafeDepDistBytes() != -1U)
----------------
This comment about being fairly conservative (i.e., could be room for improvement) still holds, right?

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6366
-    MaxSafeDepDist =
-        Legal->getMaxSafeDepDistBytes() * 8 / Legal->getMaxInterleaveFactor();

----------------
So another way of fixing this could be to have Legal record the MaxStride, instead of asking it for the MaxInterleaveFactor here?

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6367
+  // It is computed by MaxVF * sizeOf(type) * 8, where type is taken from
+  // the memory accesses that incurs the largest restriction (involved in the
+  // smallest dependence distance).
----------------
> that incurs the largest restriction

that is most restrictive

================
Comment at: test/Transforms/LoopVectorize/X86/pr34283-1.ll:1
+; REQUIRES: asserts
+; RUN: opt -S -loop-vectorize -mcpu=skx -debug-only=loop-vectorize < %s  2>&1 | FileCheck %s
----------------
No need to require asserts here, right?

Better place the test say in memdep.ll, rather than opening a new file? Following TestingGuide.html#writing-new-regression-tests. In any case, place both tests in same file, using the PR number in the function names instead of foo.

================
Comment at: test/Transforms/LoopVectorize/X86/pr34283-1.ll:25
+; CHECK-LABEL: foo
+; CHECK: LV: The Widest register is: 192 bits.
+; MaxVF in loop is 6, therefore minmum required register is 6 * sizeof(int) * 8 = 192 bits
----------------
Can we also/instead check the runtime guard that is generated, perhaps where M is not a compile-time constant?

================
Comment at: test/Transforms/LoopVectorize/X86/pr34283-1.ll:26
+; CHECK: LV: The Widest register is: 192 bits.
+; MaxVF in loop is 6, therefore minmum required register is 6 * sizeof(int) * 8 = 192 bits
+
----------------
min[i]mum

================
Comment at: test/Transforms/LoopVectorize/X86/pr34283-2.ll:1
+; RUN: opt -S -loop-vectorize -mcpu=skx < %s  2>&1 | FileCheck %s
+
----------------
The "2>&1" is not needed here.

================
Comment at: test/Transforms/LoopVectorize/X86/pr34283-2.ll:25
+; CHECK: <4 x i32>
+; CHECK-NOT: <8 x i32>
+
----------------
This is a bit fragile, as it's relying on the cost-model to choose exactly VF=4. Perhaps better to simply CHECK for a vector whose VF is in the range 2-7, and no vectors of VF 8 or larger?

https://reviews.llvm.org/D37507