[PATCH] D37507: Fix maximum legal VF calculation

Sun Sep 10 06:41:48 PDT 2017

alonkom added inline comments.

================
Comment at: include/llvm/Analysis/LoopAccessAnalysis.h:205
+  uint64_t getMaxSafeRegisterWidth() const { return MaxSafeRegisterWidth; }
+
   /// \brief In same cases when the dependency check fails we can still
----------------
Ayal wrote:
> Should MaxSafeRegisterWidth replace MaxSafeDepDistBytes, or are both needed?
I think we need both. MaxSafeDepDistBytes is used in a few other LAA functions, and is correct.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6363
-  // computation because the maximum distance computed by LAA may not involve
-  // any of the interleaved accesses.
   if (Legal->getMaxSafeDepDistBytes() != -1U)
----------------
Ayal wrote:
> This comment about being fairly conservative (i.e., could be room for improvement) still holds, right?
We are actually now dropping this over-conservativeness, which was also not conservative enough.   

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6366
-    MaxSafeDepDist =
-        Legal->getMaxSafeDepDistBytes() * 8 / Legal->getMaxInterleaveFactor();

----------------
Ayal wrote:
> So another way of fixing this could be to have Legal record the MaxStride, instead of asking it for the MaxInterleaveFactor here?
yes, this was one of the possible solutions we thought of, but it is less accurate. 

Furthermore we can also drop getMaxInterleaveFactor() as it becomes dead.

================
Comment at: test/Transforms/LoopVectorize/X86/pr34283-1.ll:1
+; REQUIRES: asserts
+; RUN: opt -S -loop-vectorize -mcpu=skx -debug-only=loop-vectorize < %s  2>&1 | FileCheck %s
----------------
Ayal wrote:
> No need to require asserts here, right?
> 
> Better place the test say in memdep.ll, rather than opening a new file? Following TestingGuide.html#writing-new-regression-tests. In any case, place both tests in same file, using the PR number in the function names instead of foo.
The asserts are needed for checking the debug output, but we can check the IR instead, as in the second test.

================
Comment at: test/Transforms/LoopVectorize/X86/pr34283-1.ll:25
+; CHECK-LABEL: foo
+; CHECK: LV: The Widest register is: 192 bits.
+; MaxVF in loop is 6, therefore minmum required register is 6 * sizeof(int) * 8 = 192 bits
----------------
Ayal wrote:
> Can we also/instead check the runtime guard that is generated, perhaps where M is not a compile-time constant?
This fix involves compile time distances that affect max VF; no runtime guards involved.

================
Comment at: test/Transforms/LoopVectorize/X86/pr34283-2.ll:25
+; CHECK: <4 x i32>
+; CHECK-NOT: <8 x i32>
+
----------------
Ayal wrote:
> This is a bit fragile, as it's relying on the cost-model to choose exactly VF=4. Perhaps better to simply CHECK for a vector whose VF is in the range 2-7, and no vectors of VF 8 or larger?
Can force-vectorization-width=4  and CHECK that it vectorized, and force-vectorization-width=8  and CHECK that it did not.

https://reviews.llvm.org/D37507