[PATCH] D37507: Fix maximum legal VF calculation
Ayal Zaks via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Sep 10 00:40:58 PDT 2017
Ayal added inline comments.
================
Comment at: include/llvm/Analysis/LoopAccessAnalysis.h:205
+ uint64_t getMaxSafeRegisterWidth() const { return MaxSafeRegisterWidth; }
+
/// \brief In same cases when the dependency check fails we can still
----------------
Should MaxSafeRegisterWidth replace MaxSafeDepDistBytes, or are both needed?
================
Comment at: include/llvm/Analysis/LoopAccessAnalysis.h:265
+ /// The size of the element is taken from the memory access that causes the
+ /// worse restriction.
+ uint64_t MaxSafeRegisterWidth;
----------------
> that causes the worse[worst] restriction
>> that is most restrictive.
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6226
+ DEBUG(dbgs() << "LV: Invalidate candidate interleaved group due to "
+ "interleaved store group with gaps.\n");
releaseGroup(Group);
----------------
Can simplify: "LV: Invalidate candidate interleaved store group due to gaps.\n"
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6363
- // computation because the maximum distance computed by LAA may not involve
- // any of the interleaved accesses.
if (Legal->getMaxSafeDepDistBytes() != -1U)
----------------
This comment about being fairly conservative (i.e., could be room for improvement) still holds, right?
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6366
- MaxSafeDepDist =
- Legal->getMaxSafeDepDistBytes() * 8 / Legal->getMaxInterleaveFactor();
----------------
So another way of fixing this could be to have Legal record the MaxStride, instead of asking it for the MaxInterleaveFactor here?
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6367
+ // It is computed by MaxVF * sizeOf(type) * 8, where type is taken from
+ // the memory accesses that incurs the largest restriction (involved in the
+ // smallest dependence distance).
----------------
> that incurs the largest restriction
that is most restrictive
================
Comment at: test/Transforms/LoopVectorize/X86/pr34283-1.ll:1
+; REQUIRES: asserts
+; RUN: opt -S -loop-vectorize -mcpu=skx -debug-only=loop-vectorize < %s 2>&1 | FileCheck %s
----------------
No need to require asserts here, right?
Better place the test say in memdep.ll, rather than opening a new file? Following TestingGuide.html#writing-new-regression-tests. In any case, place both tests in same file, using the PR number in the function names instead of foo.
================
Comment at: test/Transforms/LoopVectorize/X86/pr34283-1.ll:25
+; CHECK-LABEL: foo
+; CHECK: LV: The Widest register is: 192 bits.
+; MaxVF in loop is 6, therefore minmum required register is 6 * sizeof(int) * 8 = 192 bits
----------------
Can we also/instead check the runtime guard that is generated, perhaps where M is not a compile-time constant?
================
Comment at: test/Transforms/LoopVectorize/X86/pr34283-1.ll:26
+; CHECK: LV: The Widest register is: 192 bits.
+; MaxVF in loop is 6, therefore minmum required register is 6 * sizeof(int) * 8 = 192 bits
+
----------------
min[i]mum
================
Comment at: test/Transforms/LoopVectorize/X86/pr34283-2.ll:1
+; RUN: opt -S -loop-vectorize -mcpu=skx < %s 2>&1 | FileCheck %s
+
----------------
The "2>&1" is not needed here.
================
Comment at: test/Transforms/LoopVectorize/X86/pr34283-2.ll:25
+; CHECK: <4 x i32>
+; CHECK-NOT: <8 x i32>
+
----------------
This is a bit fragile, as it's relying on the cost-model to choose exactly VF=4. Perhaps better to simply CHECK for a vector whose VF is in the range 2-7, and no vectors of VF 8 or larger?
https://reviews.llvm.org/D37507
More information about the llvm-commits
mailing list