[PATCH] D154264: [LV] Skip VFs < iterations remaining for epilogue vectorization.
Ayal Zaks via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Jul 2 12:45:22 PDT 2023
Ayal added inline comments.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5679
return {ForcedEC, 0, 0};
else {
LLVM_DEBUG(dbgs() << "LEV: Epilogue vectorization forced factor is not "
----------------
nit (unrelated): avoid else after return.
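For illustration only, a minimal sketch of the "no else after return" style. `Factor` and `pickForced` are hypothetical stand-ins, not the actual LoopVectorize code:

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for a vectorization factor result.
struct Factor {
  unsigned Width;
};

// Hypothetical helper: returns the forced factor when a plan exists for it.
static Factor pickForced(bool HasPlan, unsigned ForcedVF) {
  if (HasPlan)
    return {ForcedVF};
  // No 'else' needed here: the branch above has already returned.
  std::fprintf(stderr, "forced factor is not feasible\n");
  return {0};
}
```

Dropping the `else` keeps the fallback path at the outer indentation level, which is the point of the nit.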
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5709
- for (auto &NextVF : ProfitableVFs)
+ ScalarEvolution &SE = *PSE.getSE();
+ const SCEV *TC =
----------------
nit: can set `Type *TCType = Legal->getWidestInductionType();` and use it below.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5722
+ continue;
+
if (((!NextVF.Width.isScalable() && MainLoopVF.isScalable() &&
----------------
Worth early-continuing in this order: first if (a) !hasPlanWithVF, then if (b) NextVF >= EstimatedRuntimeVF, and last if (c) NextVF >= RemainingIterations?
Note that for IC==1 and non-scalable VFs, check (c) subsumes check (b), since RemainingIterations is then less than MainLoopVF.
Can the checks below be simplified, given that EstimatedRuntimeVF == MainLoopVF if !MainLoopVF.isScalable()?
Instead of checking Result.Width.isScalar(), better to check whether Result is still uninitialized, i.e., == VectorizationFactor::Disabled(). Or teach isMoreProfitable() to prefer any computed cost over an uncomputed one.
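The suggested ordering could look roughly like the sketch below. It is a simplified model under stated assumptions: `Candidate` and `selectEpilogueVF` are hypothetical, only fixed-width VFs are modeled, and a plain cost comparison stands in for isMoreProfitable(). An empty `std::optional` plays the role of a still-uninitialized Result.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified candidate; not LLVM's VectorizationFactor.
struct Candidate {
  unsigned Width; // fixed (non-scalable) VF
  bool HasPlan;   // stands in for hasPlanWithVF
  unsigned Cost;  // lower is better; stands in for isMoreProfitable()
};

std::optional<Candidate>
selectEpilogueVF(const std::vector<Candidate> &Candidates,
                 unsigned EstimatedRuntimeVF, uint64_t RemainingIterations) {
  std::optional<Candidate> Result; // empty means "still uninitialized"
  for (const Candidate &Next : Candidates) {
    if (!Next.HasPlan)
      continue; // (a) no plan for this VF
    if (Next.Width >= EstimatedRuntimeVF)
      continue; // (b) not narrower than the main loop
    if (Next.Width >= RemainingIterations)
      continue; // (c) comparison as worded in the comment above
    // Any computed cost beats an uninitialized result.
    if (!Result || Next.Cost < Result->Cost)
      Result = Next;
  }
  return Result;
}
```

With the three early exits up front, the remaining profitability check no longer needs to special-case a scalar Result.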
================
Comment at: llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll:866
; AVX512-NEXT: [[TMP8:%.*]] = add nuw nsw i64 [[TMP7]], 8
-; AVX512-NEXT: [[UGLYGEP:%.*]] = getelementptr i8, ptr [[DEST:%.*]], i64 [[TMP8]]
+; AVX512-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[DEST:%.*]], i64 [[TMP8]]
; AVX512-NEXT: [[TMP9:%.*]] = shl nuw i64 [[TMP6]], 2
----------------
nit: these UGLY-to-SCEV changes are unrelated.
================
Comment at: llvm/test/Transforms/LoopVectorize/X86/limit-vf-by-tripcount.ll:7
; TODO: Make sure selected VF for the epilog loop doesn't exceed remaining TC.
define void @test_tc_17_no_epilogue_vectorization(ptr noalias %src, ptr noalias %dst) #0 {
----------------
Does this patch address this TODO?
================
Comment at: llvm/test/Transforms/LoopVectorize/X86/pr42674.ll:8
; a VF=64,UF=4 loop, but the scalar trip count is only 128 so
; the vector loop was dead code leaving only a scalar remainder.
define zeroext i8 @sum() {
----------------
Note that the change in generated code for this test appears to be **indirectly** related to fixing the VF selected for the **epilog** loop.
This loop is vectorized with VF=64, UF=2 both with and without the patch. Without the patch, its epilog loop is also vectorized (with VF=32); that epilog is cleaned up by instcombine and simplifycfg, but the main loop itself remains. With the patch, the epilog loop is not vectorized and the main loop, which executes a single vector iteration, gets cleaned up.
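As a worked check of the arithmetic, assuming the scalar trip count of 128 stated in the test's comment and the VF=64, UF=2 mentioned above (`Split` and `splitTripCount` are hypothetical names used only for this sketch):

```cpp
#include <cstdint>

// Hypothetical helper: how a trip count splits between the main vector
// loop and the iterations left over for an epilogue.
struct Split {
  uint64_t VectorIterations;
  uint64_t Remaining;
};

constexpr Split splitTripCount(uint64_t TC, uint64_t VF, uint64_t UF) {
  return {TC / (VF * UF), TC % (VF * UF)};
}

// 128 / (64 * 2) == 1: the main loop runs a single vector iteration.
static_assert(splitTripCount(128, 64, 2).VectorIterations == 1,
              "one vector iteration");
// 128 % (64 * 2) == 0: nothing is left for a VF=32 epilogue to execute.
static_assert(splitTripCount(128, 64, 2).Remaining == 0,
              "no remaining iterations");
```

Under these assumptions, no iterations remain after the main loop, which is consistent with the epilog loop no longer being vectorized.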
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D154264/new/
https://reviews.llvm.org/D154264