[PATCH] D152576: [LV] Avoid vectorization if wrap predicates are always false.

Wed Jun 14 13:03:19 PDT 2023

Ayal added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1921
+                                   ScalarEvolution &SE, unsigned VFxUF) {
+  // Check if \p WrapPred overflows for \p ExitCount.
+  auto ProveFalse = [&SE](const SCEVWrapPredicate *WrapPred,
----------------
\p's should appear under ///, and refer to actual parameter names?

(Following offline discussion:)
The idea is to check if any WrapPredicate fires across all iterations of the vector loop, using its trip count if known, otherwise using VFxUF as a lower bound of the trip count for reaching the vector loop. It suffices to check once: for the trip count if constant, or else for VFxUF.

Instead of building a SCEV for double the type size, evaluating both SCEVs at the last iteration, and comparing them to prove wrapping occurred, would it suffice to deduce the first iteration at which wrap will occur, given a constant step, a constant (or lower bound of) start, and the size of the type? Then compare this iteration with the trip count if constant, or with the VFxUF lower bound if not. This could also allow vectorizing a subset of iterations until the first wrap, followed by a scalar remainder (or strip-mining the loop).
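
A minimal standalone sketch of the suggestion above, for illustration only. It assumes an unsigned recurrence with a constant positive step and a type of at most 32 bits; the helper names (firstUnsignedWrapIteration, wrapsWithin) are hypothetical and are not LLVM or ScalarEvolution APIs. "Wraps at iteration I" here means the value Start + I*Step no longer fits in the type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): the smallest 0-based iteration I at
// which the unsigned recurrence {Start,+,Step} no longer fits in BitWidth
// bits, i.e. the smallest I with Start + I*Step >= 2^BitWidth.
uint64_t firstUnsignedWrapIteration(uint64_t Start, uint64_t Step,
                                    unsigned BitWidth) {
  // The sketch is limited to narrow types so that 64-bit arithmetic
  // cannot itself overflow.
  assert(BitWidth >= 1 && BitWidth <= 32 && "sketch limited to <= 32 bits");
  const uint64_t Limit = uint64_t(1) << BitWidth; // first value that does not fit
  assert(Start < Limit && Step >= 1 && Step < Limit && "operands must fit");
  // Ceiling division of the remaining headroom by the step.
  return (Limit - Start + Step - 1) / Step;
}

// True if a wrap occurs among iterations [0, Iterations). Iterations would be
// the constant trip count if known, or VFxUF as a lower bound otherwise.
bool wrapsWithin(uint64_t Start, uint64_t Step, unsigned BitWidth,
                 uint64_t Iterations) {
  return firstUnsignedWrapIteration(Start, Step, BitWidth) < Iterations;
}
```

For example, wrapsWithin(0, 1, 8, 10000) holds for an i8 IV<0,+,1>, while wrapsWithin(0, 1, 32, 4) does not, so a check against VFxUF = 4 alone would not prove wrapping for an i32 IV.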

Wrapping may be tolerated if it occurs on vector boundaries, considering vector loads, stores, and interleave groups. This requires alignment analysis. Unaligned accesses could tolerate wrapping by vectorizing into gathers or scatters.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7630
   assert(OrigLoop->isInnermost() && "Inner loop expected.");
+
   CM.collectValuesToIgnore();
----------------
nit: unrelated new line.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll:161
 ; CHECK:       vec.epilog.vector.body:
-; CHECK-NEXT:    [[OFFSET_IDX:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], [[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT11:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]
+; CHECK-NEXT:    [[INDEX7:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], [[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT11:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]
 ; CHECK-NEXT:    [[VEC_IND8:%.*]] = phi <2 x i64> [ [[INDUCTION]], [[VEC_EPILOG_PH]] ], [ [[VEC_IND_NEXT10:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]
----------------
nit: these changes from OFFSET_IDX to INDEX are unneeded?

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll:399
+; CHECK-NEXT:    [[IV:%.*]] = phi i8 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
 ; CHECK-NEXT:    [[IV_EXT:%.*]] = zext i8 [[IV]] to i64
+; CHECK-NEXT:    [[ARRAYIDX1449:%.*]] = getelementptr inbounds [6 x i8], ptr [[DST:%.*]], i64 0, i64 [[IV_EXT]]
----------------
An i8 IV<0,+,1> will surely wrap across 10,000 iterations.
But this seems like an infinite loop - how can %iv.next.ext ever be equal to 10,000?
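
To make the concern concrete, here is a small standalone sketch. It assumes %iv.next.ext is the zero-extension to i64 of the incremented i8 IV and that the exit condition compares it for equality with the bound; the function name zextI8CanEqual is hypothetical. It enumerates every value the incremented i8 counter can take.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if some value of an incremented 8-bit
// counter, zero-extended to 64 bits, equals Bound.
bool zextI8CanEqual(uint64_t Bound) {
  uint8_t IV = 0;
  unsigned Visited = 0;
  bool Found = false;
  do {
    // An i8 add wraps modulo 256.
    uint8_t Next = static_cast<uint8_t>(IV + 1);
    // Zero-extension can only produce values in [0, 255].
    if (static_cast<uint64_t>(Next) == Bound)
      Found = true;
    ++Visited;
    IV = Next;
  } while (IV != 0); // back at 0 after all 256 values have been produced
  assert(Visited == 256 && "every i8 value is produced exactly once");
  return Found;
}
```

Under these assumptions zextI8CanEqual(10000) is false, so an equality exit test against 10,000 would never fire.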

================
Comment at: llvm/test/Transforms/LoopVectorize/runtime-check-small-clamped-bounds.ll:8
 ;        and runtime checks are emitted to ensure that. The clamped indices do
 ;        wrap, so the vector loops are dead at the moment. But it is still
 ;        possible to compute the bounds of the accesses and generate proper
----------------
Fix comment.

Worth also adding tests where wrapping does not occur within VF*UF or constant trip count, and vectorization is not aborted?

================
Comment at: llvm/test/Transforms/LoopVectorize/runtime-check-small-clamped-bounds.ll:19
+; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
 ; CHECK-NEXT:    [[CLAMPED_INDEX:%.*]] = urem i32 [[IV]], 4
+; CHECK-NEXT:    [[GEP_A:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i32 [[CLAMPED_INDEX]]
----------------
Must this IV<0,+,1> % 4 wrap for VF=4 and unknown trip-count N? The first vector iteration would still work?
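
A brute-force sketch of this question, assuming each vector iteration covers VF consecutive IV values starting at a multiple of VF; the helper name wrapOnlyAtVectorBoundaries is hypothetical. It checks whether the clamped index IV % Modulus stays consecutive within every vector, i.e. whether the clamp only resets between vectors.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if, for each of the first NumVectorIterations
// vector iterations, the lanes' clamped indices (IV % Modulus) are
// consecutive, so any reset of the clamped index falls on a vector boundary.
bool wrapOnlyAtVectorBoundaries(uint32_t Modulus, uint32_t VF,
                                uint32_t NumVectorIterations) {
  assert(Modulus > 0 && VF > 0 && "modulus and VF must be positive");
  for (uint32_t Part = 0; Part < NumVectorIterations; ++Part) {
    const uint32_t Base = Part * VF; // IV value of lane 0
    for (uint32_t Lane = 1; Lane < VF; ++Lane)
      // Lane must access the element exactly Lane past lane 0's element.
      if ((Base + Lane) % Modulus != (Base % Modulus) + Lane)
        return false;
  }
  return true;
}
```

With Modulus = 4 and VF = 4 every vector accesses indices 0..3 in order, whereas VF = 8 with the same modulus resets the index in the middle of a vector.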

================
Comment at: llvm/test/Transforms/LoopVectorize/scev-predicate-reasoning.ll:99-100
 ; CHECK-NEXT:    store ptr [[P_0]], ptr [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[INC]] = add i32 [[IV]], 1
 ; CHECK-NEXT:    [[TOBOOL_NOT:%.*]] = icmp eq i32 [[IV]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT]], label [[FOR_END:%.*]], label [[FOR_COND]]
----------------
IV <30,+,1> wraps (as unsigned?) but exits as soon as it reaches 0, so effectively it iterates w/o wrapping?
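
A narrow-width analogue of this loop, as a sketch only. It assumes the loop increments the IV by 1 and exits once the IV is 0; the names WrapTrace and runUntilZero are hypothetical, and the bit width is capped at 16 so the simulation stays short. It counts how many increments execute and how many of them wrap.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: total increments executed before the IV is 0,
// and how many of those increments wrapped as unsigned.
struct WrapTrace {
  uint32_t Increments;
  uint32_t WrappingIncrements;
};

// Hypothetical helper: simulates IV<Start,+,1> in a BitWidth-bit unsigned
// type, stopping as soon as the IV equals 0.
WrapTrace runUntilZero(uint32_t Start, unsigned BitWidth) {
  assert(BitWidth >= 1 && BitWidth <= 16 && "sketch limited to narrow types");
  const uint32_t Mask = (uint32_t(1) << BitWidth) - 1;
  assert(Start >= 1 && Start <= Mask && "start must be non-zero and fit");
  WrapTrace T = {0, 0};
  uint32_t IV = Start;
  while (IV != 0) {
    const uint32_t Next = (IV + 1) & Mask; // add modulo 2^BitWidth
    if (Next < IV)
      ++T.WrappingIncrements; // unsigned wrap: result is smaller than input
    ++T.Increments;
    IV = Next;
  }
  return T;
}
```

For an 8-bit IV starting at 30 this gives 226 increments of which exactly one wraps: the final one, which produces the 0 that triggers the exit.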

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152576/new/

https://reviews.llvm.org/D152576