[llvm] [LV] Add initial legality checks for ee loops with stores (PR #145663)
David Sherwood via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 11 04:10:32 PDT 2025
================
@@ -530,20 +538,89 @@ class LoopVectorizationLegality {
/// Returns true if this is an early exit loop that can be vectorized.
/// Currently, a loop with an uncountable early exit is considered
/// vectorizable if:
- /// 1. There are no writes to memory in the loop.
+ /// 1. Writes to memory do not form a dependence with any load used as
+ /// part of the uncounted exit condition.
/// 2. The loop has only one early uncountable exit
/// 3. The early exit block dominates the latch block.
/// 4. The latch block has an exact exit count.
/// 5. The loop does not contain reductions or recurrences.
/// 6. We can prove at compile-time that loops will not contain faulting
- /// loads.
+ /// loads, or that any faulting loads would also occur in a purely
+ /// scalar loop.
/// 7. It is safe to speculatively execute instructions such as divide or
- /// call instructions.
+ /// call instructions.
/// The list above is not based on theoretical limitations of vectorization,
/// but simply a statement that more work is needed to support these
/// additional cases safely.
bool isVectorizableEarlyExitLoop();
+ /// When vectorizing an early exit loop containing side effects, we need to
+ /// determine whether an uncounted exit will be taken before any operation
+ /// that has side effects.
+ ///
+ /// Consider a loop like the following:
+ /// for (int i = 0; i < N; ++i) {
+ /// a[i] = b[i];
+ /// if (c[i] == 0)
+ /// break;
+ /// }
+ ///
+ /// We have both a load and a store operation occurring before the condition
+ /// is checked for early termination. We could potentially restrict
+ /// vectorization to cases where we know all addresses are guaranteed to be
+ /// dereferenceable, which would allow the load before the condition check to
+ /// be vectorized.
+ ///
+ /// The store, however, should not execute across all lanes if early
+ /// termination occurs before the end of the vector. We must only store to the
+ /// locations that would have been stored to by a scalar loop. So we need to
+ /// know what the result of 'c[i] == 0' is before performing the vector store,
+ /// with or without masking.
+ ///
+ /// We can either do this by moving the condition load to the top of the
+ /// vector body and using the comparison to create masks for other operations
+ /// in the loop, or by looking ahead one vector iteration and bailing out to
+ /// the scalar loop if an exit would occur.
+ ///
+ /// Using the latter approach (applicable to more targets), we need to hoist
+ /// the first load (of c[0]) out of the loop then rotate the load within the
+ /// loop to the next iteration, remembering to adjust the vector trip count.
+ /// Something like the following:
+ ///
+ /// vec.ph:
+ /// %ci.0 = load <4 x i32>, ptr %c
+ /// %cmp.0 = icmp eq <4 x i32> %ci.0, zeroinitializer
+ /// %any.of.0 = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> %cmp.0)
+ /// br i1 %any.of.0, label %scalar.ph, label %vec.body
+ /// vec.body:
+ /// %iv = phi...
+ /// phi for c[i] if used elsewhere in the loop...
+ /// other operations in the loop...
+ /// %iv.next = add i64 %iv, 4
+ /// %addr.next = getelementptr i32, ptr %c, i64 %iv.next
+ /// %ci.next = load <4 x i32>, ptr %addr.next
+ /// %cmp.next = icmp eq <4 x i32> %ci.next, zeroinitializer
+ /// %any.of.next = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> %cmp.next)
+ /// iv.next compared with shortened vector tripcount...
+ /// uncounted condition combined with counted condition...
+ /// br...
+ ///
+ /// Doing this means the last few iterations will always be performed by a
+ /// scalar loop regardless of which exit is taken, and so vector iterations
+ /// will never execute a memory operation to a location that the scalar loop
+ /// would not have.
+ ///
+ /// This means we must ensure that it is safe to move the load for 'c[i]'
+ /// before other memory operations (or any other observable side effects) in
+ /// the loop.
----------------
david-arm wrote:
This comment looks really helpful, thanks!
Is it worth adding as well that `c[i]` is not permitted to have more than one use for now, because an outside use complicates things since you now have to use `phi for c[i]` instead of `c[i]` itself.
https://github.com/llvm/llvm-project/pull/145663
More information about the llvm-commits
mailing list