[llvm] [LoopVectorize] Add support for vectorisation of simple early exit loops (PR #88385)

via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 11 05:46:52 PDT 2024


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-llvm-support

@llvm/pr-subscribers-llvm-ir

Author: David Sherwood (david-arm)

<details>
<summary>Changes</summary>

This patch adds support for vectorisation of a simple class of loops that typically involve searching for something, for example:

```
  for (int i = 0; i < n; i++) {
    if (p[i] == val)
      return i;
  }
  return n;
```

or

```
  for (int i = 0; i < n; i++) {
    if (p1[i] != p2[i])
      return i;
  }
  return n;
```
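Conceptually, the vectorised form of such a search loop compares a whole group of VF lanes at once and, when any lane matches, returns the index of the first matching lane. The following scalar C emulation is only an illustrative sketch of that shape (the `VF` value and the `find_index` name are mine, not from the patch):

```c
#include <stddef.h>

/* Illustrative scalar emulation of the vectorised early-exit search.
   VF is the assumed vector width; names here are hypothetical. */
#define VF 4

/* Returns the index of the first element equal to val, or n if none. */
static size_t find_index(const int *p, size_t n, int val) {
  size_t i = 0;
  /* "Vector" body: process whole groups of VF lanes. */
  for (; i + VF <= n; i += VF) {
    int mask = 0;
    for (size_t lane = 0; lane < VF; lane++) /* vector compare */
      mask |= (p[i + lane] == val) << lane;
    if (mask) {                              /* did any lane exit early? */
      size_t first = 0;                      /* cttz of the lane mask */
      while (!((mask >> first) & 1))
        first++;
      return i + first;                      /* first exiting lane */
    }
  }
  /* Scalar tail for the remaining n % VF elements. */
  for (; i < n; i++)
    if (p[i] == val)
      return i;
  return n;
}
```

The early-exit check collapses the per-lane comparisons into a mask, and the first set bit of that mask recovers the exact index the scalar loop would have returned.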

In this initial commit we only vectorise loops that meet the following criteria:

1. There are no stores in the loop.
2. The loop has exactly one early exit like those shown in the examples above. I have referred to such exits as speculative early exits, to distinguish them from existing support for early exits where the exit-not-taken count is known exactly at compile time.
3. The early exit block dominates the latch block.
4. There are no loads after the early exit block.
5. The loop does not contain reductions or recurrences. I don't see anything fundamental blocking vectorisation of such loops, but I just haven't done the work to support them yet.
6. We must be able to prove at compile time that loads in the loop will not fault.

For point 6, once this patch lands I intend to follow up by supporting some limited cases of faulting loops where we can version the loop based on pointer alignment. For example, it turns out that SPEC2017 contains a std::find loop that we can vectorise provided we add SCEV checks for the initial pointer being aligned to a multiple of the VF. In practice the pointer is regularly aligned to at least 32/64 bytes, and since the VF is a power of 2, any vector load <= 32/64 bytes in size cannot cross that alignment boundary, so any fault must occur on the first lane, matching the behaviour of the scalar loop. Given we already do such speculative versioning for loops with unknown strides, alignment-based versioning doesn't seem any worse, at least for loops with only one load.
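The versioning idea described above boils down to a cheap runtime check on the start pointer. A minimal sketch, assuming a fixed VF of 4 `int` elements (the `VF` value, `can_use_vector_body` helper, and `demo_buf` are illustrative, not part of the patch):

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed vector width in elements; hypothetical for this sketch. */
#define VF 4

/* A buffer aligned to the full vector access size, for demonstration. */
static _Alignas(VF * sizeof(int)) int demo_buf[2 * VF];

/* Runtime versioning check: if the start pointer is aligned to the whole
   vector access size (VF * element size), a vector load of VF lanes stays
   within one aligned block, so it can only fault if lane 0 would fault --
   the same behaviour as the scalar loop. */
static int can_use_vector_body(const int *p) {
  return ((uintptr_t)p % (VF * sizeof(int))) == 0;
}
```

A loop versioned this way would branch to the vector body when the check passes and fall back to the scalar loop otherwise, analogous to the existing stride-based versioning.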

This patch makes use of the existing experimental_cttz_elts intrinsic, which is used in the vectorised early exit block to determine the first lane that triggered the exit. This intrinsic has generic lowering support, so it is guaranteed to work for all targets.
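The semantics of that intrinsic are simple: it counts the trailing zero elements of a mask vector, i.e. it returns the index of the first true lane. A scalar model of that behaviour, purely for illustration (the `cttz_elts` name here is a stand-in; the patch sets ZeroIsPoison=true, so the all-false case modelled below as returning the lane count is undefined in the real intrinsic):

```c
/* Scalar model of llvm.experimental.cttz.elts: count the trailing zero
   elements of a lane mask, i.e. the index of the first set lane.
   If no lane is set this returns num_lanes, whereas the intrinsic with
   ZeroIsPoison=true would give a poison value. */
static unsigned cttz_elts(const _Bool *mask, unsigned num_lanes) {
  unsigned i = 0;
  while (i < num_lanes && !mask[i]) /* skip leading false lanes */
    i++;
  return i; /* index of the first true lane */
}
```

In the vectorised early exit block, this index is what converts "some lane compared equal" back into the exact scalar induction value to return.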

Tests have been added here:

  Transforms/LoopVectorize/AArch64/simple_early_exit.ll

---

Patch is 226.95 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/88385.diff


18 Files Affected:

- (modified) llvm/include/llvm/Analysis/LoopAccessAnalysis.h (+36) 
- (modified) llvm/include/llvm/Analysis/ScalarEvolution.h (+33-3) 
- (modified) llvm/include/llvm/IR/IRBuilder.h (+7) 
- (modified) llvm/include/llvm/Support/GenericLoopInfo.h (+4) 
- (modified) llvm/include/llvm/Support/GenericLoopInfoImpl.h (+10) 
- (modified) llvm/include/llvm/Transforms/Utils/ScalarEvolutionExpander.h (+8-1) 
- (modified) llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h (+18) 
- (modified) llvm/lib/Analysis/LoopAccessAnalysis.cpp (+180-9) 
- (modified) llvm/lib/Analysis/ScalarEvolution.cpp (+88-6) 
- (modified) llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp (+2-2) 
- (modified) llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp (+10) 
- (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+348-42) 
- (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+63-5) 
- (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+71-7) 
- (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+38-11) 
- (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+3-1) 
- (added) llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll (+2544) 
- (modified) llvm/test/Transforms/LoopVectorize/control-flow.ll (+1-1) 


``````````diff
diff --git a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
index e39c371b41ec5c..d79c53f490c927 100644
--- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
+++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
@@ -587,6 +587,9 @@ class LoopAccessInfo {
   /// not legal to insert them.
   bool hasConvergentOp() const { return HasConvergentOp; }
 
+  /// Return true if the loop may fault due to memory accesses.
+  bool mayFault() const { return LoopMayFault; }
+
   const RuntimePointerChecking *getRuntimePointerChecking() const {
     return PtrRtChecking.get();
   }
@@ -608,6 +611,24 @@ class LoopAccessInfo {
   unsigned getNumStores() const { return NumStores; }
   unsigned getNumLoads() const { return NumLoads;}
 
+  /// Returns the block that exits early from the loop, if there is one.
+  /// Otherwise returns nullptr.
+  BasicBlock *getSpeculativeEarlyExitingBlock() const {
+    return SpeculativeEarlyExitingBB;
+  }
+
+  /// Returns the successor of the block that exits early from the loop, if
+  /// there is one. Otherwise returns nullptr.
+  BasicBlock *getSpeculativeEarlyExitBlock() const {
+    return SpeculativeEarlyExitBB;
+  }
+
+  /// Returns all blocks with a countable exit, i.e. the exit-not-taken count
+  /// is known exactly at compile time.
+  const SmallVector<BasicBlock *, 4> &getCountableEarlyExitingBlocks() const {
+    return CountableEarlyExitBlocks;
+  }
+
   /// The diagnostics report generated for the analysis.  E.g. why we
   /// couldn't analyze the loop.
   const OptimizationRemarkAnalysis *getReport() const { return Report.get(); }
@@ -659,6 +680,10 @@ class LoopAccessInfo {
   /// pass.
   bool canAnalyzeLoop();
 
+  /// Returns true if this is a supported early exit loop that we can analyze
+  /// in this pass.
+  bool isAnalyzableEarlyExitLoop();
+
   /// Save the analysis remark.
   ///
   /// LAA does not directly emits the remarks.  Instead it stores it which the
@@ -696,6 +721,17 @@ class LoopAccessInfo {
   /// Cache the result of analyzeLoop.
   bool CanVecMem = false;
   bool HasConvergentOp = false;
+  bool LoopMayFault = false;
+
+  /// Keeps track of the early-exiting block, if present.
+  BasicBlock *SpeculativeEarlyExitingBB = nullptr;
+
+  /// Keeps track of the successor of the early-exiting block, if present.
+  BasicBlock *SpeculativeEarlyExitBB = nullptr;
+
+  /// Keeps track of all the early exits with known or countable exit-not-taken
+  /// counts.
+  SmallVector<BasicBlock *, 4> CountableEarlyExitBlocks;
 
   /// Indicator that there are non vectorizable stores to a uniform address.
   bool HasDependenceInvolvingLoopInvariantAddress = false;
diff --git a/llvm/include/llvm/Analysis/ScalarEvolution.h b/llvm/include/llvm/Analysis/ScalarEvolution.h
index 5828cc156cc785..562deab8b4159e 100644
--- a/llvm/include/llvm/Analysis/ScalarEvolution.h
+++ b/llvm/include/llvm/Analysis/ScalarEvolution.h
@@ -892,9 +892,13 @@ class ScalarEvolution {
   /// Similar to getBackedgeTakenCount, except it will add a set of
   /// SCEV predicates to Predicates that are required to be true in order for
   /// the answer to be correct. Predicates can be checked with run-time
-  /// checks and can be used to perform loop versioning.
-  const SCEV *getPredicatedBackedgeTakenCount(const Loop *L,
-                                              SmallVector<const SCEVPredicate *, 4> &Predicates);
+  /// checks and can be used to perform loop versioning. If \p Speculative is
+  /// true, this will attempt to return the speculative backedge count for loops
+  /// with early exits. However, this is only possible if we can formulate an
+  /// exact expression for the backedge count from the latch block.
+  const SCEV *getPredicatedBackedgeTakenCount(
+      const Loop *L, SmallVector<const SCEVPredicate *, 4> &Predicates,
+      bool Speculative = false);
 
   /// When successful, this returns a SCEVConstant that is greater than or equal
   /// to (i.e. a "conservative over-approximation") of the value returend by
@@ -912,6 +916,12 @@ class ScalarEvolution {
     return getBackedgeTakenCount(L, SymbolicMaximum);
   }
 
+  /// Return all the exiting blocks with exact exit counts.
+  void getExactExitingBlocks(const Loop *L,
+                             SmallVector<BasicBlock *, 4> *Blocks) {
+    getBackedgeTakenInfo(L).getExactExitingBlocks(L, this, Blocks);
+  }
+
   /// Return true if the backedge taken count is either the value returned by
   /// getConstantMaxBackedgeTakenCount or zero.
   bool isBackedgeTakenCountMaxOrZero(const Loop *L);
@@ -1534,6 +1544,16 @@ class ScalarEvolution {
     const SCEV *getExact(const Loop *L, ScalarEvolution *SE,
                          SmallVector<const SCEVPredicate *, 4> *Predicates = nullptr) const;
 
+    /// Similar to the above, except we permit unknown exit counts from
+    /// non-latch exit blocks. Any such early exit blocks must dominate the
+    /// latch and so the returned expression represents the speculative, or
+    /// maximum possible, *backedge-taken* count of the loop. If there is no
+    /// exact exit count for the latch this function returns
+    /// SCEVCouldNotCompute.
+    const SCEV *getSpeculative(
+        const Loop *L, ScalarEvolution *SE,
+        SmallVector<const SCEVPredicate *, 4> *Predicates = nullptr) const;
+
     /// Return the number of times this loop exit may fall through to the back
     /// edge, or SCEVCouldNotCompute. The loop is guaranteed not to exit via
     /// this block before this number of iterations, but may exit via another
@@ -1541,6 +1561,10 @@ class ScalarEvolution {
     const SCEV *getExact(const BasicBlock *ExitingBlock,
                          ScalarEvolution *SE) const;
 
+    /// Return all the exiting blocks with exact exit counts.
+    void getExactExitingBlocks(const Loop *L, ScalarEvolution *SE,
+                               SmallVector<BasicBlock *, 4> *Blocks) const;
+
     /// Get the constant max backedge taken count for the loop.
     const SCEV *getConstantMax(ScalarEvolution *SE) const;
 
@@ -2316,6 +2340,9 @@ class PredicatedScalarEvolution {
   /// Get the (predicated) backedge count for the analyzed loop.
   const SCEV *getBackedgeTakenCount();
 
+  /// Get the (predicated) speculative backedge count for the analyzed loop.
+  const SCEV *getSpeculativeBackedgeTakenCount();
+
   /// Adds a new predicate.
   void addPredicate(const SCEVPredicate &Pred);
 
@@ -2384,6 +2411,9 @@ class PredicatedScalarEvolution {
 
   /// The backedge taken count.
   const SCEV *BackedgeCount = nullptr;
+
+  /// The speculative backedge taken count.
+  const SCEV *SpeculativeBackedgeCount = nullptr;
 };
 
 template <> struct DenseMapInfo<ScalarEvolution::FoldID> {
diff --git a/llvm/include/llvm/IR/IRBuilder.h b/llvm/include/llvm/IR/IRBuilder.h
index f381273c46cfb8..81cf8a6f5d4793 100644
--- a/llvm/include/llvm/IR/IRBuilder.h
+++ b/llvm/include/llvm/IR/IRBuilder.h
@@ -2503,6 +2503,13 @@ class IRBuilderBase {
     return CreateShuffleVector(V, PoisonValue::get(V->getType()), Mask, Name);
   }
 
+  Value *CreateCountTrailingZeroElems(Type *ResTy, Value *Mask,
+                                      const Twine &Name = "") {
+    return CreateIntrinsic(
+        Intrinsic::experimental_cttz_elts, {ResTy, Mask->getType()},
+        {Mask, getInt1(/*ZeroIsPoison=*/true)}, nullptr, Name);
+  }
+
   Value *CreateExtractValue(Value *Agg, ArrayRef<unsigned> Idxs,
                             const Twine &Name = "") {
     if (auto *V = Folder.FoldExtractValue(Agg, Idxs))
diff --git a/llvm/include/llvm/Support/GenericLoopInfo.h b/llvm/include/llvm/Support/GenericLoopInfo.h
index d560ca648132c9..83cacf864089cc 100644
--- a/llvm/include/llvm/Support/GenericLoopInfo.h
+++ b/llvm/include/llvm/Support/GenericLoopInfo.h
@@ -294,6 +294,10 @@ template <class BlockT, class LoopT> class LoopBase {
   /// Otherwise return null.
   BlockT *getUniqueExitBlock() const;
 
+  /// Return the exit block for the latch if one exists. This function assumes
+  /// the loop has a latch.
+  BlockT *getLatchExitBlock() const;
+
   /// Return true if this loop does not have any exit blocks.
   bool hasNoExitBlocks() const;
 
diff --git a/llvm/include/llvm/Support/GenericLoopInfoImpl.h b/llvm/include/llvm/Support/GenericLoopInfoImpl.h
index 1e0d0ee446fc41..3beb3e538398ef 100644
--- a/llvm/include/llvm/Support/GenericLoopInfoImpl.h
+++ b/llvm/include/llvm/Support/GenericLoopInfoImpl.h
@@ -159,6 +159,16 @@ BlockT *LoopBase<BlockT, LoopT>::getUniqueExitBlock() const {
   return getExitBlockHelper(this, true).first;
 }
 
+template <class BlockT, class LoopT>
+BlockT *LoopBase<BlockT, LoopT>::getLatchExitBlock() const {
+  BlockT *Latch = getLoopLatch();
+  assert(Latch && "Latch block must exist");
+  for (BlockT *Successor : children<BlockT *>(Latch))
+    if (!contains(Successor))
+      return Successor;
+  return nullptr;
+}
+
 /// getExitEdges - Return all pairs of (_inside_block_,_outside_block_).
 template <class BlockT, class LoopT>
 void LoopBase<BlockT, LoopT>::getExitEdges(
diff --git a/llvm/include/llvm/Transforms/Utils/ScalarEvolutionExpander.h b/llvm/include/llvm/Transforms/Utils/ScalarEvolutionExpander.h
index 62c1e15a9a60e1..05850f864d042a 100644
--- a/llvm/include/llvm/Transforms/Utils/ScalarEvolutionExpander.h
+++ b/llvm/include/llvm/Transforms/Utils/ScalarEvolutionExpander.h
@@ -124,6 +124,11 @@ class SCEVExpander : public SCEVVisitor<SCEVExpander, Value *> {
   /// "expanded" form.
   bool LSRMode;
 
+  /// If the loop has an early exit we may have to use the speculative backedge
+  /// count, since the normal backedge count function is unable to compute a
+  /// SCEV expression.
+  bool UseSpeculativeBackedgeCount;
+
   typedef IRBuilder<InstSimplifyFolder, IRBuilderCallbackInserter> BuilderType;
   BuilderType Builder;
 
@@ -176,10 +181,12 @@ class SCEVExpander : public SCEVVisitor<SCEVExpander, Value *> {
 public:
   /// Construct a SCEVExpander in "canonical" mode.
   explicit SCEVExpander(ScalarEvolution &se, const DataLayout &DL,
-                        const char *name, bool PreserveLCSSA = true)
+                        const char *name, bool PreserveLCSSA = true,
+                        bool UseSpeculativeBackedgeCount = false)
       : SE(se), DL(DL), IVName(name), PreserveLCSSA(PreserveLCSSA),
         IVIncInsertLoop(nullptr), IVIncInsertPos(nullptr), CanonicalMode(true),
         LSRMode(false),
+        UseSpeculativeBackedgeCount(UseSpeculativeBackedgeCount),
         Builder(se.getContext(), InstSimplifyFolder(DL),
                 IRBuilderCallbackInserter(
                     [this](Instruction *I) { rememberInstruction(I); })) {
diff --git a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
index a509ebf6a7e1b3..20a53abeb2e5cc 100644
--- a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
+++ b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
@@ -374,6 +374,24 @@ class LoopVectorizationLegality {
     return LAI->getDepChecker().getMaxSafeVectorWidthInBits();
   }
 
+  /// Returns true if the loop has an early exit with an exact backedge
+  /// count that is speculative.
+  bool hasSpeculativeEarlyExit() const {
+    return LAI && LAI->getSpeculativeEarlyExitingBlock();
+  }
+
+  /// Returns the early exiting block in a loop with a speculative backedge
+  /// count.
+  BasicBlock *getSpeculativeEarlyExitingBlock() const {
+    return LAI->getSpeculativeEarlyExitingBlock();
+  }
+
+  /// Returns the destination of an early exiting block in a loop with a
+  /// speculative backedge count.
+  BasicBlock *getSpeculativeEarlyExitBlock() const {
+    return LAI->getSpeculativeEarlyExitBlock();
+  }
+
   /// Returns true if vector representation of the instruction \p I
   /// requires mask.
   bool isMaskRequired(const Instruction *I) const {
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index 3bfc9700a14559..32e5816644310a 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -730,6 +730,9 @@ class AccessAnalysis {
     return UnderlyingObjects;
   }
 
+  /// Returns true if we cannot prove the loop will not fault.
+  bool mayFault();
+
 private:
   typedef MapVector<MemAccessInfo, SmallSetVector<Type *, 1>> PtrAccessMap;
 
@@ -1281,6 +1284,63 @@ bool AccessAnalysis::canCheckPtrAtRT(RuntimePointerChecking &RtCheck,
   return CanDoRTIfNeeded;
 }
 
+bool AccessAnalysis::mayFault() {
+  auto &DL = TheLoop->getHeader()->getModule()->getDataLayout();
+  for (auto &UO : UnderlyingObjects) {
+    // TODO: For now if we encounter more than one underlying object we just
+    // assume it could fault. However, with more analysis it's possible to look
+    // at all of them and calculate a common range of permitted GEP indices.
+    if (UO.second.size() != 1)
+      return true;
+
+    // For now only the simplest cases are permitted, but this could be
+    // extended further.
+    auto *GEP = dyn_cast<GetElementPtrInst>(UO.first);
+    if (!GEP || GEP->getPointerOperand() != UO.second[0] ||
+        GEP->getNumIndices() != 1)
+      return true;
+
+    // Verify pointer accessed within the loop always falls within the bounds
+    // of the underlying object, but first it's necessary to determine the
+    // object size.
+
+    auto GetKnownObjSize = [&](const Value *Obj) -> uint64_t {
+      // TODO: We should also be able to support global variables too.
+      if (auto *AllocaObj = dyn_cast<AllocaInst>(Obj)) {
+        if (TheLoop->isLoopInvariant(AllocaObj))
+          if (std::optional<TypeSize> AllocaSize =
+                  AllocaObj->getAllocationSize(DL))
+            return !AllocaSize->isScalable() ? AllocaSize->getFixedValue() : 0;
+      } else if (auto *ArgObj = dyn_cast<Argument>(Obj))
+        return ArgObj->getDereferenceableBytes();
+      return 0;
+    };
+
+    uint64_t ObjSize = GetKnownObjSize(UO.second[0]);
+    if (!ObjSize)
+      return true;
+
+    Value *GEPInd = GEP->getOperand(1);
+    const SCEV *IndScev = PSE.getSCEV(GEPInd);
+    if (!isa<SCEVAddRecExpr>(IndScev))
+      return true;
+
+    // Calculate the maximum number of addressable elements in the object.
+    uint64_t ElemSize = GEP->getSourceElementType()->getScalarSizeInBits() / 8;
+    uint64_t MaxNumElems = ObjSize / ElemSize;
+
+    const SCEV *MinScev = PSE.getSE()->getConstant(GEPInd->getType(), 0);
+    const SCEV *MaxScev =
+        PSE.getSE()->getConstant(GEPInd->getType(), MaxNumElems);
+    if (!PSE.getSE()->isKnownOnEveryIteration(
+            ICmpInst::ICMP_SGE, cast<SCEVAddRecExpr>(IndScev), MinScev) ||
+        !PSE.getSE()->isKnownOnEveryIteration(
+            ICmpInst::ICMP_SLT, cast<SCEVAddRecExpr>(IndScev), MaxScev))
+      return true;
+  }
+  return false;
+}
+
 void AccessAnalysis::processMemAccesses() {
   // We process the set twice: first we process read-write pointers, last we
   // process read-only pointers. This allows us to skip dependence tests for
@@ -2292,6 +2352,73 @@ void MemoryDepChecker::Dependence::print(
   OS.indent(Depth + 2) << *Instrs[Destination] << "\n";
 }
 
+bool LoopAccessInfo::isAnalyzableEarlyExitLoop() {
+  // At least one of the exiting blocks must be the latch.
+  BasicBlock *LatchBB = TheLoop->getLoopLatch();
+  if (!LatchBB)
+    return false;
+
+  SmallVector<BasicBlock *, 8> ExitingBlocks;
+  TheLoop->getExitingBlocks(ExitingBlocks);
+
+  // This is definitely not an early exit loop.
+  if (ExitingBlocks.size() < 2)
+    return false;
+
+  SmallVector<BasicBlock *, 4> ExactExitingBlocks;
+  PSE->getSE()->getExactExitingBlocks(TheLoop, &ExactExitingBlocks);
+
+  // We only support one speculative early exit.
+  if ((ExitingBlocks.size() - ExactExitingBlocks.size()) > 1)
+    return false;
+
+  // There could be multiple exiting blocks with an exact exit-not-taken
+  // count. Find the speculative early exit block, i.e. the one with an
+  // unknown count.
+  BasicBlock *TmpBB = nullptr;
+  for (BasicBlock *BB1 : ExitingBlocks) {
+    bool Found = false;
+    for (BasicBlock *BB2 : ExactExitingBlocks)
+      if (BB1 == BB2) {
+        Found = true;
+        break;
+      }
+    if (!Found) {
+      TmpBB = BB1;
+      break;
+    }
+  }
+  assert(TmpBB && "Expected to find speculative early exiting block");
+
+  // For now, let's keep things simple by ensuring the latch block only has
+  // the exiting block as a predecessor.
+  BasicBlock *LatchPredBB = LatchBB->getUniquePredecessor();
+  if (!LatchPredBB || LatchPredBB != TmpBB)
+    return false;
+
+  LLVM_DEBUG(
+      dbgs()
+      << "LAA: Found an early exit. Retrying with speculative exit count.\n");
+  const SCEV *SpecExitCount = PSE->getSpeculativeBackedgeTakenCount();
+  if (isa<SCEVCouldNotCompute>(SpecExitCount))
+    return false;
+
+  LLVM_DEBUG(dbgs() << "LAA: Found speculative backedge taken count: "
+                    << *SpecExitCount << '\n');
+  SpeculativeEarlyExitingBB = TmpBB;
+
+  for (BasicBlock *BB : successors(SpeculativeEarlyExitingBB))
+    if (BB != LatchBB) {
+      SpeculativeEarlyExitBB = BB;
+      break;
+    }
+  assert(SpeculativeEarlyExitBB &&
+         "Expected to find speculative early exit block");
+  CountableEarlyExitBlocks = std::move(ExactExitingBlocks);
+
+  return true;
+}
+
 bool LoopAccessInfo::canAnalyzeLoop() {
   // We need to have a loop header.
   LLVM_DEBUG(dbgs() << "LAA: Found a loop in "
@@ -2317,10 +2444,12 @@ bool LoopAccessInfo::canAnalyzeLoop() {
   // ScalarEvolution needs to be able to find the exit count.
   const SCEV *ExitCount = PSE->getBackedgeTakenCount();
   if (isa<SCEVCouldNotCompute>(ExitCount)) {
-    recordAnalysis("CantComputeNumberOfIterations")
-        << "could not determine number of loop iterations";
     LLVM_DEBUG(dbgs() << "LAA: SCEV could not compute the loop exit count.\n");
-    return false;
+    if (!isAnalyzableEarlyExitLoop()) {
+      recordAnalysis("CantComputeNumberOfIterations")
+          << "could not determine number of loop iterations";
+      return false;
+    }
   }
 
   return true;
@@ -2352,6 +2481,9 @@ void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI,
       EnableMemAccessVersioning &&
       !TheLoop->getHeader()->getParent()->hasOptSize();
 
+  BasicBlock *LatchBB = TheLoop->getLoopLatch();
+  bool HasComplexWorkInEarlyExitLoop = false;
+
   // Traverse blocks in fixed RPOT order, regardless of their storage in the
   // loop info, as it may be arbitrary.
   LoopBlocksRPO RPOT(TheLoop);
@@ -2367,7 +2499,8 @@ void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI,
 
       // With both a non-vectorizable memory instruction and a convergent
       // operation, found in this loop, no reason to continue the search.
-      if (HasComplexMemInst && HasConvergentOp) {
+      if ((HasComplexMemInst && HasConvergentOp) ||
+          HasComplexWorkInEarlyExitLoop) {
         CanVecMem = false;
         return;
       }
@@ -2385,6 +2518,14 @@ void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI,
       // vectorize a loop if it contains known function calls that don't set
       // the flag. Therefore, it is safe to ignore this read from memory.
       auto *Call = dyn_cast<CallInst>(&I);
+      if (Call && SpeculativeEarlyExitingBB) {
+        recordAnalysis("CantVectorizeInstruction", Call)
+            << "cannot vectorize calls in early exit loop";
+        LLVM_DEBUG(dbgs() << "LAA: Found a call in early exit loop.\n");
+        HasComplexWorkInEarlyExitLoop = true;
+        continue;
+      }
+
       if (Call && getVectorIntrinsicIDForCall(Call, TLI))
         continue;
 
@@ -2412,6 +2553,13 @@ void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI,
           HasComplexMemInst = true;
           continue;
         }
+        if (SpeculativeEarlyExitingBB && BB == LatchBB) {
+          recordAnalysis("CantVectorizeInstruction", Ld)
+              << "cannot vectorize loads after early exit block";
+          LLVM_DEBUG(dbgs() << "LAA: Found a load after early exit.\n");
+          HasComplexWorkInEarlyExitLoop = true;
+          continue;
+        }
         NumLoads++;
         Loads.push_back(Ld);
         DepChecker->addAccess(Ld);
@@ -2423,6 +2571,13 @@ void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI,
       // Save 'store' instructions. Abort if othe...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/88385


More information about the llvm-commits mailing list