[llvm] [LV] Improve code in selectInterleaveCount (NFC) (PR #128002)

Thu Feb 27 03:13:27 PST 2025

================
@@ -4998,51 +4997,53 @@ LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF,
   }
 
   unsigned EstimatedVF = getEstimatedRuntimeVF(VF, VScaleForTuning);
-  unsigned KnownTC = PSE.getSE()->getSmallConstantTripCount(TheLoop);
-  if (KnownTC > 0) {
-    // At least one iteration must be scalar when this constraint holds. So the
-    // maximum available iterations for interleaving is one less.
-    unsigned AvailableTC =
-        requiresScalarEpilogue(VF.isVector()) ? KnownTC - 1 : KnownTC;
-
-    // If trip count is known we select between two prospective ICs, where
-    // 1) the aggressive IC is capped by the trip count divided by VF
-    // 2) the conservative IC is capped by the trip count divided by (VF * 2)
-    // The final IC is selected in a way that the epilogue loop trip count is
-    // minimized while maximizing the IC itself, so that we either run the
-    // vector loop at least once if it generates a small epilogue loop, or else
-    // we run the vector loop at least twice.
-
-    unsigned InterleaveCountUB = bit_floor(
-        std::max(1u, std::min(AvailableTC / EstimatedVF, MaxInterleaveCount)));
-    unsigned InterleaveCountLB = bit_floor(std::max(
-        1u, std::min(AvailableTC / (EstimatedVF * 2), MaxInterleaveCount)));
-    MaxInterleaveCount = InterleaveCountLB;
-
-    if (InterleaveCountUB != InterleaveCountLB) {
-      unsigned TailTripCountUB =
-          (AvailableTC % (EstimatedVF * InterleaveCountUB));
-      unsigned TailTripCountLB =
-          (AvailableTC % (EstimatedVF * InterleaveCountLB));
-      // If both produce same scalar tail, maximize the IC to do the same work
-      // in fewer vector loop iterations
-      if (TailTripCountUB == TailTripCountLB)
-        MaxInterleaveCount = InterleaveCountUB;
-    }
-  } else if (BestKnownTC && *BestKnownTC > 0) {
+
+  // Try to get the exact trip count, or an estimate based on profiling data or
+  // ConstantMax from PSE, failing that.
+  if (auto BestKnownTC = getSmallBestKnownTC(PSE, TheLoop)) {
     // At least one iteration must be scalar when this constraint holds. So the
     // maximum available iterations for interleaving is one less.
     unsigned AvailableTC = requiresScalarEpilogue(VF.isVector())
                                ? (*BestKnownTC) - 1
----------------
artagnon wrote:

How can `BestKnownTC` be 0?

```cpp
static std::optional<unsigned>
getSmallBestKnownTC(PredicatedScalarEvolution &PSE, Loop *L,
                    bool CanUseConstantMax = true) {
  // Check if exact trip count is known.
  if (unsigned ExpectedTC = PSE.getSE()->getSmallConstantTripCount(L))
    return ExpectedTC;

  // Check if there is an expected trip count available from profile data.
  if (LoopVectorizeWithBlockFrequency)
    if (auto EstimatedTC = getLoopEstimatedTripCount(L))
      return *EstimatedTC;

  if (!CanUseConstantMax)
    return std::nullopt;

  // Check if upper bound estimate is known.
  if (unsigned ExpectedTC = PSE.getSmallConstantMaxTripCount())
    return ExpectedTC;

  return std::nullopt;
}
```

In two of the three cases, the value is returned from inside an `if` on an `unsigned`, whose condition is false for 0, so a 0 never reaches the `return`. In the third case, a `std::optional` holding `ExitCount + 1` is returned, which cannot be 0.

```cpp
static std::optional<uint64_t> getEstimatedTripCount(BranchInst *ExitingBranch,
                                                     Loop *L,
                                                     uint64_t &OrigExitWeight) {
  // To estimate the number of times the loop body was executed, we want to
  // know the number of times the backedge was taken, vs. the number of times
  // we exited the loop.
  uint64_t LoopWeight, ExitWeight;
  if (!extractBranchWeights(*ExitingBranch, LoopWeight, ExitWeight))
    return std::nullopt;

  if (L->contains(ExitingBranch->getSuccessor(1)))
    std::swap(LoopWeight, ExitWeight);

  if (!ExitWeight)
    // Don't have a way to return predicated infinite
    return std::nullopt;

  OrigExitWeight = ExitWeight;

  // Estimated exit count is a ratio of the loop weight by the weight of the
  // edge exiting the loop, rounded to nearest.
  uint64_t ExitCount = llvm::divideNearest(LoopWeight, ExitWeight);
  // Estimated trip count is one plus estimated exit count.
  return ExitCount + 1;
}
```
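
To make the `ExitCount + 1` argument concrete, here is a minimal standalone sketch. It assumes the weights are passed in directly; `divideNearestSketch` and `estimateTripCount` are hypothetical names for illustration, not LLVM APIs, and the rounding helper is only a local stand-in for `llvm::divideNearest`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Local stand-in for llvm::divideNearest (assumption: unsigned division
// rounded to nearest). The addition is assumed not to overflow.
constexpr uint64_t divideNearestSketch(uint64_t Num, uint64_t Den) {
  return (Num + Den / 2) / Den;
}

// Hypothetical simplification of the quoted function: it takes the two
// branch weights directly instead of reading them from a BranchInst.
std::optional<uint64_t> estimateTripCount(uint64_t LoopWeight,
                                          uint64_t ExitWeight) {
  // No exit weight: the estimate is undefined, so report "unknown".
  if (ExitWeight == 0)
    return std::nullopt;

  uint64_t ExitCount = divideNearestSketch(LoopWeight, ExitWeight);
  // This sketch does not handle uint64_t wraparound of ExitCount + 1.
  assert(ExitCount + 1 != 0 && "wraparound not handled in this sketch");
  // ExitCount >= 0, so the result is always >= 1.
  return ExitCount + 1;
}
```

Even with `LoopWeight == 0`, the sketch returns 1 rather than 0; the only way to signal "no estimate" is `std::nullopt`.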

In conclusion, I think the API `getSmallBestKnownTC` returns either a valid trip count (which cannot be 0), or `std::nullopt`. This is different from `getSmallConstantTripCount`, which returns 0 to communicate an invalid trip count.
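
The contrast between the two conventions can be sketched outside LLVM as follows. All names here (`bestKnownTripCount`, `availableTripCount`) are hypothetical, the three inputs stand in for the three queries discussed above, and the profile estimate is assumed to be non-zero whenever it is present.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper with the same shape as getSmallBestKnownTC.
// Exact and ConstantMax use the sentinel convention (0 means "unknown");
// Estimate uses std::optional and is assumed non-zero when present.
std::optional<unsigned> bestKnownTripCount(unsigned Exact,
                                           std::optional<unsigned> Estimate,
                                           unsigned ConstantMax) {
  if (unsigned TC = Exact) // false for 0, so 0 is never returned here
    return TC;
  if (Estimate)
    return *Estimate;
  if (unsigned TC = ConstantMax) // same filtering of the 0 sentinel
    return TC;
  return std::nullopt;
}

// Hypothetical caller: because a present value is never 0, subtracting one
// for the scalar epilogue cannot wrap around.
std::optional<unsigned> availableTripCount(std::optional<unsigned> BestKnownTC,
                                           bool NeedsScalarEpilogue) {
  if (!BestKnownTC)
    return std::nullopt;
  assert(*BestKnownTC > 0 && "a present trip count is expected to be non-zero");
  return NeedsScalarEpilogue ? *BestKnownTC - 1 : *BestKnownTC;
}
```

Under these assumptions, a caller only needs to test whether the optional holds a value; an extra `> 0` check on the contained value would be redundant.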

https://github.com/llvm/llvm-project/pull/128002