[llvm-branch-commits] [llvm] [LoopUnroll] Fix freqs for unconditional latches: N<=2 (PR #179520)

Joel E. Denny via llvm-branch-commits llvm-branch-commits at lists.llvm.org
Tue Mar 3 19:52:31 PST 2026


https://github.com/jdenny-ornl updated https://github.com/llvm/llvm-project/pull/179520

>From 7297d48741209f6bb47984c732af435049050383 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Tue, 3 Feb 2026 13:29:13 -0500
Subject: [PATCH 1/2] [LoopUnroll] Fix block frequencies for newly
 unconditional latches

As another step in issue #135812, this patch fixes block frequencies
when LoopUnroll converts a conditional latch in an unrolled loop
iteration to unconditional.  It thus includes complete loop unrolling
(the conditional backedge becomes an unconditional loop exit), which
might be applied to the original loop or to its remainder loop.

As explained in detail in the header comments on the
fixProbContradiction function that this patch introduces, these
conversions mean LoopUnroll has proven that the original uniform latch
probability is incorrect for the original loop iterations associated
with the converted latches.  However, LoopUnroll often is able to
perform these corrections for only some iterations, leaving other
iterations with the original latch probability, and thus corrupting
the aggregate effect on the total frequency of the original loop body.
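
To make the corruption concrete, here is a minimal standalone sketch
(not part of the patch) of the frequency model used by the ComputeFreq
lambda that this patch adds.  The function names originalBodyFreq and
unrolledBodyFreq are hypothetical, and the sketch assumes the loop
exits only through its latches.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Total body frequency implied by a uniform latch probability P in the
// original loop: 1 + P + P^2 + ... = 1 / (1 - P).
double originalBodyFreq(double P) {
  assert(P >= 0 && P < 1 && "P == 1 would mean an infinite frequency");
  return 1 / (1 - P);
}

// Total body frequency in the unrolled loop when every remaining
// conditional latch continues with probability P.  IterCounts has one
// more entry than there are conditional latches: IterCounts[I] is the
// number of original iterations between conditional latches I-1 and I.
double unrolledBodyFreq(const std::vector<unsigned> &IterCounts, double P,
                        bool CompletelyUnroll) {
  assert(!IterCounts.empty());
  double ProbReaching = 1;        // p^0
  double FreqOne = IterCounts[0]; // c_0*p^0
  for (std::size_t I = 1; I < IterCounts.size(); ++I) {
    ProbReaching *= P;                       // p^I
    FreqOne += IterCounts[I] * ProbReaching; // c_I*p^I
  }
  // Without a backedge, one pass through the unrolled body is all there is.
  double ProbBackedge = CompletelyUnroll ? 0 : ProbReaching;
  if (ProbBackedge == 1)
    return std::numeric_limits<double>::infinity();
  return FreqOne / (1 - ProbBackedge);
}
```

For example, with P = 0.5 the original body frequency is 2.  After
complete unrolling by 4 where only the last latch is folded,
IterCounts is {1, 1, 1, 1}, and leaving P = 0.5 on the three remaining
latches gives 1 + 0.5 + 0.25 + 0.125 = 1.875 instead of 2.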

This patch ensures that the total frequency of the original loop body,
summed across all its occurrences in the unrolled loop after the
aforementioned conversions, is the same as in the original loop.
Unlike other patches in this series, this patch cannot derive the
required latch probabilities directly from the original uniform latch
probability because it has been proven incorrect for some original
loop iterations.  Instead, this patch implements the following
strategies to compute probabilities for the remaining N conditional
latches in the unrolled loop:

- A. If N <= 2, use a simple formula to compute a single uniform
  probability across those latches.
- B. Otherwise, if `-unroll-uniform-weights` (a new option) is not
  specified, apply the original loop's probability to all N latches
  and then, as needed, adjust as few of them as possible.
- C. Otherwise, bisect the range [0,1] to find a single uniform
  probability across all N latches.
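
As a rough illustration of strategy A, the following standalone sketch
(not the patch's code) solves for the uniform probability in closed
form.  It follows the formulas in the patch's ComputeProb and
ComputeProbForQuadratic lambdas, but the function names, the plain
double parameters, and the early return for C >= 0 are assumptions of
this sketch.  C0, C1, and C2 are the iteration counts before, between,
and after the conditional latches, and F is the desired total
frequency.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// One conditional latch: solve F = (C0 + C1*p) / (1 - B*p) for p, where
// B is 1 if the unrolled loop keeps its backedge and 0 otherwise.
double solveOneLatch(double C0, double C1, double F, bool HasBackedge) {
  double Num = F - C0;
  double Den = C1 + (HasBackedge ? F : 0);
  if (Num <= 0)
    return 0; // Already at or above F before reaching the latch.
  if (Den == 0)
    return 1; // No way to add frequency, so 1 is the best available.
  return std::min(Num / Den, 1.0);
}

// Two conditional latches: solve
//   (C2 + B*F)*p^2 + C1*p + (C0 - F) = 0
// for its nonnegative root, then limit the result to [0, 1].
double solveTwoLatches(double C0, double C1, double C2, double F,
                       bool HasBackedge) {
  double A = C2 + (HasBackedge ? F : 0);
  double B = C1;
  double C = C0 - F;
  assert(A > 0 && "Expected iterations after the last conditional latch");
  if (C >= 0)
    return 0; // Guard added by this sketch: F is unreachably low.
  double P = (-B + std::sqrt(B * B - 4 * A * C)) / (2 * A);
  return std::max(0.0, std::min(P, 1.0));
}
```

As a sanity check, solveTwoLatches(1, 1, 0, 2.0, true) models a loop
unrolled by 2 with both latches still conditional, and it returns 0.5,
which is the original probability for F = 2.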

An issue with C is that it could impact compiler performance, so this
patch makes it opt-in.  Its appeal over B is that it treats all
latches the same given that we have no evidence showing that any latch
should have a higher or lower probability than any other.  A has
neither problem, but I do not know how to apply it for N > 2.  More
experience or feedback from others might determine that some
strategies are not worthwhile to maintain.
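
For readers who want a feel for the cost of C, here is a simplified
standalone sketch of the bisection (not the patch's exact code).  It
relies on the total frequency increasing monotonically with the
uniform probability.  The names uniformBodyFreq and bisectUniformProb
are hypothetical.  The tolerances 1e-6 and 1e-12 mirror FreqPrec and
ProbPrec in the patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Total body frequency when every conditional latch continues with
// probability P (same model as the patch's ComputeFreq lambda).
double uniformBodyFreq(const std::vector<unsigned> &IterCounts, double P,
                       bool CompletelyUnroll) {
  assert(!IterCounts.empty());
  double ProbReaching = 1;
  double FreqOne = IterCounts[0];
  for (std::size_t I = 1; I < IterCounts.size(); ++I) {
    ProbReaching *= P;
    FreqOne += IterCounts[I] * ProbReaching;
  }
  double ProbBackedge = CompletelyUnroll ? 0 : ProbReaching;
  if (ProbBackedge == 1)
    return std::numeric_limits<double>::infinity();
  return FreqOne / (1 - ProbBackedge);
}

// Bisect [0, 1] for a uniform probability whose frequency is close to
// FreqDesired.  Returns 0 or 1 if FreqDesired is out of reach.
double bisectUniformProb(const std::vector<unsigned> &IterCounts,
                         bool CompletelyUnroll, double FreqDesired) {
  const double FreqPrec = 1e-6;
  const double ProbPrec = 1e-12;
  if (uniformBodyFreq(IterCounts, 0, CompletelyUnroll) >=
      FreqDesired - FreqPrec)
    return 0;
  if (uniformBodyFreq(IterCounts, 1, CompletelyUnroll) <=
      FreqDesired + FreqPrec)
    return 1;
  double Lo = 0, Hi = 1;
  while (Hi - Lo > ProbPrec) { // Each step halves the interval.
    double Mid = (Lo + Hi) / 2;
    double Delta =
        uniformBodyFreq(IterCounts, Mid, CompletelyUnroll) - FreqDesired;
    if (std::fabs(Delta) < FreqPrec)
      return Mid;
    if (Delta < 0)
      Lo = Mid;
    else
      Hi = Mid;
  }
  return (Lo + Hi) / 2;
}
```

Because the interval halves on every step and the search stops once
it is narrower than 1e-12, the loop runs at most about 40 times, and
each step costs one pass over the conditional latches.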

This patch does not consider the presence of non-latch loop exits, and
I do not have a solid plan for that case.  See the FIXME comments that
this patch introduces.
---
 llvm/include/llvm/Support/BranchProbability.h |    1 +
 llvm/lib/Transforms/Utils/LoopUnroll.cpp      |  465 ++++++-
 .../Transforms/Utils/LoopUnrollRuntime.cpp    |    4 +-
 .../branch-weights-freq/unroll-complete.ll    | 1121 +++++++++++++++++
 .../branch-weights-freq/unroll-epilog.ll      |  340 ++++-
 .../unroll-partial-unconditional-latch.ll     |  380 ++++++
 .../branch-weights-freq/unroll-partial.ll     |    3 +-
 .../LoopUnroll/loop-probability-one.ll        |  201 +--
 8 files changed, 2372 insertions(+), 143 deletions(-)
 create mode 100644 llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll
 create mode 100644 llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll

diff --git a/llvm/include/llvm/Support/BranchProbability.h b/llvm/include/llvm/Support/BranchProbability.h
index 0b0d4343a9fcb..2f2224891ab30 100644
--- a/llvm/include/llvm/Support/BranchProbability.h
+++ b/llvm/include/llvm/Support/BranchProbability.h
@@ -46,6 +46,7 @@ class BranchProbability {
   LLVM_ABI BranchProbability(uint32_t Numerator, uint32_t Denominator);
 
   bool isZero() const { return N == 0; }
+  bool isOne() const { return N == D; }
   bool isUnknown() const { return N == UnknownN; }
 
   static BranchProbability getZero() { return BranchProbability(0); }
diff --git a/llvm/lib/Transforms/Utils/LoopUnroll.cpp b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
index d9422afe5e82a..b33d5ce770553 100644
--- a/llvm/lib/Transforms/Utils/LoopUnroll.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
@@ -65,6 +65,7 @@
 #include "llvm/Transforms/Utils/UnrollLoop.h"
 #include "llvm/Transforms/Utils/ValueMapper.h"
 #include <assert.h>
+#include <cmath>
 #include <numeric>
 #include <vector>
 
@@ -88,6 +89,11 @@ UnrollRuntimeEpilog("unroll-runtime-epilog", cl::init(false), cl::Hidden,
                     cl::desc("Allow runtime unrolled loops to be unrolled "
                              "with epilog instead of prolog."));
 
+static cl::opt<bool> UnrollUniformWeights(
+    "unroll-uniform-weights", cl::init(false), cl::Hidden,
+    cl::desc("If new branch weights must be found, work harder to keep them "
+             "uniform."));
+
 static cl::opt<bool>
 UnrollVerifyDomtree("unroll-verify-domtree", cl::Hidden,
                     cl::desc("Verify domtree after unrolling"),
@@ -438,6 +444,407 @@ static bool canHaveUnrollRemainder(const Loop *L) {
   return true;
 }
 
+// If LoopUnroll has proven OriginalLoopProb is incorrect for some iterations
+// of the original loop, adjust latch probabilities in the unrolled loop to
+// maintain the original total frequency of the original loop body.
+//
+// OriginalLoopProb is practical but imprecise
+// -------------------------------------------
+//
+// The latch branch weights that LLVM originally adds to a loop encode one latch
+// probability, OriginalLoopProb, applied uniformly across the loop's infinite
+// set of theoretically possible iterations.  While this uniform latch
+// probability serves as a practical statistic summarizing the trip counts
+// observed during profiling, it is imprecise.  Specifically, unless it is zero,
+// it is impossible for it to be the actual probability observed at every
+// individual iteration.  To see why, consider that the only way to actually
+// observe at run time that the latch probability remains non-zero is to profile
+// at least one loop execution that has an infinite number of iterations.  I do
+// not know how to profile an infinite number of loop iterations, and most loops
+// I work with are always finite.
+//
+// LoopUnroll proves OriginalLoopProb is incorrect
+// ------------------------------------------------
+//
+// LoopUnroll reorganizes the original loop so that loop iterations are no
+// longer all implemented by the same code, and then it analyzes some of those
+// loop iteration implementations independently of others.  In particular, it
+// converts some of their conditional latches to unconditional.  That is, by
+// examining code structure without any profile data, LoopUnroll proves that the
+// actual latch probability at the end of such an iteration is either 1 or 0.
+// When an individual iteration's actual latch probability is 1 or 0, that means
+// it always behaves the same, so it is impossible to observe it as having any
+// other probability.  The original uniform latch probability is rarely 1 or 0
+// because, when applied to all possible iterations, that would yield an
+// estimated trip count of infinity or 1, respectively.
+//
+// Thus, the new probabilities of 1 or 0 are proven corrections to
+// OriginalLoopProb for individual iterations in the original loop.  However,
+// LoopUnroll often is able to perform these corrections for only some
+// iterations, leaving other iterations with OriginalLoopProb, and thus
+// corrupting the aggregate effect on the total frequency of the original loop
+// body.
+//
+// Adjusting latch probabilities
+// -----------------------------
+//
+// This function ensures that the total frequency of the original loop body,
+// summed across all its occurrences in the unrolled loop after the
+// aforementioned latch conversions, is the same as in the original loop.  To do
+// so, it adjusts probabilities on the remaining conditional latches.  However,
+// it cannot derive the new probabilities directly from the original uniform
+// latch probability because the latter has been proven incorrect for some
+// original loop iterations.
+//
+// There are often many sets of latch probabilities that can produce the
+// original total loop body frequency.  If there are many remaining conditional
+// latches and !UnrollUniformWeights, this function just quickly hacks a few of
+// their probabilities to restore the original total loop body frequency.
+// Otherwise, it tries harder to determine less arbitrary probabilities.
+static void fixProbContradiction(UnrollLoopOptions ULO,
+                                 BranchProbability OriginalLoopProb,
+                                 bool CompletelyUnroll,
+                                 std::vector<unsigned> &IterCounts,
+                                 const std::vector<BasicBlock *> &CondLatches,
+                                 std::vector<BasicBlock *> &CondLatchNexts) {
+  // Runtime unrolling is handled later in LoopUnroll, not here.
+  //
+  // There are two scenarios in which LoopUnroll sets ProbUpdateRequired to true
+  // because it needs to update probabilities that were originally
+  // OriginalLoopProb, but only in one scenario has LoopUnroll proven
+  // OriginalLoopProb incorrect for iterations within the original loop:
+  // - If ULO.Runtime, LoopUnroll adds new guards that enforce new reaching
+  //   conditions for new loop iteration implementations (e.g., one unrolled
+  //   loop iteration executes only if at least ULO.Count original loop
+  //   iterations remain).  Those reaching conditions dictate how conditional
+  //   latches can be converted to unconditional (e.g., within an unrolled loop
+  //   iteration, there is no need to recheck the number of remaining original
+  //   loop iterations).  None of this reorganization alters the set of possible
+  //   original loop iteration counts or proves OriginalLoopProb incorrect for
+  //   any of the original loop iterations.  Thus, LoopUnroll derives
+  //   probabilities for the new guards and latches directly from
+  //   OriginalLoopProb based on the probabilities that their reaching
+  //   conditions would occur in the original loop.  Doing so maintains the
+  //   total frequency of the original loop body.
+  // - If !ULO.Runtime, LoopUnroll initially adds new loop iteration
+  //   implementations, which have the same latch probabilities as in the
+  //   original loop because there are no new guards that change their reaching
+  //   conditions.  Sometimes, LoopUnroll is then done, and so does not set
+  //   ProbUpdateRequired to true.  Other times, LoopUnroll then proves that
+  //   some latches are unconditional, directly contradicting OriginalLoopProb
+  //   for the corresponding original loop iterations.  That reduces the set of
+  //   possible original loop iteration counts, possibly producing a finite set
+  //   if it manages to eliminate the backedge.  LoopUnroll has to choose a new
+  //   set of latch probabilities that produce the same total loop body
+  //   frequency.
+  //
+  // This function addresses the second scenario only.
+  if (ULO.Runtime)
+    return;
+
+  // If CondLatches.empty(), there are no latch branches with probabilities we
+  // can adjust.  That should mean that the actual trip count is always exactly
+  // the number of remaining unrolled iterations, and so OriginalLoopProb should
+  // have yielded that trip count as the original loop body frequency.  Of
+  // course, OriginalLoopProb could be based on bad profile data, but there is
+  // nothing we can do about that here.
+  if (CondLatches.empty())
+    return;
+
+  // If the original latch probability is 1, the original frequency is infinity.
+  // Leaving all remaining probabilities set to 1 might or might not get us
+  // there (e.g., a completely unrolled loop cannot be infinite), but it is the
+  // closest we can come.
+  assert(!OriginalLoopProb.isUnknown() &&
+         "Expected to have loop probability to fix");
+  if (OriginalLoopProb.isOne())
+    return;
+
+  // FreqDesired is the frequency implied by the original loop probability.
+  double FreqDesired = 1 / (1 - OriginalLoopProb.toDouble());
+
+  // Get the probability at CondLatches[I].
+  auto GetProb = [&](unsigned I) {
+    BranchInst *B = cast<BranchInst>(CondLatches[I]->getTerminator());
+    bool FirstTargetIsNext = B->getSuccessor(0) == CondLatchNexts[I];
+    return getBranchProbability(B, FirstTargetIsNext).toDouble();
+  };
+
+  // Set the probability at CondLatches[I] to Prob.
+  auto SetProb = [&](unsigned I, double Prob) {
+    BranchInst *B = cast<BranchInst>(CondLatches[I]->getTerminator());
+    bool FirstTargetIsNext = B->getSuccessor(0) == CondLatchNexts[I];
+    bool Success = setBranchProbability(
+        B, BranchProbability::getBranchProbability(Prob), FirstTargetIsNext);
+    assert(Success && "Expected to be able to set branch probability");
+  };
+
+  // Set all probabilities in CondLatches to Prob.
+  auto SetAllProbs = [&](double Prob) {
+    for (unsigned I = 0, E = CondLatches.size(); I < E; ++I)
+      SetProb(I, Prob);
+  };
+
+  // If UnrollUniformWeights or n <= 2, we choose the simplest probability model
+  // we can think of: every remaining conditional branch instruction has the
+  // same probability, Prob, of continuing to the next iteration.  This model
+  // has several helpful properties:
+  // - There is only one search parameter, Prob.
+  // - We have no reason to think one latch branch's probability should be
+  //   higher or lower than another, and so this model makes them all the same.
+  //   In the worst cases, we thus avoid setting just some probabilities to 0 or
+  //   1, which can unrealistically make some code appear unreachable.  There
+  //   are cases where they *all* must become 0 or 1 to achieve the total
+  //   frequency of original loop body, and our model does permit that.
+  // - The frequency, FreqOne, of the original loop body in a single iteration
+  //   of the unrolled loop is computed by a simple polynomial, where p=Prob,
+  //   n=CondLatches.size(), and c_i=IterCounts[i]:
+  //
+  //     FreqOne = Sum(i=0..n)(c_i * p^i)
+  //
+  // - If the backedge has been eliminated:
+  //   - FreqOne is the total frequency of the original loop body in the
+  //     unrolled loop.
+  //   - If Prob == 1, the total frequency of the original loop body is exactly
+  //     the number of remaining loop iterations, as expected because every
+  //     remaining loop iteration always then executes.
+  // - If the backedge remains:
+  //   - Sum(i=0..inf)(FreqOne * p^(n*i)) = FreqOne / (1 - p^n) is the total
+  //     frequency of the original loop body in the unrolled loop, regardless of
+  //     whether the backedge is conditional or unconditional.
+  //   - As Prob approaches 1, the total frequency of the original loop body
+  //     approaches infinity, as expected because the loop approaches never
+  //     exiting.
+  // - For n <= 2, we can use simple formulas to solve the above polynomial
+  //   equation exactly for p without performing a search.  For n == 2, we use
+  //   ComputeProbForQuadratic below.  For n == 1, we use ComputeProb below.
+  // - For n > 2, evaluating each point in the search space, using ComputeFreq
+  //   below, requires about as few instructions as we could hope for.  That is,
+  //   the probability is constant across the conditional branches, so the only
+  //   computation is across conditional branches and any backedge, as required
+  //   for any model for Prob.
+  // - Prob == 1 produces the maximum possible total frequency for the original
+  //   loop body, as described above.  Prob == 0 produces the minimum, 0.
+  //   Increasing or decreasing Prob monotonically increases or decreases the
+  //   frequency, respectively.  Thus, for every possible frequency, there
+  //   exists some Prob that can produce it, and we can easily use bisection to
+  //   search the problem space.
+
+  // When iterating for a solution, we stop early if we find probabilities
+  // that produce a Freq whose difference from FreqDesired is small
+  // (FreqPrec).  Otherwise, we expect to compute a solution at least that
+  // accurate (but surely far more accurate).
+  const double FreqPrec = 1e-6;
+
+  // Compute the new frequency produced by using Prob throughout CondLatches.
+  auto ComputeFreq = [&](double Prob) {
+    double ProbReaching = 1;        // p^0
+    double FreqOne = IterCounts[0]; // c_0*p^0
+    for (unsigned I = 0, E = CondLatches.size(); I < E; ++I) {
+      ProbReaching *= Prob;                        // p^(I+1)
+      FreqOne += IterCounts[I + 1] * ProbReaching; // c_(I+1)*p^(I+1)
+    }
+    double ProbReachingBackedge = CompletelyUnroll ? 0 : ProbReaching;
+    assert(FreqOne > 0 && "Expected at least one iteration before first latch");
+    if (ProbReachingBackedge == 1)
+      return std::numeric_limits<double>::infinity();
+    return FreqOne / (1 - ProbReachingBackedge);
+  };
+
+  // Compute the probability that, used throughout CondLatches where
+  // CondLatches.size() == 2, gets as close as possible to FreqDesired.
+  auto ComputeProbForQuadratic = [&]() {
+    // The polynomial is quadratic, so just solve it.
+    double A = IterCounts[2] + (CompletelyUnroll ? 0 : FreqDesired);
+    double B = IterCounts[1];
+    double C = IterCounts[0] - FreqDesired;
+    assert(A > 0 && "Expected iterations after last conditional latch");
+    double Prob = (-B + sqrt(B * B - 4 * A * C)) / (2 * A);
+    // If it computes an invalid Prob, FreqDesired is impossibly low or high.
+    // Otherwise, Prob should produce nearly FreqDesired.
+    assert((Prob < 0 || Prob > 1 ||
+            fabs(ComputeFreq(Prob) - FreqDesired) < FreqPrec) &&
+           "Expected accurate frequency when quadratic case is possible");
+    Prob = std::max(Prob, 0.);
+    Prob = std::min(Prob, 1.);
+    return Prob;
+  };
+
+  // Compute the probability required at CondLatches[ComputeIdx] to get as close
+  // as possible to FreqDesired without replacing probabilities elsewhere in
+  // CondLatches.  Return {Prob, Freq} where 0 <= Prob <= 1 and Freq is the new
+  // frequency.
+  auto ComputeProb = [&](unsigned ComputeIdx) -> std::pair<double, double> {
+    assert(ComputeIdx < CondLatches.size());
+
+    // Accumulate the frequency from before ComputeIdx into FreqBeforeCompute,
+    // and accumulate the rest in Freq without yet multiplying the latter by any
+    // probability for ComputeIdx (i.e., treat it as 1 for now).
+    double ProbReaching = 1;     // p^0
+    double Freq = IterCounts[0]; // c_0*p^0
+    double FreqBeforeCompute;
+    for (unsigned I = 0, E = CondLatches.size(); I < E; ++I) {
+      // Get the branch probability for CondLatches[I].
+      double Prob;
+      if (I == ComputeIdx) {
+        FreqBeforeCompute = Freq;
+        Freq = 0;
+        Prob = 1;
+      } else {
+        Prob = GetProb(I);
+      }
+      ProbReaching *= Prob;                     // p^(I+1)
+      Freq += IterCounts[I + 1] * ProbReaching; // c_(I+1)*p^(I+1)
+    }
+
+    // Compute the required probability, and limit it to a valid probability (0
+    // <= p <= 1).  See the Freq formula below for how to derive the ProbCompute
+    // formula.
+    double ProbReachingBackedge = CompletelyUnroll ? 0 : ProbReaching;
+    double ProbComputeNumerator = FreqDesired - FreqBeforeCompute;
+    double ProbComputeDenominator = Freq + FreqDesired * ProbReachingBackedge;
+    double ProbCompute;
+    if (ProbComputeNumerator <= 0) {
+      // FreqBeforeCompute has already reached or surpassed FreqDesired, so add
+      // no more frequency.  It is possible that ProbComputeDenominator == 0
+      // here because some latch probability (maybe the original) was set to
+      // zero, so this check avoids setting ProbCompute=1 (in the else if below)
+      // and division by zero where the numerator <= 0 (in the else below).
+      ProbCompute = 0;
+    } else if (ProbComputeDenominator == 0) {
+      // Analytically, this case seems impossible.  It would occur if either:
+      // - Both Freq and FreqDesired are zero.  But the latter would cause
+      //   ProbComputeNumerator < 0, which we catch above, and FreqDesired
+      //   should always be >= 1 anyway.
+      // - There are no iterations after CondLatches[ComputeIdx], not even via
+      //   a backedge, so that both Freq and ProbReachingBackedge are zero.
+      //   But iterations should exist after even the last conditional latch.
+      // - Some latch probability (maybe the original) was set to zero so that
+      //   both Freq and ProbReachingBackedge are zero.  But that should not
+      //   have happened because, according to the above ProbComputeNumerator
+      //   check, we have not yet reached FreqDesired (which, if the original
+      //   latch probability is zero, is just 1 and thus always reached or
+      //   surpassed).
+      //
+      // Numerically, perhaps this case is possible.  We interpret it to mean we
+      // need more frequency (ProbComputeNumerator > 0) but have no way to get
+      // any (ProbComputeDenominator is analytically too small to distinguish it
+      // from 0 in floating point), suggesting infinite probability is needed,
+      // but 1 is the maximum valid probability and thus the best we can do.
+      //
+      // TODO: Cover this case in the test suite if you can.
+      ProbCompute = 1;
+    } else {
+      ProbCompute = ProbComputeNumerator / ProbComputeDenominator;
+      ProbCompute = std::max(ProbCompute, 0.);
+      ProbCompute = std::min(ProbCompute, 1.);
+    }
+
+    // Compute the resulting total frequency.
+    if (ProbReachingBackedge * ProbCompute == 1) {
+      // Analytically, this case seems impossible.  It requires that there is a
+      // backedge and that FreqDesired == infinity so that every conditional
+      // latch's probability had to be set to 1.  But FreqDesired == infinity
+      // means OriginalLoopProb.isOne(), which we guarded against earlier.
+      //
+      // Numerically, perhaps this case is possible.  We interpret it to mean
+      // that analytically the probability has to be so near 1 that, in floating
+      // point, the frequency is computed as infinite.
+      //
+      // TODO: Cover this case in the test suite if you can.
+      Freq = std::numeric_limits<double>::infinity();
+    } else {
+      assert(FreqBeforeCompute > 0 &&
+             "Expected at least one iteration before first latch");
+      // In this equation, if we replace the left-hand side with FreqDesired and
+      // then solve for ProbCompute, we get the ProbCompute formula above.
+      Freq = (FreqBeforeCompute + Freq * ProbCompute) /
+             (1 - ProbReachingBackedge * ProbCompute);
+    }
+    return {ProbCompute, Freq};
+  };
+
+  // Determine and set branch weights.
+  //
+  // Prob < 0 and Prob > 1 cannot be represented as branch weights.  We might
+  // compute such a Prob if FreqDesired is impossible (e.g., due to bad profile
+  // data) for the maximum trip count we have determined when completely
+  // unrolling.  In that case, just go with whichever is closest.
+  if (CondLatches.size() == 2) {
+    // The polynomial is quadratic, so just solve it.
+    SetAllProbs(ComputeProbForQuadratic());
+  } else if (CondLatches.size() == 1 || !UnrollUniformWeights) {
+    // Either:
+    // - There's just one conditional latch, so just compute the probability
+    //   it requires to produce the original total frequency.
+    // - The polynomial is too complex for a simple formula and the quick and
+    //   dirty fix has been selected.  Adjust probabilities starting from the
+    //   first latch, which has the most influence on the total frequency, so
+    //   starting there should minimize the number of latches that have to be
+    //   visited.  We do have to iterate because the first latch alone might
+    //   not be enough.  For example, we might need to set all probabilities
+    //   to 1 if the frequency is the unroll factor.
+    for (unsigned I = 0; I != CondLatches.size(); ++I) {
+      double Prob, Freq;
+      std::tie(Prob, Freq) = ComputeProb(I);
+      SetProb(I, Prob);
+      if (fabs(Freq - FreqDesired) < FreqPrec)
+        break;
+    }
+  } else {
+    // The polynomial is more complex, and uniform branch weights have been
+    // selected, so bisect.
+    double ProbMin, ProbMax, ProbPrev;
+    auto TryProb = [&](double Prob) {
+      ProbPrev = Prob;
+      double FreqDelta = ComputeFreq(Prob) - FreqDesired;
+      if (fabs(FreqDelta) < FreqPrec)
+        return 0;
+      if (FreqDelta < 0) {
+        ProbMin = Prob;
+        return -1;
+      }
+      ProbMax = Prob;
+      return 1;
+    };
+    // If Prob == 0 is too small and Prob == 1 is too large, bisect between
+    // them.  To place a hard upper limit on the search time, stop bisecting
+    // when Prob stops changing (ProbDelta) by much (ProbPrec).
+    if (TryProb(0.) < 0 && TryProb(1.) > 0) {
+      const double ProbPrec = 1e-12;
+      double Prob, ProbDelta;
+      do {
+        Prob = (ProbMin + ProbMax) / 2;
+        ProbDelta = Prob - ProbPrev;
+      } while (TryProb(Prob) != 0 && fabs(ProbDelta) > ProbPrec);
+    }
+    SetAllProbs(ProbPrev);
+  }
+  // FIXME: We have not considered non-latch loop exits:
+  // - Their original probabilities are not considered in our calculation of
+  //   FreqDesired.
+  // - Their probabilities are not considered in our probability model used to
+  //   determine new probabilities for remaining conditional branches.
+  // - If they are conditional and LoopUnroll converts them to unconditional,
+  //   LoopUnroll has proven their original probabilities are incorrect for some
+  //   original loop iterations, but that does not cause ProbUpdateRequired to
+  //   be set to true.
+  //
+  // To adjust FreqDesired and our probability model correctly for a non-latch
+  // loop exit, we would need to compute the original probability that the exit
+  // is reached from the loop header (in contrast, we currently assume that
+  // probability is 1 in the case of a latch exit) and the probability that the
+  // exit is taken if it is conditional (use the branch's old or new weights for
+  // FreqDesired or the probability model, respectively).  Does computing the
+  // reaching probability require a CFG traversal, or is there some existing
+  // library that can do it?  Prior discussions suggest some such libraries are
+  // difficult to use within LoopUnroll:
+  // <https://github.com/llvm/llvm-project/pull/164799#issuecomment-3438681519>.
+  // For now, we just let our corrected probabilities be less accurate in that
+  // scenario.  Alternatively, we could refuse to correct probabilities at all
+  // in that scenario, but that seems worse.
+}
+
 /// Unroll the given loop by Count. The loop must be in LCSSA form.  Unrolling
 /// can only fail when the loop's latch block is not terminated by a conditional
 /// branch instruction. However, if the trip count (and multiple) are not known,
@@ -1002,7 +1409,9 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
   };
 
   // Fold branches for iterations where we know that they will exit or not
-  // exit.
+  // exit.  In the case of an iteration's latch, if we thus find
+  // *OriginalLoopProb is incorrect, set ProbUpdateRequired to true.
+  bool ProbUpdateRequired = false;
   for (auto &Pair : ExitInfos) {
     ExitInfo &Info = Pair.second;
     for (unsigned i = 0, e = Info.ExitingBlocks.size(); i != e; ++i) {
@@ -1027,6 +1436,14 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
         continue;
       }
 
+      // For a latch, record any OriginalLoopProb contradiction.
+      if (!OriginalLoopProb.isUnknown() && IsLatch) {
+        BranchProbability ActualProb = *KnownWillExit
+                                           ? BranchProbability::getZero()
+                                           : BranchProbability::getOne();
+        ProbUpdateRequired |= OriginalLoopProb != ActualProb;
+      }
+
       SetDest(Info.ExitingBlocks[i], *KnownWillExit, Info.ExitOnTrue);
     }
   }
@@ -1062,14 +1479,39 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
     changeToUnreachable(Latches.back()->getTerminator(), PreserveLCSSA);
   }
 
+  // After merging adjacent blocks in Latches below:
+  // - CondLatches will list the blocks from Latches that are still terminated
+  //   with conditional branches.
+  // - For 1 <= I < CondLatches.size(), IterCounts[I] will store the number of
+  //   the original loop iterations through which control flows from
+  //   CondLatches[I-1] to CondLatches[I].
+  // - For I == 0 or I == CondLatches.size(), IterCounts[I] will store the
+  //   number of the original loop iterations through which control can flow
+  //   before CondLatches.front() or after CondLatches.back(), respectively,
+  //   without taking the unrolled loop's backedge, if any.
+  // - CondLatchNexts[I] will store the CondLatches[I] branch target for the
+  //   next of the original loop's iterations (as opposed to the exit target).
+  assert(ULO.Count == Latches.size() &&
+         "Expected one latch block per unrolled iteration");
+  std::vector<unsigned> IterCounts(1, 0);
+  std::vector<BasicBlock *> CondLatches;
+  std::vector<BasicBlock *> CondLatchNexts;
+  IterCounts.reserve(Latches.size() + 1);
+  CondLatches.reserve(Latches.size());
+  CondLatchNexts.reserve(Latches.size());
+
   // Merge adjacent basic blocks, if possible.
-  for (BasicBlock *Latch : Latches) {
+  for (unsigned I = 0, E = Latches.size(); I < E; ++I) {
+    ++IterCounts.back();
+    BasicBlock *Latch = Latches[I];
     BranchInst *Term = dyn_cast<BranchInst>(Latch->getTerminator());
     assert((Term ||
             (CompletelyUnroll && !LatchIsExiting && Latch == Latches.back())) &&
            "Need a branch as terminator, except when fully unrolling with "
            "unconditional latch");
-    if (Term && Term->isUnconditional()) {
+    if (!Term)
+      continue;
+    if (Term->isUnconditional()) {
       BasicBlock *Dest = Term->getSuccessor(0);
       BasicBlock *Fold = Dest->getUniquePredecessor();
       if (MergeBlockIntoPredecessor(Dest, /*DTU=*/DTUToUse, LI,
@@ -1080,9 +1522,19 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
         llvm::replace(Latches, Dest, Fold);
         llvm::erase(UnrolledLoopBlocks, Dest);
       }
+    } else {
+      IterCounts.push_back(0);
+      CondLatches.push_back(Latch);
+      CondLatchNexts.push_back(Headers[(I + 1) % E]);
     }
   }
 
+  // Fix probabilities we contradicted above.
+  if (ProbUpdateRequired) {
+    fixProbContradiction(ULO, OriginalLoopProb, CompletelyUnroll, IterCounts,
+                         CondLatches, CondLatchNexts);
+  }
+
   // If there are partial reductions, create code in the exit block to compute
   // the final result and update users of the final result.
   if (!PartialReductions.empty()) {
@@ -1144,8 +1596,7 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
     //   unrolled loop guard it creates.  The branch weights for the unrolled
     //   loop latch are adjusted below.  FIXME: Handle prologue loops.
     // - Otherwise, if unrolled loop iteration latches become unconditional,
-    //   branch weights are adjusted above.  FIXME: Actually handle such
-    //   unconditional latches.
+    //   branch weights are adjusted by the fixProbContradiction call above.
     // - Otherwise, the original loop's branch weights are correct for the
     //   unrolled loop, so do not adjust them.
     // - In all cases, the unrolled loop's estimated trip count is set below.
@@ -1166,6 +1617,10 @@ llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
     // each unrolled iteration's latch within it, we store the new trip count as
     // separate metadata.
     if (!OriginalLoopProb.isUnknown() && ULO.Runtime && EpilogProfitability) {
+      assert((CondLatches.size() == 1 &&
+              (ProbUpdateRequired || OriginalLoopProb.isOne())) &&
+             "Expected ULO.Runtime to give unrolled loop one conditional latch,"
+             " the backedge, requiring a probability update unless infinite");
       // Where p is always the probability of executing at least 1 more
       // iteration, the probability for at least n more iterations is p^n.
       setLoopProbability(L, OriginalLoopProb.pow(ULO.Count));
diff --git a/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp b/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
index 0cfd4a59bb4e0..829c1b9d5959c 100644
--- a/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
@@ -220,8 +220,8 @@ probOfNextInRemainder(BranchProbability OriginalLoopProb, unsigned N) {
   // loop body might have unique blocks that execute a finite number of times
   // if, for example, the original loop body contains conditionals like i <
   // UnrollCount.
-  if (OriginalLoopProb == BranchProbability::getOne())
-    return BranchProbability::getOne();
+  if (OriginalLoopProb.isOne())
+    return OriginalLoopProb;
 
   // Each of these variables holds the original loop's probability that the
   // number of iterations it will execute is some m in the specified range.
diff --git a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll
new file mode 100644
index 0000000000000..94d259b20bf84
--- /dev/null
+++ b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll
@@ -0,0 +1,1121 @@
+; Test branch weight metadata, estimated trip count metadata, and block
+; frequencies after complete loop unrolling.  The final unrolled iteration
+; unconditionally exits (backedge removed), and other unrolled iterations'
+; latches might unconditionally continue.  Either contradicts the original
+; branch weights.
+;
+; (unroll-partial-unconditional-latch.ll tests partial unrolling cases,
+; including cases where the latch of any iteration, including the final, might
+; unconditionally continue.)
+;
+; For each case, we check:
+; - Iteration frequencies
+;   - When each is multiplied by the number of original loop bodies that execute
+;     within it, they should sum to almost exactly the original loop body
+;     frequency.
+;   - The only exception is an impossibly high or low original frequency (e.g.,
+;     due to bad profile data), for which there exist no new branch weights that
+;     can yield that frequency sum.  In those cases, we expect the maximum or
+;     minimum possible frequency.
+; - CFGs
+;   - We verify which branch weights go with which branches and that we did not
+;     overlook any other branch weights (no extra !prof or branch_weights).
+;   - We also check the number of original loop bodies (represented by a call to
+;     @f) that appear within each unrolled iteration.
+; - Branch weight metadata
+;   - Checking frequencies already checks whether the branch weights have the
+;     expected effect, but we also want to check the following.
+;   - We get uniform probabilities/weights (same !prof) across the unrolled
+;     iteration latches if either:
+;     - The number of unrolled iterations <= the original loop body frequency,
+;       and then probabilities are all 1 to *try* to reach that frequency.
+;     - The original loop body frequency is 1, and then probabilities are all 0
+;       because only the first iteration is expected to execute.
+;     - The number of remaining conditional latches is <= 2, either because the
+;       number of unrolled iterations is <= 3 or because enough of the unrolled
+;       iterations' latches become unconditional.  Either way, the
+;       implementation computes uniform branch weights by solving a linear or
+;       quadratic equation.
+;     - -unroll-uniform-weights.
+;   - Otherwise, the earliest branch weights (starting with !prof !0) are
+;     adjusted as needed to produce the original loop body frequency, and the
+;     rest are left as they were in the original loop.
+; - llvm.loop.estimated_trip_count:
+;   - There should be none because loops are completely unrolled.
+
+; ------------------------------------------------------------------------------
+; Define LIT substitutions.
+;
+; Before using the following lit substitutions, sed should be called to replace
+; these parameters in %s to produce %t.ll:
+; - @I_0@ is the starting value for the original loop's induction variable.
+; - @MIN@ and @MAX@ are the compile-time known minimum and maximum for the
+;   number of original loop iterations, regardless of @I_0@.
+; - @W@ is the branch weight for the original loop's backedge.  That value plus
+;   1 is the original loop body frequency because the exit branch weight is 1.
+;
+; For verifying that the test code produces the original loop body frequency we
+; expect.
+; DEFINE: %{bf-fc} = opt %t.ll -S -passes='print<block-freq>' 2>&1 | \
+; DEFINE:   FileCheck %s -check-prefixes
+;
+; For checking the unrolled loop.
+; DEFINE: %{ur-bf} = opt %t.ll -S -passes='loop-unroll,print<block-freq>' 2>&1
+; DEFINE: %{fc} = FileCheck %s \
+; DEFINE:     -implicit-check-not='llvm.loop.estimated_trip_count' \
+; DEFINE:     -implicit-check-not='!prof' \
+; DEFINE:     -implicit-check-not='branch_weights' \
+; DEFINE:     -implicit-check-not='call void @f' -check-prefixes
+
+; ------------------------------------------------------------------------------
+; Check 1 max iteration:
+; - Unroll count of >=1 should always produce complete unrolling.
+; - That produces 0 unrolled iteration latches, so there are no branch weights
+;   to compute.  Thus, -unroll-uniform-weights has no effect.
+;
+; Original loop body frequency is 2 (loop weight 1), which is impossibly high.
+;
+;   RUN: sed -e s/@MAX@/1/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
+;   RUN: %{bf-fc} ORIG1210
+;   RUN: %{ur-bf} -unroll-count=1 -unroll-uniform-weights | %{fc} UR1210
+;   RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR1210
+;   RUN: %{ur-bf} -unroll-count=1 | %{fc} UR1210
+;   RUN: %{ur-bf} -unroll-count=2 | %{fc} UR1210
+;
+;   The new do.body is less than the old do.body, which is impossibly high.
+;   ORIG1210: - do.body: float = 2.0,
+;   UR1210:   - do.body: float = 1.0,
+;
+;   UR1210: call void @f
+;
+; Original loop body frequency is 1 (loop weight 0).
+;
+;   RUN: sed -e s/@MAX@/1/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
+;   RUN: %{bf-fc} ORIG1110
+;   RUN: %{ur-bf} -unroll-count=1 -unroll-uniform-weights | %{fc} UR1110
+;   RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR1110
+;   RUN: %{ur-bf} -unroll-count=1 | %{fc} UR1110
+;   RUN: %{ur-bf} -unroll-count=2 | %{fc} UR1110
+;
+;   The new do.body equals the old do.body.
+;   ORIG1110: - do.body: float = 1.0,
+;   UR1110:   - do.body: float = 1.0,
+;
+;   UR1110: call void @f
+
+; ------------------------------------------------------------------------------
+; Check 2 max iterations:
+; - Unroll count of >=2 should always produce complete unrolling.
+; - That produces <=1 unrolled iteration latch, so the implementation can
+;   compute uniform weights by solving, at worst, a linear equation.  Thus,
+;   -unroll-uniform-weights has no effect.
+;
+; Original loop body frequency is 3 (loop weight 2), which is impossibly high.
+;
+;   First use a variable iteration count so that the sole non-final unrolled
+;   iteration's latch remains conditional.
+;
+;     RUN: sed -e s/@MAX@/2/ -e s/@W@/2/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG2310
+;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2310
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2310
+;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2310
+;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2310
+;
+;     The sum of the new do.body* cannot reach the old do.body, which is
+;     impossibly high.
+;     ORIG2310: - do.body: float = 3.0,
+;     UR2310:   - do.body: float = 1.0,
+;     UR2310:   - do.body.1: float = 1.0,
+;
+;     The sole probability is maximized to try to reach the original frequency.
+;     UR2310: call void @f
+;     UR2310: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR2310: call void @f
+;     UR2310: br label %do.end
+;     UR2310: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
+;
+;   Now use a constant iteration count so that the sole non-final unrolled
+;   iteration's latch unconditionally continues.
+;
+;     RUN: sed -e s/@MAX@/2/ -e s/@W@/2/ -e s/@MIN@/2/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG2320
+;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2320
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2320
+;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2320
+;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2320
+;
+;     The new do.body contains 2 of the original loop's iterations, so multiply
+;     it by 2, which is less than the old do.body, which is impossibly high.
+;     ORIG2320: - do.body: float = 3.0,
+;     UR2320:   - do.body: float = 1.0,
+;
+;     UR2320:     call void @f
+;     UR2320-NOT: br
+;     UR2320:     call void @f
+;     UR2320:     ret void
+;
+; Original loop body frequency is 2 (loop weight 1).
+;
+;   First use a variable iteration count so that the sole non-final unrolled
+;   iteration's latch remains conditional.
+;
+;     RUN: sed -e s/@MAX@/2/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG2210
+;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2210
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2210
+;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2210
+;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2210
+;
+;     The sum of the new do.body* is the old do.body.
+;     ORIG2210: - do.body: float = 2.0,
+;     UR2210:   - do.body: float = 1.0,
+;     UR2210:   - do.body.1: float = 1.0,
+;
+;     UR2210: call void @f
+;     UR2210: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR2210: call void @f
+;     UR2210: br label %do.end
+;     UR2210: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
+;
+;   Now use a constant iteration count so that the sole non-final unrolled
+;   iteration's latch unconditionally continues.
+;
+;     RUN: sed -e s/@MAX@/2/ -e s/@W@/1/ -e s/@MIN@/2/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG2220
+;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2220
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2220
+;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2220
+;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2220
+;
+;     The new do.body contains 2 of the original loop's iterations, so multiply
+;     it by 2 to get the old do.body.
+;     ORIG2220: - do.body: float = 2.0,
+;     UR2220:   - do.body: float = 1.0,
+;
+;     UR2220:     call void @f
+;     UR2220-NOT: br
+;     UR2220:     call void @f
+;     UR2220:     ret void
+;
+; Original loop body frequency is 1 (loop weight 0).
+;
+;   First use a variable iteration count so that the sole non-final unrolled
+;   iteration's latch remains conditional.
+;
+;     RUN: sed -e s/@MAX@/2/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG2110
+;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2110
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2110
+;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2110
+;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2110
+;
+;     The sum of the new do.body* is approximately the old do.body.
+;     ORIG2110: - do.body: float = 1.0,
+;     UR2110:   - do.body: float = 1.0,
+;     UR2110:   - do.body.1: float = 0.0{{(0000[0-9]*)?}},
+;
+;     UR2110: call void @f
+;     UR2110: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR2110: call void @f
+;     UR2110: br label %do.end
+;     UR2110: !0 = !{!"branch_weights", i32 1, i32 0}
+;
+;   Now use a constant iteration count so that the sole non-final unrolled
+;   iteration's latch unconditionally continues.
+;
+;     RUN: sed -e s/@MAX@/2/ -e s/@W@/0/ -e s/@MIN@/2/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG2120
+;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2120
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2120
+;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2120
+;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2120
+;
+;     The new do.body contains 2 of the original loop's iterations, so multiply
+;     it by 2, which is greater than the old do.body, which is impossibly low.
+;     ORIG2120: - do.body: float = 1.0,
+;     UR2120:   - do.body: float = 1.0,
+;
+;     UR2120:     call void @f
+;     UR2120-NOT: br
+;     UR2120:     call void @f
+;     UR2120:     ret void
+
+; ------------------------------------------------------------------------------
+; Check 3 max iterations:
+; - Unroll count of >=3 should always produce complete unrolling.
+; - That produces <=2 unrolled iteration latches, so the implementation can
+;   compute uniform weights by solving, at worst, a quadratic equation.  Thus,
+;   -unroll-uniform-weights has no effect.
+;
+; Original loop body frequency is 4 (loop weight 3), which is impossibly high.
+;
+;   First use a variable iteration count so that all non-final unrolled
+;   iterations' latches remain conditional.
+;
+;     RUN: sed -e s/@MAX@/3/ -e s/@W@/3/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG3410
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3410
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3410
+;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3410
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3410
+;
+;     The sum of the new do.body* cannot reach the old do.body, which is
+;     impossibly high.
+;     ORIG3410: - do.body: float = 4.0,
+;     UR3410:   - do.body: float = 1.0,
+;     UR3410:   - do.body.1: float = 1.0,
+;     UR3410:   - do.body.2: float = 1.0,
+;
+;     The probabilities are maximized to try to reach the original frequency.
+;     UR3410: call void @f
+;     UR3410: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR3410: call void @f
+;     UR3410: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !0
+;     UR3410: call void @f
+;     UR3410: br label %do.end
+;     UR3410: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
+;
+;   Now use a constant iteration count so that all non-final unrolled
+;   iterations' latches unconditionally continue.
+;
+;     RUN: sed -e s/@MAX@/3/ -e s/@W@/3/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG3430
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3430
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3430
+;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3430
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3430
+;
+;     The new do.body contains 3 of the original loop's iterations, so multiply
+;     it by 3, which is less than the old do.body, which is impossibly high.
+;     ORIG3430: - do.body: float = 4.0,
+;     UR3430:   - do.body: float = 1.0,
+;
+;     UR3430:     call void @f
+;     UR3430-NOT: br
+;     UR3430:     call void @f
+;     UR3430-NOT: br
+;     UR3430:     call void @f
+;     UR3430:     ret void
+;
+;   Use a constant iteration count but now the loop upper bound computation can
+;   overflow.  When it does, the induction variable immediately exceeds the
+;   bound, so the initial unrolled iteration's latch remains conditional.
+;
+;     RUN: sed -e s/@MAX@/3/ -e s/@W@/3/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG343x
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR343x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR343x
+;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR343x
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR343x
+;
+;     The new do.body.1 contains 2 of the original loop's iterations, so
+;     multiply it by 2, and add the new do.body, but that sum is less than the
+;     old do.body, which is impossibly high.
+;     ORIG343x: - do.body: float = 4.0,
+;     UR343x:   - do.body: float = 1.0,
+;     UR343x:   - do.body.1: float = 1.0,
+;
+;     The sole probability is maximized to try to reach the original frequency.
+;     UR343x:     call void @f
+;     UR343x:     br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR343x:     call void @f
+;     UR343x-NOT: br
+;     UR343x:     call void @f
+;     UR343x:     ret void
+;     UR343x:     !0 = !{!"branch_weights", i32 0, i32 -2147483648}
+;
+; Original loop body frequency is 3 (loop weight 2).
+;
+;   First use a variable iteration count so that all non-final unrolled
+;   iterations' latches remain conditional.
+;
+;     RUN: sed -e s/@MAX@/3/ -e s/@W@/2/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG3310
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3310
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3310
+;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3310
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3310
+;
+;     The sum of the new do.body* is the old do.body.
+;     ORIG3310: - do.body: float = 3.0,
+;     UR3310:   - do.body: float = 1.0,
+;     UR3310:   - do.body.1: float = 1.0,
+;     UR3310:   - do.body.2: float = 1.0,
+;
+;     UR3310: call void @f
+;     UR3310: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR3310: call void @f
+;     UR3310: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !0
+;     UR3310: call void @f
+;     UR3310: br label %do.end
+;     UR3310: !0 = !{!"branch_weights", i32 1, i32 2147483647}
+;
+;   Now use a constant iteration count so that all non-final unrolled
+;   iterations' latches unconditionally continue.
+;
+;     RUN: sed -e s/@MAX@/3/ -e s/@W@/2/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG3330
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3330
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3330
+;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3330
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3330
+;
+;     The new do.body contains 3 of the original loop's iterations, so multiply
+;     it by 3 to get the old do.body.
+;     ORIG3330: - do.body: float = 3.0,
+;     UR3330:   - do.body: float = 1.0,
+;
+;     UR3330:     call void @f
+;     UR3330-NOT: br
+;     UR3330:     call void @f
+;     UR3330-NOT: br
+;     UR3330:     call void @f
+;     UR3330:     ret void
+;
+;   Use a constant iteration count but now the loop upper bound computation can
+;   overflow.  When it does, the induction variable immediately exceeds the
+;   bound, so the initial unrolled iteration's latch remains conditional.
+;
+;     RUN: sed -e s/@MAX@/3/ -e s/@W@/2/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG333x
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR333x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR333x
+;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR333x
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR333x
+;
+;     The new do.body.1 contains 2 of the original loop's iterations, so
+;     multiply it by 2, and add the new do.body to get the old do.body.
+;     ORIG333x: - do.body: float = 3.0,
+;     UR333x:   - do.body: float = 1.0,
+;     UR333x:   - do.body.1: float = 1.0,
+;
+;     UR333x:     call void @f
+;     UR333x:     br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR333x:     call void @f
+;     UR333x-NOT: br
+;     UR333x:     call void @f
+;     UR333x:     br label %do.end
+;     UR333x:     !0 = !{!"branch_weights", i32 1, i32 2147483647}
+;
+; Original loop body frequency is 2 (loop weight 1).  This is our first case
+; where new frequencies and probabilities are not all approximately 1 or 0.
+;
+;   First use a variable iteration count so that all non-final unrolled
+;   iterations' latches remain conditional.
+;
+;     RUN: sed -e s/@MAX@/3/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG3210
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3210
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3210
+;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3210
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3210
+;
+;     The sum of the new do.body* is the old do.body.
+;     ORIG3210: - do.body: float = 2.0,
+;     UR3210:   - do.body: float = 1.0,
+;     UR3210:   - do.body.1: float = 0.61803,
+;     UR3210:   - do.body.2: float = 0.38197,
+;
+;     UR3210: call void @f
+;     UR3210: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR3210: call void @f
+;     UR3210: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !0
+;     UR3210: call void @f
+;     UR3210: br label %do.end
+;     UR3210: !0 = !{!"branch_weights", i32 820265763, i32 1327217885}
+;
+;   Now use a constant iteration count so that all non-final unrolled
+;   iterations' latches unconditionally continue.
+;
+;     RUN: sed -e s/@MAX@/3/ -e s/@W@/1/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG3230
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3230
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3230
+;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3230
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3230
+;
+;     The new do.body contains 3 of the original loop's iterations, so multiply
+;     it by 3, which is greater than the old do.body, which is impossibly low.
+;     ORIG3230: - do.body: float = 2.0,
+;     UR3230:   - do.body: float = 1.0,
+;
+;     UR3230:     call void @f
+;     UR3230-NOT: br
+;     UR3230:     call void @f
+;     UR3230-NOT: br
+;     UR3230:     call void @f
+;     UR3230:     ret void
+;
+;   Use a constant iteration count but now the loop upper bound computation can
+;   overflow.  When it does, the induction variable immediately exceeds the
+;   bound, so the initial unrolled iteration's latch remains conditional.
+;
+;     RUN: sed -e s/@MAX@/3/ -e s/@W@/1/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG323x
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR323x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR323x
+;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR323x
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR323x
+;
+;     The new do.body.1 contains 2 of the original loop's iterations, so
+;     multiply it by 2, and add the new do.body to get the old do.body.
+;     ORIG323x: - do.body: float = 2.0,
+;     UR323x:   - do.body: float = 1.0,
+;     UR323x:   - do.body.1: float = 0.5,
+;
+;     UR323x:     call void @f
+;     UR323x:     br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR323x:     call void @f
+;     UR323x-NOT: br
+;     UR323x:     call void @f
+;     UR323x:     br label %do.end
+;     UR323x:     !0 = !{!"branch_weights", i32 1073741824, i32 1073741824}
+;
+; Original loop body frequency is 1 (loop weight 0).
+;
+;   First use a variable iteration count so that all non-final unrolled
+;   iterations' latches remain conditional.
+;
+;     RUN: sed -e s/@MAX@/3/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG3110
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3110
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3110
+;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3110
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3110
+;
+;     The sum of the new do.body* is approximately the old do.body.
+;     ORIG3110: - do.body: float = 1.0,
+;     UR3110:   - do.body: float = 1.0,
+;     UR3110:   - do.body.1: float = 0.0{{(0000[0-9]*)?}},
+;     UR3110:   - do.body.2: float = 0.0{{(0000[0-9]*)?}},
+;
+;     UR3110: call void @f
+;     UR3110: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR3110: call void @f
+;     UR3110: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !0
+;     UR3110: call void @f
+;     UR3110: br label %do.end
+;     UR3110: !0 = !{!"branch_weights", i32 1, i32 0}
+;
+;   Now use a constant iteration count so that all non-final unrolled
+;   iterations' latches unconditionally continue.
+;
+;     RUN: sed -e s/@MAX@/3/ -e s/@W@/0/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG3130
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3130
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3130
+;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3130
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3130
+;
+;     The new do.body contains 3 of the original loop's iterations, so multiply
+;     it by 3, which is greater than the old do.body, which is impossibly low.
+;     ORIG3130: - do.body: float = 1.0,
+;     UR3130:   - do.body: float = 1.0,
+;
+;     UR3130:     call void @f
+;     UR3130-NOT: br
+;     UR3130:     call void @f
+;     UR3130-NOT: br
+;     UR3130:     call void @f
+;     UR3130:     ret void
+;
+;   Use a constant iteration count but now the loop upper bound computation can
+;   overflow.  When it does, the induction variable immediately exceeds the
+;   bound, so the initial unrolled iteration's latch remains conditional.
+;
+;     RUN: sed -e s/@MAX@/3/ -e s/@W@/0/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG313x
+;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR313x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR313x
+;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR313x
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR313x
+;
+;     The new do.body.1 contains 2 of the original loop's iterations, so
+;     multiply it by 2, and add the new do.body to get approximately the old
+;     do.body.
+;     ORIG313x: - do.body: float = 1.0,
+;     UR313x:   - do.body: float = 1.0,
+;     UR313x:   - do.body.1: float = 0.0{{(0000[0-9]*)?}},
+;
+;     UR313x:     call void @f
+;     UR313x:     br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR313x:     call void @f
+;     UR313x-NOT: br
+;     UR313x:     call void @f
+;     UR313x:     br label %do.end
+;     UR313x:     !0 = !{!"branch_weights", i32 -2147483648, i32 0}
+
+; ------------------------------------------------------------------------------
+; Check 4 max iterations:
+; - Unroll count of >=4 should always produce complete unrolling.
+; - That produces <=3 unrolled iteration latches.  3 is the lowest number where
+;   the implementation cannot compute uniform weights using a simple formula.
+;   Thus, this is our first case where -unroll-uniform-weights matters.
+;
+; Original loop body frequency is 5 (loop weight 4), which is impossibly high.
+;
+;   First use a variable iteration count so that all non-final unrolled
+;   iterations' latches remain conditional.
+;
+;     RUN: sed -e s/@MAX@/4/ -e s/@W@/4/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG4510
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4510
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4510
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4510
+;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4510
+;
+;     The sum of the new do.body* cannot reach the old do.body, which is
+;     impossibly high.
+;     ORIG4510: - do.body: float = 5.0,
+;     UR4510:   - do.body: float = 1.0,
+;     UR4510:   - do.body.1: float = 1.0,
+;     UR4510:   - do.body.2: float = 1.0,
+;     UR4510:   - do.body.3: float = 1.0,
+;
+;     The probabilities are maximized to try to reach the original frequency.
+;     UR4510: call void @f
+;     UR4510: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR4510: call void @f
+;     UR4510: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !0
+;     UR4510: call void @f
+;     UR4510: br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !0
+;     UR4510: call void @f
+;     UR4510: br label %do.end
+;     UR4510: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
+;
+;   Now use a constant iteration count so that all non-final unrolled
+;   iterations' latches unconditionally continue.
+;
+;     RUN: sed -e s/@MAX@/4/ -e s/@W@/4/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG4540
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4540
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4540
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4540
+;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4540
+;
+;     The new do.body contains 4 of the original loop's iterations, so multiply
+;     it by 4, which is less than the old do.body, which is impossibly high.
+;     ORIG4540: - do.body: float = 5.0,
+;     UR4540:   - do.body: float = 1.0,
+;
+;     UR4540:     call void @f
+;     UR4540-NOT: br
+;     UR4540:     call void @f
+;     UR4540-NOT: br
+;     UR4540:     call void @f
+;     UR4540-NOT: br
+;     UR4540:     call void @f
+;     UR4540:     ret void
+;
+;   Use a constant iteration count but now the loop upper bound computation can
+;   overflow.  When it does, the induction variable immediately exceeds the
+;   bound, so the initial unrolled iteration's latch remains conditional.
+;
+;     RUN: sed -e s/@MAX@/4/ -e s/@W@/4/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG454x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR454x
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR454x
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR454x
+;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR454x
+;
+;     The new do.body.1 contains 3 of the original loop's iterations, so
+;     multiply it by 3, and add the new do.body, but that sum is less than the
+;     old do.body, which is impossibly high.
+;     ORIG454x: - do.body: float = 5.0,
+;     UR454x:   - do.body: float = 1.0,
+;     UR454x:   - do.body.1: float = 1.0,
+;
+;     The sole probability is maximized to try to reach the original frequency.
+;     UR454x:     call void @f
+;     UR454x:     br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR454x:     call void @f
+;     UR454x-NOT: br
+;     UR454x:     call void @f
+;     UR454x-NOT: br
+;     UR454x:     call void @f
+;     UR454x:     br label %do.end
+;     UR454x:     !0 = !{!"branch_weights", i32 0, i32 -2147483648}
+;
+; Original loop body frequency is 4 (loop weight 3).
+;
+;   First use a variable iteration count so that all non-final unrolled
+;   iterations' latches remain conditional.
+;
+;     RUN: sed -e s/@MAX@/4/ -e s/@W@/3/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG4410
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4410
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4410
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4410
+;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4410
+;
+;     The sum of the new do.body* is the old do.body.
+;     ORIG4410: - do.body: float = 4.0,
+;     UR4410:   - do.body: float = 1.0,
+;     UR4410:   - do.body.1: float = 1.0,
+;     UR4410:   - do.body.2: float = 1.0,
+;     UR4410:   - do.body.3: float = 1.0,
+;
+;     UR4410: call void @f
+;     UR4410: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR4410: call void @f
+;     UR4410: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !0
+;     UR4410: call void @f
+;     UR4410: br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !0
+;     UR4410: call void @f
+;     UR4410: br label %do.end
+;     UR4410: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
+;
+;   Now use a constant iteration count so that all non-final unrolled
+;   iterations' latches unconditionally continue.
+;
+;     RUN: sed -e s/@MAX@/4/ -e s/@W@/3/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG4440
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4440
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4440
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4440
+;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4440
+;
+;     The new do.body contains 4 of the original loop's iterations, so multiply
+;     it by 4 to get the old do.body.
+;     ORIG4440: - do.body: float = 4.0,
+;     UR4440:   - do.body: float = 1.0,
+;
+;     UR4440:     call void @f
+;     UR4440-NOT: br
+;     UR4440:     call void @f
+;     UR4440-NOT: br
+;     UR4440:     call void @f
+;     UR4440-NOT: br
+;     UR4440:     call void @f
+;     UR4440:     ret void
+;
+;   Use a constant iteration count, but now the loop upper bound computation
+;   can overflow.  When it does, the induction variable immediately exceeds the
+;   bound, so the initial unrolled iteration's latch remains conditional.
+;
+;     RUN: sed -e s/@MAX@/4/ -e s/@W@/3/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG444x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR444x
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR444x
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR444x
+;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR444x
+;
+;     The new do.body.1 contains 3 of the original loop's iterations, so
+;     multiply it by 3, and add the new do.body to get the old do.body.
+;     ORIG444x: - do.body: float = 4.0,
+;     UR444x:   - do.body: float = 1.0,
+;     UR444x:   - do.body.1: float = 1.0,
+;
+;     UR444x:     call void @f
+;     UR444x:     br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR444x:     call void @f
+;     UR444x-NOT: br
+;     UR444x:     call void @f
+;     UR444x-NOT: br
+;     UR444x:     call void @f
+;     UR444x:     br label %do.end
+;     UR444x:     !0 = !{!"branch_weights", i32 0, i32 -2147483648}
+;
+; Original loop body frequency is 3 (loop weight 2).  This is our first case
+; where the new probabilities vary (unless -unroll-uniform-weights is given).
+;
+;   First use a variable iteration count so that all non-final unrolled
+;   iterations' latches remain conditional.
+;
+;     RUN: sed -e s/@MAX@/4/ -e s/@W@/2/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG4310
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4310,UNIF4310
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4310,UNIF4310
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4310,FAST4310
+;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4310,FAST4310
+;
+;     The sum of the new do.body* is always approximately the old do.body.
+;     ORIG4310: - do.body: float = 3.0,
+;     UNIF4310: - do.body: float = 1.0,
+;     UNIF4310: - do.body.1: float = 0.81054,
+;     UNIF4310: - do.body.2: float = 0.65697,
+;     UNIF4310: - do.body.3: float = 0.5325,
+;     FAST4310: - do.body: float = 1.0,
+;     FAST4310: - do.body.1: float = 0.94737,
+;     FAST4310: - do.body.2: float = 0.63158,
+;     FAST4310: - do.body.3: float = 0.42105,
+;
+;     UR4310:        call void @f
+;     UR4310:        br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR4310:        call void @f
+;     UR4310:        br i1 %{{.*}}, label %do.end, label %do.body.2,
+;     UNIF4310-SAME:   !prof !0
+;     FAST4310-SAME:   !prof !1
+;     UR4310:        call void @f
+;     UR4310:        br i1 %{{.*}}, label %do.end, label %do.body.3,
+;     UNIF4310-SAME:   !prof !0
+;     FAST4310-SAME:   !prof !1
+;     UR4310:        call void @f
+;     UR4310:        br label %do.end
+;     UNIF4310:      !0 = !{!"branch_weights", i32 406871040, i32 1740612608}
+;     FAST4310:      !0 = !{!"branch_weights", i32 113025456, i32 2034458192}
+;     FAST4310:      !1 = !{!"branch_weights", i32 1, i32 2}
+;
+;   Now use a constant iteration count so that all non-final unrolled
+;   iterations' latches unconditionally continue.
+;
+;     RUN: sed -e s/@MAX@/4/ -e s/@W@/2/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG4340
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4340
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4340
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4340
+;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4340
+;
+;     The new do.body contains 4 of the original loop's iterations, so multiply
+;     it by 4.  That product exceeds the old do.body, which is impossibly low.
+;     ORIG4340: - do.body: float = 3.0,
+;     UR4340:   - do.body: float = 1.0,
+;
+;     UR4340:     call void @f
+;     UR4340-NOT: br
+;     UR4340:     call void @f
+;     UR4340-NOT: br
+;     UR4340:     call void @f
+;     UR4340-NOT: br
+;     UR4340:     call void @f
+;     UR4340:     ret void
+;
+;   Use a constant iteration count, but now the loop upper bound computation
+;   can overflow.  When it does, the induction variable immediately exceeds the
+;   bound, so the initial unrolled iteration's latch remains conditional.
+;
+;     RUN: sed -e s/@MAX@/4/ -e s/@W@/2/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG434x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR434x
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR434x
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR434x
+;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR434x
+;
+;     The new do.body.1 contains 3 of the original loop's iterations, so
+;     multiply it by 3, and add the new do.body to get the old do.body.
+;     ORIG434x: - do.body: float = 3.0,
+;     UR434x:   - do.body: float = 1.0,
+;     UR434x:   - do.body.1: float = 0.66667,
+;
+;     UR434x:     call void @f
+;     UR434x:     br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR434x:     call void @f
+;     UR434x-NOT: br
+;     UR434x:     call void @f
+;     UR434x-NOT: br
+;     UR434x:     call void @f
+;     UR434x:     br label %do.end
+;     UR434x:     !0 = !{!"branch_weights", i32 715827884, i32 1431655764}
+;
+; Original loop body frequency is 2 (loop weight 1).
+;
+;   First use a variable iteration count so that all non-final unrolled
+;   iterations' latches remain conditional.
+;
+;     RUN: sed -e s/@MAX@/4/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG4210
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4210,UNIF4210
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4210,UNIF4210
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4210,FAST4210
+;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4210,FAST4210
+;
+;     The sum of the new do.body* is always the old do.body.
+;     ORIG4210: - do.body: float = 2.0,
+;     UNIF4210: - do.body: float = 1.0,
+;     UNIF4210: - do.body.1: float = 0.54369,
+;     UNIF4210: - do.body.2: float = 0.2956,
+;     UNIF4210: - do.body.3: float = 0.16071,
+;     FAST4210: - do.body: float = 1.0,
+;     FAST4210: - do.body.1: float = 0.57143,
+;     FAST4210: - do.body.2: float = 0.28571,
+;     FAST4210: - do.body.3: float = 0.14286,
+;
+;     UR4210:        call void @f
+;     UR4210:        br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR4210:        call void @f
+;     UR4210:        br i1 %{{.*}}, label %do.end, label %do.body.2,
+;     UNIF4210-SAME:   !prof !0
+;     FAST4210-SAME:   !prof !1
+;     UR4210:        call void @f
+;     UR4210:        br i1 %{{.*}}, label %do.end, label %do.body.3,
+;     UNIF4210-SAME:   !prof !0
+;     FAST4210-SAME:   !prof !1
+;     UR4210:        call void @f
+;     UR4210:        br label %do.end
+;     UNIF4210:      !0 = !{!"branch_weights", i32 979920896, i32 1167562752}
+;     FAST4210:      !0 = !{!"branch_weights", i32 920350135, i32 1227133513}
+;     FAST4210:      !1 = !{!"branch_weights", i32 1, i32 1}
+;
+;   Now use a constant iteration count so that all non-final unrolled
+;   iterations' latches unconditionally continue.
+;
+;     RUN: sed -e s/@MAX@/4/ -e s/@W@/1/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG4240
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4240
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4240
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4240
+;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4240
+;
+;     The new do.body contains 4 of the original loop's iterations, so multiply
+;     it by 4.  That product exceeds the old do.body, which is impossibly low.
+;     ORIG4240: - do.body: float = 2.0,
+;     UR4240:   - do.body: float = 1.0,
+;
+;     UR4240:     call void @f
+;     UR4240-NOT: br
+;     UR4240:     call void @f
+;     UR4240-NOT: br
+;     UR4240:     call void @f
+;     UR4240-NOT: br
+;     UR4240:     call void @f
+;     UR4240:     ret void
+;
+;   Use a constant iteration count, but now the loop upper bound computation
+;   can overflow.  When it does, the induction variable immediately exceeds the
+;   bound, so the initial unrolled iteration's latch remains conditional.
+;
+;     RUN: sed -e s/@MAX@/4/ -e s/@W@/1/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG424x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR424x
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR424x
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR424x
+;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR424x
+;
+;     The new do.body.1 contains 3 of the original loop's iterations, so
+;     multiply it by 3, and add the new do.body to get the old do.body.
+;     ORIG424x: - do.body: float = 2.0,
+;     UR424x:   - do.body: float = 1.0,
+;     UR424x:   - do.body.1: float = 0.33333,
+;
+;     UR424x:     call void @f
+;     UR424x:     br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR424x:     call void @f
+;     UR424x-NOT: br
+;     UR424x:     call void @f
+;     UR424x-NOT: br
+;     UR424x:     call void @f
+;     UR424x:     br label %do.end
+;     UR424x:     !0 = !{!"branch_weights", i32 1431655765, i32 715827883}
+;
+; Original loop body frequency is 1 (loop weight 0).
+;
+;   First use a variable iteration count so that all non-final unrolled
+;   iterations' latches remain conditional.
+;
+;     RUN: sed -e s/@MAX@/4/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG4110
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4110
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4110
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4110
+;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4110
+;
+;     The sum of the new do.body* is approximately the old do.body.
+;     ORIG4110: - do.body: float = 1.0,
+;     UR4110:   - do.body: float = 1.0,
+;     UR4110:   - do.body.1: float = 0.0{{(0000[0-9]*)?}},
+;     UR4110:   - do.body.2: float = 0.0{{(0000[0-9]*)?}},
+;     UR4110:   - do.body.3: float = 0.0{{(0000[0-9]*)?}},
+;
+;     UR4110: call void @f
+;     UR4110: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR4110: call void @f
+;     UR4110: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !0
+;     UR4110: call void @f
+;     UR4110: br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !0
+;     UR4110: call void @f
+;     UR4110: br label %do.end
+;     UR4110: !0 = !{!"branch_weights", i32 1, i32 0}
+;
+;   Now use a constant iteration count so that all non-final unrolled
+;   iterations' latches unconditionally continue.
+;
+;     RUN: sed -e s/@MAX@/4/ -e s/@W@/0/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG4140
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4140
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4140
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4140
+;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4140
+;
+;     The new do.body contains 4 of the original loop's iterations, so multiply
+;     it by 4.  That product exceeds the old do.body, which is impossibly low.
+;     ORIG4140: - do.body: float = 1.0,
+;     UR4140:   - do.body: float = 1.0,
+;
+;     UR4140:     call void @f
+;     UR4140-NOT: br
+;     UR4140:     call void @f
+;     UR4140-NOT: br
+;     UR4140:     call void @f
+;     UR4140-NOT: br
+;     UR4140:     call void @f
+;     UR4140:     ret void
+;
+;   Use a constant iteration count, but now the loop upper bound computation
+;   can overflow.  When it does, the induction variable immediately exceeds the
+;   bound, so the initial unrolled iteration's latch remains conditional.
+;
+;     RUN: sed -e s/@MAX@/4/ -e s/@W@/0/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll
+;     RUN: %{bf-fc} ORIG414x
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR414x
+;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR414x
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR414x
+;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR414x
+;
+;     The new do.body.1 contains 3 of the original loop's iterations, so
+;     multiply it by 3, and add the new do.body to get approximately the old
+;     do.body.
+;     ORIG414x: - do.body: float = 1.0,
+;     UR414x:   - do.body: float = 1.0,
+;     UR414x:   - do.body.1: float = 0.0{{(0000[0-9]*)?}},
+;
+;     UR414x:     call void @f
+;     UR414x:     br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;     UR414x:     call void @f
+;     UR414x-NOT: br
+;     UR414x:     call void @f
+;     UR414x-NOT: br
+;     UR414x:     call void @f
+;     UR414x:     br label %do.end
+;     UR414x:     !0 = !{!"branch_weights", i32 -2147483648, i32 0}
+
+; ------------------------------------------------------------------------------
+; Check 5 max iterations:
+; - Unroll count of >=5 should always produce complete unrolling.
+; - That produces <=4 unrolled iteration latches.  When at least 3 remain
+;   conditional, the implementation cannot compute uniform weights using a
+;   simple formula, so -unroll-uniform-weights matters.
+;
+; Original loop body frequency is 5 (loop weight 4).
+;
+;   RUN: sed -e s/@MAX@/5/ -e s/@W@/4/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
+;   RUN: %{bf-fc} ORIG5510
+;   RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR5510,UNIF5510
+;   RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} UR5510,UNIF5510
+;   RUN: %{ur-bf} -unroll-count=5 | %{fc} UR5510,FAST5510
+;   RUN: %{ur-bf} -unroll-count=6 | %{fc} UR5510,FAST5510
+;
+;   The sum of the new do.body* is the old do.body.
+;   ORIG5510: - do.body: float = 5.0,
+;   UR5510:   - do.body: float = 1.0,
+;   UR5510:   - do.body.1: float = 1.0,
+;   UR5510:   - do.body.2: float = 1.0,
+;   UR5510:   - do.body.3: float = 1.0,
+;   UR5510:   - do.body.4: float = 1.0,
+;
+;   All continue probabilities are approximately 1, but somehow there is less
+;   precision in the calculation of the last case.
+;   UR5510:        call void @f
+;   UR5510:        br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;   UR5510:        call void @f
+;   UR5510:        br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !0
+;   UR5510:        call void @f
+;   UR5510:        br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !0
+;   UR5510:        call void @f
+;   UR5510:        br i1 %{{.*}}, label %do.end, label %do.body.4,
+;   UNIF5510-SAME:   !prof !0
+;   FAST5510-SAME:   !prof !1
+;   UR5510:        call void @f
+;   UR5510:        br label %do.end
+;   UNIF5510: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
+;   FAST5510: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
+;   FAST5510: !1 = !{!"branch_weights", i32 10, i32 2147483638}
+;
+; Original loop body frequency is 4 (loop weight 3).
+;
+;   RUN: sed -e s/@MAX@/5/ -e s/@W@/3/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
+;   RUN: %{bf-fc} ORIG5410
+;   RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR5410,UNIF5410
+;   RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} UR5410,UNIF5410
+;   RUN: %{ur-bf} -unroll-count=5 | %{fc} UR5410,FAST5410
+;   RUN: %{ur-bf} -unroll-count=6 | %{fc} UR5410,FAST5410
+;
+;   The sum of the new do.body* is always the old do.body.
+;   ORIG5410: - do.body: float = 4.0,
+;   UNIF5410: - do.body: float = 1.0,
+;   UNIF5410: - do.body.1: float = 0.88818,
+;   UNIF5410: - do.body.2: float = 0.78886,
+;   UNIF5410: - do.body.3: float = 0.70065,
+;   UNIF5410: - do.body.4: float = 0.62231,
+;   FAST5410: - do.body: float = 1.0,
+;   FAST5410: - do.body.1: float = 1.0,
+;   FAST5410: - do.body.2: float = 0.86486,
+;   FAST5410: - do.body.3: float = 0.64865,
+;   FAST5410: - do.body.4: float = 0.48649,
+;
+;   This is our first case where, when not using -unroll-uniform-weights, the
+;   implementation must adjust multiple probabilities to something other than
+;   the original latch probability but does not just set all probabilities to
+;   the limit of 1 or 0.
+;   UR5410:        call void @f
+;   UR5410:        br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;   UR5410:        call void @f
+;   UR5410:        br i1 %{{.*}}, label %do.end, label %do.body.2,
+;   UNIF5410-SAME:   !prof !0
+;   FAST5410-SAME:   !prof !1
+;   UR5410:        call void @f
+;   UR5410:        br i1 %{{.*}}, label %do.end, label %do.body.3,
+;   UNIF5410-SAME:   !prof !0
+;   FAST5410-SAME:   !prof !2
+;   UR5410:        call void @f
+;   UR5410:        br i1 %{{.*}}, label %do.end, label %do.body.4,
+;   UNIF5410-SAME:   !prof !0
+;   FAST5410-SAME:   !prof !2
+;   UR5410:        call void @f
+;   UR5410:        br label %do.end
+;   UNIF5410: !0 = !{!"branch_weights", i32 240132096, i32 1907351552}
+;   FAST5410: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
+;   FAST5410: !1 = !{!"branch_weights", i32 290200493, i32 1857283155}
+;   FAST5410: !2 = !{!"branch_weights", i32 1, i32 3}
+;
+; Original loop body frequency is 1 (loop weight 0).
+;
+;   RUN: sed -e s/@MAX@/5/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
+;   RUN: %{bf-fc} ORIG5110
+;   RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR5110
+;   RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} UR5110
+;   RUN: %{ur-bf} -unroll-count=5 | %{fc} UR5110
+;   RUN: %{ur-bf} -unroll-count=6 | %{fc} UR5110
+;
+;   The sum of the new do.body* is approximately the old do.body.
+;   ORIG5110: - do.body: float = 1.0,
+;   UR5110:   - do.body: float = 1.0,
+;   UR5110:   - do.body.1: float = 0.0{{(0000[0-9]*)?}},
+;   UR5110:   - do.body.2: float = 0.0{{(0000[0-9]*)?}},
+;   UR5110:   - do.body.3: float = 0.0{{(0000[0-9]*)?}},
+;   UR5110:   - do.body.4: float = 0.0{{(0000[0-9]*)?}},
+;
+;   UR5110: call void @f
+;   UR5110: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+;   UR5110: call void @f
+;   UR5110: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !0
+;   UR5110: call void @f
+;   UR5110: br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !0
+;   UR5110: call void @f
+;   UR5110: br i1 %{{.*}}, label %do.end, label %do.body.4, !prof !0
+;   UR5110: call void @f
+;   UR5110: br label %do.end
+;   UR5110: !0 = !{!"branch_weights", i32 1, i32 0}
+
+declare void @f(i32)
+
+define void @test(i32 %x, i32 %n) {
+entry:
+  %n.min = call i32 @llvm.umax.i32(i32 %n, i32 @MIN@)
+  %n.minmax = call i32 @llvm.umin.i32(i32 %n.min, i32 @MAX@)
+  %i_n = add i32 @I_0@, %n.minmax
+  br label %do.body
+
+do.body:
+  %i = phi i32 [ @I_0@, %entry ], [ %inc, %do.body ]
+  %inc = add i32 %i, 1
+  call void @f(i32 %i)
+  %c = icmp uge i32 %inc, %i_n
+  br i1 %c, label %do.end, label %do.body, !prof !0
+
+do.end:
+  ret void
+}
+
+; Loop body frequency is @W@ + 1.
+!0 = !{!"branch_weights", i32 1, i32 @W@}
diff --git a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-epilog.ll b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-epilog.ll
index f5d05e666cabb..c8ed8ef82a55f 100644
--- a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-epilog.ll
+++ b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-epilog.ll
@@ -2,7 +2,14 @@
 ; frequencies after loop unrolling with an epilogue.
 ;
 ; We check various interesting unroll count values relative to the original
-; loop's body frequency of 11 (e.g., minimum and boundary values).
+; loop's body frequency of 11, and we check when the epilogue loop itself is and
+; is not unrolled.
+;
+; Without -unroll-remainder, the epilogue is unrolled only at -unroll-count=2
+; because there it has only 1 iteration and so is always completely unrolled.
+; With -unroll-remainder, for some reason related to computing the remainder in
+; two's complement, the epilogue is completely unrolled only when -unroll-count
+; is a power of 2.
 ;
 ; For each case, we check:
 ; - Iteration frequencies
@@ -14,6 +21,23 @@
 ;     overlook any other branch weights (no extra !prof or branch_weights).
 ;   - We also check the number of original loop bodies (represented by a call to
 ;     @f) that appear within each unrolled iteration.
+; - Branch weight metadata
+;   - Checking frequencies already checks whether the branch weights have the
+;     expected effect, but we also want to check the following.
+;   - Whether the epilogue loop is unrolled should not affect the unrolled
+;     loop's estimated trip count or the branch weights on the unrolled loop
+;     guard, unrolled loop latch, or epilogue loop guard.
+;   - We get uniform probabilities/weights (same !prof) across the epilogue
+;     iteration latches if either:
+;     - Every iteration's latch remains conditional, so their original
+;       probabilities are not contradicted.
+;     - The number of remaining conditional latches is <= 2, so the
+;       implementation computes uniform branch weights by solving a linear or
+;       quadratic equation.
+;   - Otherwise, the earliest branch weights (starting with !prof !0) are
+;     adjusted as needed to produce the original loop body frequency, and the
+;     rest are left as they would be in the epilogue loop if it were not
+;     unrolled.
 ; - llvm.loop.estimated_trip_count
 ;   - For the unrolled and epilogue loops, must be the number of iterations
 ;     required for the original loop body to reach its original estimated trip
@@ -45,6 +69,7 @@
 ; Check -unroll-count=2.
 ;
 ; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2
+; RUN: %{ur-bf} -unroll-count=2 -unroll-remainder | %{fc} UR2
 ;
 ; Multiply do.body by 2 and add do.body.epil to get the original loop body
 ; frequency, 11.
@@ -72,22 +97,35 @@
 ; ------------------------------------------------------------------------------
 ; Check -unroll-count=4.
 ;
-; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4
+; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4,UR4-ELP
+; RUN: %{ur-bf} -unroll-count=4 -unroll-remainder | %{fc} UR4,UR4-EUR
 ;
-; Multiply do.body by 4 and add do.body.epil* to get the original loop body
-; frequency, 11.
-; UR4: - do.body: float = 2.3702,
-; UR4: - do.body.epil: float = 1.5193,
+; Multiply do.body by 4 and add do.body.epil* for either ELP or EUR to get the
+; original loop body frequency, 11.
+; UR4:     - do.body: float = 2.3702,
+; UR4-ELP: - do.body.epil: float = 1.5193,
+; UR4-EUR: - do.body.epil: float = 0.78453,
+; UR4-EUR: - do.body.epil.1: float = 0.46232,
+; UR4-EUR: - do.body.epil.2: float = 0.27244,
 ;
 ; Unrolled loop guard, body, and latch.
 ; UR4: br i1 %{{.*}}, label %do.body.epil.preheader, label %entry.new, !prof !0
 ; UR4-COUNT-4: call void @f
 ; UR4: br i1 %{{.*}}, label %do.end.unr-lcssa, label %do.body, !prof !1, !llvm.loop !2
 ;
-; Epilogue guard and loop.
+; Epilogue guard.
 ; UR4: br i1 %{{.*}}, label %do.body.epil.preheader, label %do.end, !prof !5
-; UR4: call void @f
-; UR4: br i1 %{{.*}}, label %do.body.epil, label %do.end.epilog-lcssa, !prof !6, !llvm.loop !7
+;
+; Non-unrolled epilogue loop.
+; UR4-ELP: call void @f
+; UR4-ELP: br i1 %{{.*}}, label %do.body.epil, label %do.end.epilog-lcssa, !prof !6, !llvm.loop !7
+;
+; Completely unrolled epilogue loop.
+; UR4-EUR: call void @f
+; UR4-EUR: br i1 %{{.*}}, label %do.body.epil.1, label %do.end.epilog-lcssa, !prof !6
+; UR4-EUR: call void @f
+; UR4-EUR: br i1 %{{.*}}, label %do.body.epil.2, label %do.end.epilog-lcssa, !prof !6
+; UR4-EUR: call void @f
 ;
 ; Unrolled loop metadata.
 ; UR4: !0 = !{!"branch_weights", i32 534047398, i32 1613436250}
@@ -97,30 +135,137 @@
 ; UR4: !4 = !{!"llvm.loop.unroll.disable"}
 ; UR4: !5 = !{!"branch_weights", i32 1531603292, i32 615880356}
 ;
-; Epilogue loop metadata.
-; UR4: !6 = !{!"branch_weights", i32 1038564635, i32 1108919013}
-; UR4: !7 = distinct !{!7, !8, !4}
-; UR4: !8 = !{!"llvm.loop.estimated_trip_count", i32 3}
+; Non-unrolled epilogue loop metadata.
+; UR4-ELP: !6 = !{!"branch_weights", i32 1038564635, i32 1108919013}
+; UR4-ELP: !7 = distinct !{!7, !8, !4}
+; UR4-ELP: !8 = !{!"llvm.loop.estimated_trip_count", i32 3}
+;
+; Completely unrolled epilogue loop metadata.  Because it loses its backedge:
+; - The remaining conditional latches' branch weights must be adjusted relative
+;   to the non-unrolled case.  There are only two, so the implementation can
+;   compute uniform branch weights using the quadratic formula.
+; - It has no llvm.loop.estimated_trip_count.
+; UR4-EUR: !6 = !{!"branch_weights", i32 1265493781, i32 881989867}
 
 ; ------------------------------------------------------------------------------
-; Check -unroll-count=10.
+; Check -unroll-count=8.
+;
+; RUN: %{ur-bf} -unroll-count=8 | %{fc} UR8,UR8-ELP
+; RUN: %{ur-bf} -unroll-count=8 -unroll-remainder | \
+; RUN:   %{fc} UR8,UR8-EUR
+;
+; Multiply do.body by 8 and add do.body.epil* for either ELP or EUR to get the
+; original loop body frequency, 11.
+; UR8:     - do.body: float = 0.96188,
+; UR8-ELP: - do.body.epil: float = 3.3049,
+; UR8-EUR: - do.body.epil: float = 0.91256,
+; UR8-EUR: - do.body.epil.1: float = 0.7716,
+; UR8-EUR: - do.body.epil.2: float = 0.55854,
+; UR8-EUR: - do.body.epil.3: float = 0.40432,
+; UR8-EUR: - do.body.epil.4: float = 0.29268,
+; UR8-EUR: - do.body.epil.5: float = 0.21186,
+; UR8-EUR: - do.body.epil.6: float = 0.15336,
 ;
-; RUN: %{ur-bf} -unroll-count=10 | %{fc} UR10
+; Unrolled loop guard, body, and latch.
+; UR8: br i1 %{{.*}}, label %do.body.epil.preheader, label %entry.new, !prof !0
+; UR8-COUNT-8: call void @f
+; UR8: br i1 %{{.*}}, label %do.end.unr-lcssa, label %do.body, !prof !1, !llvm.loop !2
+;
+; Epilogue guard.
+; UR8: br i1 %{{.*}}, label %do.body.epil.preheader, label %do.end, !prof !5
+;
+; Non-unrolled epilogue loop.
+; UR8-ELP: call void @f
+; UR8-ELP: br i1 %{{.*}}, label %do.body.epil, label %do.end.epilog-lcssa, !prof !6, !llvm.loop !7
+;
+; Completely unrolled epilogue loop.
+; UR8-EUR: call void @f
+; UR8-EUR: br i1 %{{.*}}, label %do.body.epil.1, label %do.end.epilog-lcssa, !prof !6
+; UR8-EUR: call void @f
+; UR8-EUR: br i1 %{{.*}}, label %do.body.epil.2, label %do.end.epilog-lcssa, !prof !7
+; UR8-EUR: call void @f
+; UR8-EUR: br i1 %{{.*}}, label %do.body.epil.3, label %do.end.epilog-lcssa, !prof !7
+; UR8-EUR: call void @f
+; UR8-EUR: br i1 %{{.*}}, label %do.body.epil.4, label %do.end.epilog-lcssa, !prof !7
+; UR8-EUR: call void @f
+; UR8-EUR: br i1 %{{.*}}, label %do.body.epil.5, label %do.end.epilog-lcssa, !prof !7
+; UR8-EUR: call void @f
+; UR8-EUR: br i1 %{{.*}}, label %do.body.epil.6, label %do.end.epilog-lcssa, !prof !7
+; UR8-EUR: call void @f
 ;
-; Multiply do.body by 8 and add do.body.epil* to get the original loop body
-; frequency, 11.
-; UR10: - do.body: float = 0.6902,
-; UR10: - do.body.epil: float = 4.098,
+; Unrolled loop metadata.
+; UR8: !0 = !{!"branch_weights", i32 1045484980, i32 1101998668}
+; UR8: !1 = !{!"branch_weights", i32 1145666677, i32 1001816971}
+; UR8: !2 = distinct !{!2, !3, !4}
+; UR8: !3 = !{!"llvm.loop.estimated_trip_count", i32 1}
+; UR8: !4 = !{!"llvm.loop.unroll.disable"}
+; UR8: !5 = !{!"branch_weights", i32 1781544591, i32 365939057}
+;
+; Non-unrolled epilogue loop metadata.
+; UR8-ELP: !6 = !{!"branch_weights", i32 1554520665, i32 592962983}
+; UR8-ELP: !7 = distinct !{!7, !8, !4}
+; UR8-ELP: !8 = !{!"llvm.loop.estimated_trip_count", i32 3}
+;
+; Completely unrolled epilogue loop metadata.  Because it loses its backedge:
+; - The remaining conditional latches' branch weights must be adjusted relative
+;   to the non-unrolled case.  There are more than two, so the implementation
+;   does not compute uniform branch weights.  Adjusting the first is sufficient,
+;   so the rest keep the non-unrolled epilogue branch weights.
+; - It has no llvm.loop.estimated_trip_count.
+; UR8-EUR: !6 = !{!"branch_weights", i32 1815773828, i32 331709820}
+; UR8-EUR: !7 = !{!"branch_weights", i32 1554520665, i32 592962983}
+
+; ------------------------------------------------------------------------------
+; Check -unroll-count=10.
+;
+; RUN: %{ur-bf} -unroll-count=10 | %{fc} UR10,UR10-ELP
+; RUN: %{ur-bf} -unroll-count=10 -unroll-remainder | %{fc} UR10,UR10-EUR
+;
+; Multiply do.body by 10 and add do.body.epil* for either ELP or EUR to get the
+; original loop body frequency, 11.
+; UR10:     - do.body: float = 0.6902,
+; UR10-ELP: - do.body.epil: float = 4.098,
+; UR10-EUR: - do.body.epil: float = 1.0375,
+; UR10-EUR: - do.body.epil.1: float = 0.80019,
+; UR10-EUR: - do.body.epil.2: float = 0.61718,
+; UR10-EUR: - do.body.epil.3: float = 0.47602,
+; UR10-EUR: - do.body.epil.4: float = 0.36715,
+; UR10-EUR: - do.body.epil.5: float = 0.28318,
+; UR10-EUR: - do.body.epil.6: float = 0.21841,
+; UR10-EUR: - do.body.epil.7: float = 0.16846,
+; UR10-EUR: - do.body.epil.8: float = 0.12993,
 ;
 ; Unrolled loop guard, body, and latch.
 ; UR10: br i1 %{{.*}}, label %do.body.epil.preheader, label %entry.new, !prof !0
 ; UR10-COUNT-10: call void @f
 ; UR10: br i1 %{{.*}}, label %do.end.unr-lcssa, label %do.body, !prof !1, !llvm.loop !2
 ;
-; Epilogue guard and loop.
+; Epilogue guard.
 ; UR10: br i1 %{{.*}}, label %do.body.epil.preheader, label %do.end, !prof !5
-; UR10: call void @f
-; UR10: br i1 %{{.*}}, label %do.body.epil, label %do.end.epilog-lcssa, !prof !6, !llvm.loop !7
+;
+; Non-unrolled epilogue loop.
+; UR10-ELP: call void @f
+; UR10-ELP: br i1 %{{.*}}, label %do.body.epil, label %do.end.epilog-lcssa, !prof !6, !llvm.loop !7
+;
+; Partially unrolled epilogue loop.
+; UR10-EUR: call void @f
+; UR10-EUR: br i1 %{{.*}}, label %do.body.epil.1, label %do.end.epilog-lcssa, !prof !6
+; UR10-EUR: call void @f
+; UR10-EUR: br i1 %{{.*}}, label %do.body.epil.2, label %do.end.epilog-lcssa, !prof !6
+; UR10-EUR: call void @f
+; UR10-EUR: br i1 %{{.*}}, label %do.body.epil.3, label %do.end.epilog-lcssa, !prof !6
+; UR10-EUR: call void @f
+; UR10-EUR: br i1 %{{.*}}, label %do.body.epil.4, label %do.end.epilog-lcssa, !prof !6
+; UR10-EUR: call void @f
+; UR10-EUR: br i1 %{{.*}}, label %do.body.epil.5, label %do.end.epilog-lcssa, !prof !6
+; UR10-EUR: call void @f
+; UR10-EUR: br i1 %{{.*}}, label %do.body.epil.6, label %do.end.epilog-lcssa, !prof !6
+; UR10-EUR: call void @f
+; UR10-EUR: br i1 %{{.*}}, label %do.body.epil.7, label %do.end.epilog-lcssa, !prof !6
+; UR10-EUR: call void @f
+; UR10-EUR: br i1 %{{.*}}, label %do.body.epil.8, label %do.end.epilog-lcssa, !prof !6
+; UR10-EUR: call void @f
+; UR10-EUR: br i1 %{{.*}}, label %do.body.epil, label %do.end.epilog-lcssa, !prof !6, !llvm.loop !7
 ;
 ; Unrolled loop metadata.
 ; UR10: !0 = !{!"branch_weights", i32 1236740947, i32 910742701}
@@ -130,30 +275,69 @@
 ; UR10: !4 = !{!"llvm.loop.unroll.disable"}
 ; UR10: !5 = !{!"branch_weights", i32 1829762672, i32 317720976}
 ;
-; Epilogue loop metadata.  Its llvm.loop.estimated_trip_count happens to be the
-; same as the unrolled loop's, so there's no new metadata node.
-; UR10: !6 = !{!"branch_weights", i32 1656332913, i32 491150735}
-; UR10: !7 = distinct !{!7, ![[#LOOP_UR_TC:]], ![[#DISABLE:]]}
+; The unrolled epilogue loop does not lose any conditional branches, so:
+; - The non-unrolled epilogue branch weights are shared across them.
+; - This is our first case where the unrolled epilogue loop has an
+;   llvm.loop.estimated_trip_count.  However, it happens to be the same as the
+;   unrolled loop's, so there's no new metadata node.
+; UR10:     !6 = !{!"branch_weights", i32 1656332913, i32 491150735}
+; UR10-ELP: !7 = distinct !{!7, !3, !4}
+; UR10-EUR: !7 = distinct !{!7, !3}
 
 ; ------------------------------------------------------------------------------
 ; Check -unroll-count=11.
 ;
-; RUN: %{ur-bf} -unroll-count=11 | %{fc} UR11
-;
-; Multiply do.body by 11 and add do.body.epil* to get the original loop body
-; frequency, 11.
-; UR11: - do.body: float = 0.59359,
-; UR11: - do.body.epil: float = 4.4705,
+; RUN: %{ur-bf} -unroll-count=11 | %{fc} UR11,UR11-ELP
+; RUN: %{ur-bf} -unroll-count=11 -unroll-remainder | %{fc} UR11,UR11-EUR
+;
+; Multiply do.body by 11 and add do.body.epil* for either ELP or EUR to get the
+; original loop body frequency, 11.
+; UR11:     - do.body: float = 0.59359,
+; UR11-ELP: - do.body.epil: float = 4.4705,
+; UR11-EUR: - do.body.epil: float = 1.0428,
+; UR11-EUR: - do.body.epil.1: float = 0.82209,
+; UR11-EUR: - do.body.epil.2: float = 0.64812,
+; UR11-EUR: - do.body.epil.3: float = 0.51097,
+; UR11-EUR: - do.body.epil.4: float = 0.40284,
+; UR11-EUR: - do.body.epil.5: float = 0.31759,
+; UR11-EUR: - do.body.epil.6: float = 0.25038,
+; UR11-EUR: - do.body.epil.7: float = 0.1974,
+; UR11-EUR: - do.body.epil.8: float = 0.15562,
+; UR11-EUR: - do.body.epil.9: float = 0.12269,
 ;
 ; Unrolled loop guard, body, and latch.
 ; UR11: br i1 %{{.*}}, label %do.body.epil.preheader, label %entry.new, !prof !0
 ; UR11-COUNT-11: call void @f
 ; UR11: br i1 %{{.*}}, label %do.end.unr-lcssa, label %do.body, !prof !1, !llvm.loop !2
-
-; Epilogue guard and loop.
+;
+; Epilogue guard.
 ; UR11: br i1 %{{.*}}, label %do.body.epil.preheader, label %do.end, !prof !5
-; UR11: call void @f
-; UR11: br i1 %{{.*}}, label %do.body.epil, label %do.end.epilog-lcssa, !prof !6, !llvm.loop !7
+;
+; Non-unrolled epilogue loop.
+; UR11-ELP: call void @f
+; UR11-ELP: br i1 %{{.*}}, label %do.body.epil, label %do.end.epilog-lcssa, !prof !6, !llvm.loop !7
+;
+; Partially unrolled epilogue loop.
+; UR11-EUR: call void @f
+; UR11-EUR: br i1 %{{.*}}, label %do.body.epil.1, label %do.end.epilog-lcssa, !prof !6
+; UR11-EUR: call void @f
+; UR11-EUR: br i1 %{{.*}}, label %do.body.epil.2, label %do.end.epilog-lcssa, !prof !6
+; UR11-EUR: call void @f
+; UR11-EUR: br i1 %{{.*}}, label %do.body.epil.3, label %do.end.epilog-lcssa, !prof !6
+; UR11-EUR: call void @f
+; UR11-EUR: br i1 %{{.*}}, label %do.body.epil.4, label %do.end.epilog-lcssa, !prof !6
+; UR11-EUR: call void @f
+; UR11-EUR: br i1 %{{.*}}, label %do.body.epil.5, label %do.end.epilog-lcssa, !prof !6
+; UR11-EUR: call void @f
+; UR11-EUR: br i1 %{{.*}}, label %do.body.epil.6, label %do.end.epilog-lcssa, !prof !6
+; UR11-EUR: call void @f
+; UR11-EUR: br i1 %{{.*}}, label %do.body.epil.7, label %do.end.epilog-lcssa, !prof !6
+; UR11-EUR: call void @f
+; UR11-EUR: br i1 %{{.*}}, label %do.body.epil.8, label %do.end.epilog-lcssa, !prof !6
+; UR11-EUR: call void @f
+; UR11-EUR: br i1 %{{.*}}, label %do.body.epil.9, label %do.end.epilog-lcssa, !prof !6
+; UR11-EUR: call void @f
+; UR11-EUR: br i1 %{{.*}}, label %do.body.epil, label %do.end.epilog-lcssa, !prof !6, !llvm.loop !7
 ;
 ; Unrolled loop metadata.
 ; UR11: !0 = !{!"branch_weights", i32 1319535738, i32 827947910}
@@ -163,30 +347,74 @@
 ; UR11: !4 = !{!"llvm.loop.unroll.disable"}
 ; UR11: !5 = !{!"branch_weights", i32 1846907894, i32 300575754}
 ;
-; Epilogue loop metadata.
-; UR11: !6 = !{!"branch_weights", i32 1693034047, i32 454449601}
-; UR11: !7 = distinct !{!7, !8, !4}
-; UR11: !8 = !{!"llvm.loop.estimated_trip_count", i32 0}
+; The unrolled epilogue loop does not lose any conditional branches, so:
+; - The non-unrolled epilogue branch weights are shared across them.
+; - The unrolled epilogue loop has an llvm.loop.estimated_trip_count.  This is
+;   our first case where it is different than the unrolled loop's, so it has its
+;   own metadata node.  But it happens to be the same as the non-unrolled
+;   epilogue loop's.
+; UR11:     !6 = !{!"branch_weights", i32 1693034047, i32 454449601}
+; UR11-ELP: !7 = distinct !{!7, !8, !4}
+; UR11-EUR: !7 = distinct !{!7, !8}
+; UR11:     !8 = !{!"llvm.loop.estimated_trip_count", i32 0}
 
 ; ------------------------------------------------------------------------------
 ; Check -unroll-count=12.
 ;
-; RUN: %{ur-bf} -unroll-count=12 | %{fc} UR12
-;
-; Multiply do.body by 12 and add do.body.epil* to get the original loop body
-; frequency, 11.
-; UR12: - do.body: float = 0.5144,
-; UR12: - do.body.epil: float = 4.8272,
+; RUN: %{ur-bf} -unroll-count=12 | %{fc} UR12,UR12-ELP
+; RUN: %{ur-bf} -unroll-count=12 -unroll-remainder | %{fc} UR12,UR12-EUR
+;
+; Multiply do.body by 12 and add do.body.epil* for either ELP or EUR to get the
+; original loop body frequency, 11.
+; UR12:     - do.body: float = 0.5144,
+; UR12-ELP: - do.body.epil: float = 4.8272,
+; UR12-EUR: - do.body.epil: float = 1.0463,
+; UR12-EUR: - do.body.epil.1: float = 0.83968,
+; UR12-EUR: - do.body.epil.2: float = 0.67387,
+; UR12-EUR: - do.body.epil.3: float = 0.5408,
+; UR12-EUR: - do.body.epil.4: float = 0.43401,
+; UR12-EUR: - do.body.epil.5: float = 0.3483,
+; UR12-EUR: - do.body.epil.6: float = 0.27952,
+; UR12-EUR: - do.body.epil.7: float = 0.22433,
+; UR12-EUR: - do.body.epil.8: float = 0.18003,
+; UR12-EUR: - do.body.epil.9: float = 0.14448,
+; UR12-EUR: - do.body.epil.10: float = 0.11595,
 ;
 ; Unrolled loop guard, body, and latch.
 ; UR12: br i1 %{{.*}}, label %do.body.epil.preheader, label %entry.new, !prof !0
 ; UR12-COUNT-12: call void @f
 ; UR12: br i1 %{{.*}}, label %do.end.unr-lcssa, label %do.body, !prof !1, !llvm.loop !2
 ;
-; Epilogue guard and loop.
+; Epilogue guard.
 ; UR12: br i1 %{{.*}}, label %do.body.epil.preheader, label %do.end, !prof !5
-; UR12: call void @f
-; UR12: br i1 %{{.*}}, label %do.body.epil, label %do.end.epilog-lcssa, !prof !6, !llvm.loop !7
+;
+; Non-unrolled epilogue loop.
+; UR12-ELP: call void @f
+; UR12-ELP: br i1 %{{.*}}, label %do.body.epil, label %do.end.epilog-lcssa, !prof !6, !llvm.loop !7
+;
+; Partially unrolled epilogue loop.
+; UR12-EUR: call void @f
+; UR12-EUR: br i1 %{{.*}}, label %do.body.epil.1, label %do.end.epilog-lcssa, !prof !6
+; UR12-EUR: call void @f
+; UR12-EUR: br i1 %{{.*}}, label %do.body.epil.2, label %do.end.epilog-lcssa, !prof !6
+; UR12-EUR: call void @f
+; UR12-EUR: br i1 %{{.*}}, label %do.body.epil.3, label %do.end.epilog-lcssa, !prof !6
+; UR12-EUR: call void @f
+; UR12-EUR: br i1 %{{.*}}, label %do.body.epil.4, label %do.end.epilog-lcssa, !prof !6
+; UR12-EUR: call void @f
+; UR12-EUR: br i1 %{{.*}}, label %do.body.epil.5, label %do.end.epilog-lcssa, !prof !6
+; UR12-EUR: call void @f
+; UR12-EUR: br i1 %{{.*}}, label %do.body.epil.6, label %do.end.epilog-lcssa, !prof !6
+; UR12-EUR: call void @f
+; UR12-EUR: br i1 %{{.*}}, label %do.body.epil.7, label %do.end.epilog-lcssa, !prof !6
+; UR12-EUR: call void @f
+; UR12-EUR: br i1 %{{.*}}, label %do.body.epil.8, label %do.end.epilog-lcssa, !prof !6
+; UR12-EUR: call void @f
+; UR12-EUR: br i1 %{{.*}}, label %do.body.epil.9, label %do.end.epilog-lcssa, !prof !6
+; UR12-EUR: call void @f
+; UR12-EUR: br i1 %{{.*}}, label %do.body.epil.10, label %do.end.epilog-lcssa, !prof !6
+; UR12-EUR: call void @f
+; UR12-EUR: br i1 %{{.*}}, label %do.body.epil, label %do.end.epilog-lcssa, !prof !6, !llvm.loop !7
 ;
 ; Unrolled loop metadata.
 ; UR12: !0 = !{!"branch_weights", i32 1394803730, i32 752679918}
@@ -196,10 +424,16 @@
 ; UR12: !4 = !{!"llvm.loop.unroll.disable"}
 ; UR12: !5 = !{!"branch_weights", i32 1860963812, i32 286519836}
 ;
-; Epilogue loop metadata.
-; UR12: !6 = !{!"branch_weights", i32 1723419551, i32 424064097}
-; UR12: !7 = distinct !{!7, !8, !4}
-; UR12: !8 = !{!"llvm.loop.estimated_trip_count", i32 11}
+; The unrolled epilogue loop does not lose any conditional branches, so:
+; - The non-unrolled epilogue branch weights are shared across them.
+; - The unrolled epilogue loop has an llvm.loop.estimated_trip_count.  This is
+;   our first case where it is different than both the unrolled loop's and the
+;   non-unrolled epilogue loop's, so they all have distinct metadata nodes.
+; UR12:     !6 = !{!"branch_weights", i32 1723419551, i32 424064097}
+; UR12-ELP: !7 = distinct !{!7, !8, !4}
+; UR12-ELP: !8 = !{!"llvm.loop.estimated_trip_count", i32 11}
+; UR12-EUR: !7 = distinct !{!7, !8}
+; UR12-EUR: !8 = !{!"llvm.loop.estimated_trip_count", i32 1}
 
 declare void @f(i32)
 
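The "multiply do.body by the unroll count and add do.body.epil*" comments above can be spot-checked outside of lit. The following is a minimal Python sketch and not part of the patch: the helper name original_body_freq is hypothetical, and the numbers are copied from the UR11 and UR12 check lines.

```python
# Hypothetical helper, for illustration only: recombine the frequencies that
# the UR11/UR12 checks expect into the original loop body frequency.
def original_body_freq(unroll_count, unrolled_body_freq, epilogue_freqs):
    """Each unrolled iteration runs unroll_count original bodies, and each
    epilogue block runs exactly one."""
    return unroll_count * unrolled_body_freq + sum(epilogue_freqs)


# Values copied from the UR11 check lines.
UR11_BODY = 0.59359
UR11_ELP = [4.4705]
UR11_EUR = [1.0428, 0.82209, 0.64812, 0.51097, 0.40284,
            0.31759, 0.25038, 0.1974, 0.15562, 0.12269]

# Values copied from the UR12 check lines.
UR12_BODY = 0.5144
UR12_ELP = [4.8272]
UR12_EUR = [1.0463, 0.83968, 0.67387, 0.5408, 0.43401, 0.3483,
            0.27952, 0.22433, 0.18003, 0.14448, 0.11595]

if __name__ == "__main__":
    for name, count, body, epil in [
        ("UR11-ELP", 11, UR11_BODY, UR11_ELP),
        ("UR11-EUR", 11, UR11_BODY, UR11_EUR),
        ("UR12-ELP", 12, UR12_BODY, UR12_ELP),
        ("UR12-EUR", 12, UR12_BODY, UR12_EUR),
    ]:
        print(name, original_body_freq(count, body, epil))
```

The checked frequencies carry only about five significant digits, so the sums should be compared against 11 with a small tolerance rather than for exact equality.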
diff --git a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll
new file mode 100644
index 0000000000000..f6dcdc49a4407
--- /dev/null
+++ b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll
@@ -0,0 +1,380 @@
+; Test branch weight metadata, estimated trip count metadata, and block
+; frequencies after partial loop unrolling without -unroll-runtime such that
+; some iterations' latches become unconditional, which often contradicts the
+; original branch weights.
+;
+; (unroll-complete.ll tests complete loop unrolling, in which the final unrolled
+; iteration unconditionally exits (backedge removed).  Here, we include cases
+; where the final iteration's latch unconditionally continues instead.)
+;
+; For each case, we check:
+; - Iteration frequencies
+;   - When each is multiplied by the number of original loop bodies that execute
+;     within it, they should sum to almost exactly the original loop body
+;     frequency.
+;   - The only exception is an impossibly high or low original frequency (e.g.,
+;     due to bad profile data), for which there exist no new branch weights that
+;     can yield that frequency sum.  In those cases, we expect the maximum or
+;     minimum possible frequency.
+; - CFGs
+;   - We verify which branch weights go with which branches and that we did not
+;     overlook any other branch weights (no extra !prof or branch_weights).
+;   - We also check the number of original loop bodies (represented by a call to
+;     @f) that appear within each unrolled iteration.
+; - Branch weight metadata
+;   - Checking frequencies already checks whether the branch weights have the
+;     expected effect, but we also want to check the following.
+;   - We get uniform probabilities/weights (same !prof) across the unrolled
+;     iteration latches if either:
+;     - Every iteration's latch remains conditional, so their original
+;       probabilities are not contradicted.
+;     - The original loop body frequency is 1, and then probabilities are all 0
+;       because only the first iteration is expected to execute.
+;     - The number of remaining conditional latches is <= 2, so the
+;       implementation computes uniform branch weights by solving a linear or
+;       quadratic equation.
+;     - -unroll-uniform-weights.
+;   - Otherwise, the earliest branch weights (starting with !prof !0) are
+;     adjusted as needed to produce the original loop body frequency, and the
+;     rest are left as they were in the original loop.
+; - llvm.loop.estimated_trip_count
+;   - Must be the number of iterations of the unrolled loop required for the
+;     original loop body to reach its original frequency.
+;   - Must not be blindly computed from any new latch branch weights.
+
+; ------------------------------------------------------------------------------
+; Define LIT substitutions.
+;
+; For verifying that the test code produces the original loop body frequency we
+; expect.
+; DEFINE: %{bf-fc} = opt %t.ll -S -passes='print<block-freq>' 2>&1 | \
+; DEFINE:   FileCheck %s -check-prefixes
+;
+; For checking the unrolled loop:
+; DEFINE: %{ur-bf} = opt %t.ll -S -passes='loop-unroll,print<block-freq>' 2>&1
+; DEFINE: %{fc} = FileCheck %s \
+; DEFINE:     -implicit-check-not='llvm.loop.estimated_trip_count' \
+; DEFINE:     -implicit-check-not='!prof' \
+; DEFINE:     -implicit-check-not='branch_weights' \
+; DEFINE:     -implicit-check-not='call void @f' -check-prefixes
+
+; ------------------------------------------------------------------------------
+; Check cases when the original loop's number of iterations is a run-time
+; determined multiple of 10 and the original loop body frequency is 10.
+;
+;   RUN: sed -e s/@N@/%mul10/ -e s/@W@/9/ %s > %t.ll
+;
+; At compile time, possibilities for that value always include unroll count x 10
+; x N for any integer N >= 1, so the unrolled loop's backedge always remains
+; conditional.  Cases where it becomes unconditional are checked later in this
+; test file with the CONST4 config.
+;
+; Check the original loop body frequency.
+;
+;   RUN: %{bf-fc} MULT-ORIG
+;   MULT-ORIG: - do.body: float = 10.0,
+;
+; When the unroll count is odd, every iteration's latch remains conditional, so
+; their original probabilities are not contradicted.  That is, the original loop
+; latch's branch weights remain on all unrolled iterations' latches.
+;
+;   RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} MULT3
+;   RUN: %{ur-bf} -unroll-count=3 | %{fc} MULT3
+;
+;   Sums to approximately the original loop body frequency, 10.
+;   MULT3: - do.body: float = 3.69,
+;   MULT3: - do.body.1: float = 3.321,
+;   MULT3: - do.body.2: float = 2.9889,
+;
+;   MULT3: call void @f
+;   MULT3: br i1 %{{.*}}, label %do.body.1, label %do.end, !prof !0
+;   MULT3: call void @f
+;   MULT3: br i1 %{{.*}}, label %do.body.2, label %do.end, !prof !0
+;   MULT3: call void @f
+;   MULT3: br i1 %{{.*}}, label %do.body, label %do.end, !prof !0, !llvm.loop !1
+;
+;   MULT3: !0 = !{!"branch_weights", i32 9, i32 1}
+;   MULT3: !1 = distinct !{!1, !2, !3}
+;   MULT3: !2 = !{!"llvm.loop.estimated_trip_count", i32 4}
+;   MULT3: !3 = !{!"llvm.loop.unroll.disable"}
+;
+; When the unroll count is even, the latches of odd-numbered unrolled iterations
+; become unconditional, so branch weights must be adjusted.
+;
+;   -unroll-count=2, so there is 1 remaining conditional latch, so the
+;   implementation can compute uniform weights by solving a linear equation.
+;   Thus, -unroll-uniform-weights has no effect.
+;
+;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} MULT2
+;     RUN: %{ur-bf} -unroll-count=2 | %{fc} MULT2
+;
+;     Multiply by 2 to get the original loop body frequency, 10.
+;     MULT2: - do.body: float = 5.0,
+;
+;     MULT2:     call void @f
+;     MULT2-NOT: br
+;     MULT2:     call void @f
+;     MULT2:     br i1 %{{.*}}, label %do.body, label %do.end, !prof !0, !llvm.loop !1{{$}}
+;
+;     The branch weights imply the estimated trip count is
+;     (1717986918+429496730)/429496730 = approximately (8+2)/2 = 5.
+;     MULT2: !0 = !{!"branch_weights", i32 1717986918, i32 429496730}
+;     MULT2: !1 = distinct !{!1, !2, !3}
+;     MULT2: !2 = !{!"llvm.loop.estimated_trip_count", i32 5}
+;     MULT2: !3 = !{!"llvm.loop.unroll.disable"}
+;
+;   -unroll-count=4, so there are 2 remaining conditional latches, so the
+;   implementation can compute uniform weights using the quadratic formula.
+;   Thus, -unroll-uniform-weights has no effect.
+;
+;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} MULT4
+;     RUN: %{ur-bf} -unroll-count=4 | %{fc} MULT4
+;
+;     Multiply by 2 and sum to get the original loop body frequency, 10.
+;     MULT4: - do.body: float = 2.7778,
+;     MULT4: - do.body.2: float = 2.2222,
+;
+;     MULT4:     call void @f
+;     MULT4-NOT: br
+;     MULT4:     call void @f
+;     MULT4:     br i1 %{{.*}}, label %do.body.2, label %do.end, !prof !0
+;     MULT4:     call void @f
+;     MULT4-NOT: br
+;     MULT4:     call void @f
+;     MULT4:     br i1 %{{.*}}, label %do.body, label %do.end, !prof !0, !llvm.loop !1
+;
+;     MULT4 is like applying -unroll-count=2 to MULT2 without converting any
+;     more conditional latches to unconditional, so MULT2's branch weights work.
+;     MULT4: !0 = !{!"branch_weights", i32 1717986918, i32 429496730}
+;     MULT4: !1 = distinct !{!1, !2, !3}
+;     MULT4: !2 = !{!"llvm.loop.estimated_trip_count", i32 3}
+;     MULT4: !3 = !{!"llvm.loop.unroll.disable"}
+;
+;   -unroll-count=6, so there are 3 remaining conditional latches, the lowest
+;   number where the implementation cannot compute uniform weights using a
+;   simple formula.  Thus, this is our first case where -unroll-uniform-weights
+;   matters.
+;
+;     RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} MULT6,MUNIF6
+;     RUN: %{ur-bf} -unroll-count=6 | %{fc} MULT6,MFAST6
+;
+;     For either MUNIF or MFAST, multiply by 2 and sum to get the original loop
+;     body frequency, 10.
+;     MUNIF6: - do.body: float = 2.0492,
+;     MUNIF6: - do.body.2: float = 1.6393,
+;     MUNIF6: - do.body.4: float = 1.3115,
+;     MFAST6: - do.body: float = 2.1956,
+;     MFAST6: - do.body.2: float = 1.476,
+;     MFAST6: - do.body.4: float = 1.3284,
+;
+;     MULT6:       call void @f
+;     MULT6-NOT:   br
+;     MULT6:       call void @f
+;     MULT6:       br i1 %{{.*}}, label %do.body.2, label %do.end, !prof !0
+;     MULT6:       call void @f
+;     MULT6-NOT:   br
+;     MULT6:       call void @f
+;     MULT6:       br i1 %{{.*}}, label %do.body.4, label %do.end,
+;     MUNIF6-SAME:   !prof !0
+;     MFAST6-SAME:   !prof !1
+;     MULT6:       call void @f
+;     MULT6-NOT:   br
+;     MULT6:       call void @f
+;     MULT6:       br i1 %{{.*}}, label %do.body, label %do.end,
+;     MUNIF6-SAME:   !prof !0, !llvm.loop !1
+;     MFAST6-SAME:   !prof !1, !llvm.loop !2
+;
+;     MUNIF6 is like applying -unroll-count=3 to MULT2 without converting any
+;     additional conditional latches to unconditional, so (approximately)
+;     MULT2's branch weights make sense.
+;     MUNIF6: !0 = !{!"branch_weights", i32 1717986944, i32 429496704}
+;     MUNIF6: !1 = distinct !{!1, !2, !3}
+;     MUNIF6: !2 = !{!"llvm.loop.estimated_trip_count", i32 2}
+;     MUNIF6: !3 = !{!"llvm.loop.unroll.disable"}
+;
+;     There are 3 conditional latches remaining, so MFAST6 adjusts the first and
+;     leaves the second two with the original loop's branch weights.
+;     MFAST6: !0 = !{!"branch_weights", i32 1443686486, i32 703797162}
+;     MFAST6: !1 = !{!"branch_weights", i32 9, i32 1}
+;     MFAST6: !2 = distinct !{!2, !3, !4}
+;     MFAST6: !3 = !{!"llvm.loop.estimated_trip_count", i32 2}
+;     MFAST6: !4 = !{!"llvm.loop.unroll.disable"}
+
+; ------------------------------------------------------------------------------
+; Check the case when the original loop's number of iterations is a run-time
+; determined multiple of 10, the unroll count is even so that the latches of
+; odd-numbered unrolled iterations become unconditional, and the original loop
+; body frequency is 1, which is impossibly low.  This case is important to
+; ensure the implementation does not malfunction by trying to use negative and
+; possibly infinite probabilities to reach the original loop body frequency.
+;
+;   RUN: sed -e s/@N@/%mul10/ -e s/@W@/0/ %s > %t.ll
+;
+; Check the original loop body frequency.
+;
+;   RUN: %{bf-fc} LOW-ORIG
+;   LOW-ORIG: - do.body: float = 1.0,
+;
+; -unroll-count=2, so there is 1 remaining conditional latch.  The
+; implementation tries to compute uniform weights by solving a linear equation
+; but ultimately sets the latch's probability to zero.
+;
+;   RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} LOW2
+;   RUN: %{ur-bf} -unroll-count=2 | %{fc} LOW2
+;
+;   Multiply by 2, but the result is greater than the original loop body
+;   frequency, 1, which is impossibly low.
+;   LOW2: - do.body: float = 1.0,
+;
+;   LOW2:     call void @f
+;   LOW2-NOT: br
+;   LOW2:     call void @f
+;   LOW2:     br i1 %{{.*}}, label %do.body, label %do.end, !prof !0, !llvm.loop !1{{$}}
+;
+;   LOW2: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
+;   LOW2: !1 = distinct !{!1, !2, !3}
+;   LOW2: !2 = !{!"llvm.loop.estimated_trip_count", i32 1}
+;   LOW2: !3 = !{!"llvm.loop.unroll.disable"}
+;
+; -unroll-count=4, so there are 2 remaining conditional latches.  The
+; implementation tries to compute uniform weights using the quadratic formula
+; but ultimately sets both latches' probabilities to zero.
+;
+;   RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} LOW4
+;   RUN: %{ur-bf} -unroll-count=4 | %{fc} LOW4
+;
+;   Multiply by 2 and sum, but the result is greater than the original loop body
+;   frequency, 1, which is impossibly low.
+;   LOW4: - do.body: float = 1.0,
+;   LOW4: - do.body.2: float = 0.0{{(0000[0-9]*)?}},
+;
+;   LOW4:     call void @f
+;   LOW4-NOT: br
+;   LOW4:     call void @f
+;   LOW4:     br i1 %{{.*}}, label %do.body.2, label %do.end, !prof !0
+;   LOW4:     call void @f
+;   LOW4-NOT: br
+;   LOW4:     call void @f
+;   LOW4:     br i1 %{{.*}}, label %do.body, label %do.end, !prof !0, !llvm.loop !1
+;
+;   LOW4: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
+;   LOW4: !1 = distinct !{!1, !2, !3}
+;   LOW4: !2 = !{!"llvm.loop.estimated_trip_count", i32 1}
+;   LOW4: !3 = !{!"llvm.loop.unroll.disable"}
+;
+; -unroll-count=6, so there are 3 remaining conditional latches.  The
+; implementation cannot compute uniform weights using a simple formula, and
+; ultimately it must set all those latches' probabilities to zero.  If not
+; -unroll-uniform-weights, then the implementation will face a new stumbling
+; block starting at the second latch: reaching the remaining iterations already
+; has a zero probability due to the zero probability set at the first latch, so
+; the required probability could accidentally be computed as negative infinity.
+;
+;   RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} LOW6
+;   RUN: %{ur-bf} -unroll-count=6 | %{fc} LOW6
+;
+;   Multiply by 2 and sum, but the result is greater than the original loop body
+;   frequency, 1, which is impossibly low.
+;   LOW6: - do.body: float = 1.0,
+;   LOW6: - do.body.2: float = 0.0{{(0000[0-9]*)?}},
+;   LOW6: - do.body.4: float = 0.0{{(0000[0-9]*)?}},
+;
+;   LOW6:     call void @f
+;   LOW6-NOT: br
+;   LOW6:     call void @f
+;   LOW6:     br i1 %{{.*}}, label %do.body.2, label %do.end, !prof !0
+;   LOW6:     call void @f
+;   LOW6-NOT: br
+;   LOW6:     call void @f
+;   LOW6:     br i1 %{{.*}}, label %do.body.4, label %do.end, !prof !0
+;   LOW6:     call void @f
+;   LOW6-NOT: br
+;   LOW6:     call void @f
+;   LOW6:     br i1 %{{.*}}, label %do.body, label %do.end, !prof !0, !llvm.loop !1
+;
+;   LOW6: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
+;   LOW6: !1 = distinct !{!1, !2, !3}
+;   LOW6: !2 = !{!"llvm.loop.estimated_trip_count", i32 1}
+;   LOW6: !3 = !{!"llvm.loop.unroll.disable"}
+
+; ------------------------------------------------------------------------------
+; Check cases when the original loop's number of iterations is a constant 10 and
+; the original loop body frequency is 10.
+;
+;   RUN: sed -e s/@N@/10/g -e s/@W@/9/ %s > %t.ll
+;
+; Because we test only partial unrolling, there is always exactly one unrolled
+; iteration that can possibly exit, so only its latch can remain conditional.
+; Because there is only one, its branch weights can be computed with a simple
+; formula, and -unroll-uniform-weights does not matter.
+;
+; Check the original loop body frequency.
+;
+;   RUN: %{bf-fc} CONST-ORIG
+;   CONST-ORIG: - do.body: float = 10.0,
+;
+; Check when the unrolled loop's backedge remains conditional.
+;
+;   RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} CONST2
+;   RUN: %{ur-bf} -unroll-count=2 | %{fc} CONST2
+;
+;   Multiply by 2 to get the original loop body frequency, 10.
+;   CONST2: - do.body: float = 5.0,
+;
+;   CONST2:     call void @f
+;   CONST2-NOT: br
+;   CONST2:     call void @f
+;   CONST2:     br i1 %{{.*}}, label %do.body, label %do.end, !prof !0, !llvm.loop !1
+;
+;   Like MULT2.
+;   CONST2: !0 = !{!"branch_weights", i32 1717986918, i32 429496730}
+;   CONST2: !1 = distinct !{!1, !2, !3}
+;   CONST2: !2 = !{!"llvm.loop.estimated_trip_count", i32 5}
+;   CONST2: !3 = !{!"llvm.loop.unroll.disable"}
+;
+; Check when the unrolled loop's backedge unconditionally continues.
+;
+;   RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} CONST4
+;   RUN: %{ur-bf} -unroll-count=4 | %{fc} CONST4
+;
+;   Multiply by 2 and sum to get the original loop body frequency, 10.
+;   CONST4: - do.body: float = 3.0,
+;   CONST4: - do.body.2: float = 2.0,
+;
+;   CONST4:     call void @f
+;   CONST4-NOT: br
+;   CONST4:     call void @f
+;   CONST4:     br i1 %{{.*}}, label %do.body.2, label %do.end, !prof !0
+;   CONST4:     call void @f
+;   CONST4-NOT: br
+;   CONST4:     call void @f
+;   CONST4:     br label %do.body, !llvm.loop !1
+;
+;   There is no llvm.loop.estimated_trip_count because the unrolled loop's latch
+;   in do.body.2 unconditionally continues.  The branch weights on do.body's
+;   branch imply do.body continues twice and then exits once, thus executing the
+;   original loop body 10 times.
+;   CONST4: !0 = !{!"branch_weights", i32 1431655765, i32 715827883}
+;   CONST4: !1 = distinct !{!1, !2}
+;   CONST4: !2 = !{!"llvm.loop.unroll.disable"}
+
+declare void @f(i32)
+
+define void @test(i32 %n) {
+entry:
+  %mul10 = mul i32 %n, 10
+  br label %do.body
+
+do.body:
+  %i = phi i32 [ 0, %entry ], [ %next, %do.body ]
+  call void @f(i32 %i)
+  %next = add i32 %i, 1
+  %c = icmp ne i32 %next, @N@
+  br i1 %c, label %do.body, label %do.end, !prof !0
+
+do.end:
+  ret void
+}
+
+; Loop body frequency is @W@ + 1.
+!0 = !{!"branch_weights", i32 @W@, i32 1}
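The frequencies expected by MULT2, MULT4, MFAST6, and CONST4 in the new test above follow from the latch probabilities alone, so they can be reproduced with a few lines of arithmetic. The following Python sketch is an illustration under stated assumptions and not the implementation in this patch: iteration_freqs, one_latch_prob, and two_latch_prob are hypothetical names, the entry frequency is taken to be 1, and the one- and two-latch equations are derived here from that model rather than copied from fixProbContradiction.

```python
from math import prod, sqrt


def iteration_freqs(latch_probs):
    """Frequency of each original body copy inside one unrolled loop.

    latch_probs[k] is the probability that the latch after body copy k
    continues to the next copy; the last entry is the backedge.  Use 1.0 for
    a latch that has become unconditional.  Entry frequency is assumed to be 1.
    """
    backedge = prod(latch_probs)
    if backedge >= 1.0:
        raise ValueError("loop never exits, so frequencies are unbounded")
    header = 1.0 / (1.0 - backedge)  # geometric series over loop passes
    freqs, reach = [], 1.0
    for p in latch_probs:
        freqs.append(reach * header)
        reach *= p
    return freqs


def one_latch_prob(bodies, freq):
    """Solve bodies / (1 - q) = freq for q (the linear case)."""
    return 1.0 - bodies / freq


def two_latch_prob(m0, m1, freq):
    """Solve (m0 + m1*q) / (1 - q*q) = freq for q (the quadratic case).

    m0 and m1 are the numbers of body copies before the first and second
    conditional latch.  Assumes both latches share one probability q.
    """
    a, b, c = freq, m1, m0 - freq
    return (-b + sqrt(b * b - 4 * a * c)) / (2 * a)


if __name__ == "__main__":
    q2 = one_latch_prob(2, 10.0)        # MULT2: one conditional latch
    q4 = two_latch_prob(2, 2, 10.0)     # MULT4: two conditional latches
    print(q2, iteration_freqs([1.0, q2]))
    print(q4, iteration_freqs([1.0, q4, 1.0, q4]))
    # CONST4: the backedge is unconditional, so only do.body's latch can exit.
    print(iteration_freqs([1.0, 2.0 / 3.0, 1.0, 1.0]))
```

Dividing a checked branch weight by 2^31 (for example 1443686486 / 2**31 for MFAST6's !0) gives a probability that can be fed to iteration_freqs directly, which makes it easy to confirm that a set of expected weights still sums to the original loop body frequency.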
diff --git a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial.ll b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial.ll
index af5342c5e35cd..ea6f4a4180fc9 100644
--- a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial.ll
+++ b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial.ll
@@ -1,5 +1,6 @@
 ; Test branch weight metadata, estimated trip count metadata, and block
-; frequencies after partial loop unrolling without -unroll-runtime.
+; frequencies after partial loop unrolling without -unroll-runtime and without
+; converting any iteration's latch to an unconditional branch.
 
 ; ------------------------------------------------------------------------------
 ; RUN: opt < %s -S -passes='print<block-freq>' 2>&1 | \
diff --git a/llvm/test/Transforms/LoopUnroll/loop-probability-one.ll b/llvm/test/Transforms/LoopUnroll/loop-probability-one.ll
index 14f6da42df6b1..89915d29f5921 100644
--- a/llvm/test/Transforms/LoopUnroll/loop-probability-one.ll
+++ b/llvm/test/Transforms/LoopUnroll/loop-probability-one.ll
@@ -1,73 +1,97 @@
 ; Check that a loop probability of one (indicating an always infinite loop) does
 ; not crash or otherwise break LoopUnroll behavior when it tries to compute new
 ; probabilities from it.
-;
-; That case indicates an always infinite loop.  A remainder loop cannot be
-; calculated at run time when the original loop is infinite as infinity %
-; UnrollCount is undefined, so consistent remainder loop probabilities are
-; difficult or impossible to reason about.  The implementation chooses
-; probabilities indicating that all remainder loop iterations will always
-; execute.
-
-; DEFINE: %{unroll} = opt < %s -unroll-count=3 -passes=loop-unroll -S
-; DEFINE: %{rt} = %{unroll} -unroll-runtime
-
-; RUN: %{unroll} | FileCheck %s -check-prefix UNROLL
-; RUN: %{rt} -unroll-runtime-epilog=true | FileCheck %s -check-prefix EPILOG
-; RUN: %{rt} -unroll-runtime-epilog=false | FileCheck %s -check-prefix PROLOG
-
-define void @test(i32 %n) {
-entry:
-  br label %loop
 
-loop:
-  %i = phi i32 [ 0, %entry ], [ %inc, %loop ]
-  %inc = add i32 %i, 1
-  %c = icmp slt i32 %inc, %n
-  br i1 %c, label %loop, label %end, !prof !0
+; DEFINE: %{unroll} = opt < %t.ll -unroll-count=3 -passes=loop-unroll -S
+; DEFINE: %{fc} = FileCheck %s \
+; DEFINE:     -implicit-check-not='llvm.loop.estimated_trip_count' \
+; DEFINE:     -implicit-check-not='!prof' \
+; DEFINE:     -implicit-check-not='branch_weights' \
+; DEFINE:     -implicit-check-not='call void @f' -check-prefixes
 
-end:
-  ret void
-}
+; ------------------------------------------------------------------------------
+; A partially unrolled loop remains infinite.
+;
+; RUN: sed -e s/@N@/%n/ %s > %t.ll
+; RUN: %{unroll} | %{fc} PART-ALL-COND
+;
+; PART-ALL-COND: call void @f
+; PART-ALL-COND: br i1 %{{.*}}, label %loop.1, label %end, !prof !0
+; PART-ALL-COND: call void @f
+; PART-ALL-COND: br i1 %{{.*}}, label %loop.2, label %end, !prof !0
+; PART-ALL-COND: call void @f
+; PART-ALL-COND: br i1 %{{.*}}, label %loop, label %end, !prof !0, !llvm.loop !1
+; PART-ALL-COND: !0 = !{!"branch_weights", i32 1, i32 0}
 
+; ------------------------------------------------------------------------------
+; A partially unrolled loop remains infinite even if some iterations' latches
+; become unconditional.
+;
+; RUN: sed -e s/@N@/5/ %s > %t.ll
+; RUN: %{unroll} | %{fc} PART-SOME-COND
+;
+; PART-SOME-COND:     call void @f
+; PART-SOME-COND-NOT: br
+; PART-SOME-COND:     call void @f
+; PART-SOME-COND:     br i1 %{{.*}}, label %loop.2, label %end, !prof !0
+; PART-SOME-COND:     call void @f
+; PART-SOME-COND:     br label %loop, !llvm.loop !1
+; PART-SOME-COND:     !0 = !{!"branch_weights", i32 1, i32 0}
 
-!0 = !{!"branch_weights", i32 1, i32 0}
+; ------------------------------------------------------------------------------
+; A completely unrolled loop cannot be infinite, so consistent unrolled loop
+; probabilities are impossible.  The implementation chooses probabilities
+; indicating that all unrolled loop iterations will always execute.
+;
+; RUN: sed -e s/@N@/%max3/ %s > %t.ll
+; RUN: %{unroll} | %{fc} COMPLETE-SOME-COND
+;
+; COMPLETE-SOME-COND: call void @f
+; COMPLETE-SOME-COND: br i1 %{{.*}}, label %loop.1, label %end, !prof !0
+; COMPLETE-SOME-COND: call void @f
+; COMPLETE-SOME-COND: br i1 %{{.*}}, label %loop.2, label %end, !prof !0
+; COMPLETE-SOME-COND: call void @f
+; COMPLETE-SOME-COND: br label %end
+; COMPLETE-SOME-COND: !0 = !{!"branch_weights", i32 1, i32 0}
 
-; UNROLL: define void @test(i32 %n) {
-; UNROLL: entry:
-; UNROLL:   br label %loop
-; UNROLL: loop:
-; UNROLL:   br i1 %c, label %loop.1, label %end, !prof !0
-; UNROLL: loop.1:
-; UNROLL:   br i1 %c.1, label %loop.2, label %end, !prof !0
-; UNROLL: loop.2:
-; UNROLL:   br i1 %c.2, label %loop, label %end, !prof !0, !llvm.loop !1
-; UNROLL-NOT: loop.3
-; UNROLL: end:
-; UNROLL:   ret void
-; UNROLL: }
-;
-; Infinite unrolled loop.
-; UNROLL: !0 = !{!"branch_weights", i32 1, i32 0}
+; ------------------------------------------------------------------------------
+; A completely unrolled loop with no remaining conditional latches gives the
+; implementation no probabilities to set.  Check that it still behaves.
+;
+; RUN: sed -e s/@N@/3/ %s > %t.ll
+; RUN: %{unroll} | %{fc} COMPLETE-NO-COND
+;
+; COMPLETE-NO-COND:     call void @f
+; COMPLETE-NO-COND-NOT: br
+; COMPLETE-NO-COND:     call void @f
+; COMPLETE-NO-COND-NOT: br
+; COMPLETE-NO-COND:     call void @f
 
-; EPILOG: define void @test(i32 %n) {
-; EPILOG: entry:
-; EPILOG:   br i1 %{{.*}}, label %loop.epil.preheader, label %entry.new, !prof !0
-; EPILOG: entry.new:
-; EPILOG:   br label %loop
-; EPILOG: loop:
-; EPILOG:   br i1 %{{.*}}, label %loop, label %end.unr-lcssa, !prof !1
-; EPILOG: end.unr-lcssa:
-; EPILOG:   br i1 %{{.*}}, label %loop.epil.preheader, label %end, !prof !1
-; EPILOG: loop.epil.preheader:
-; EPILOG:   br label %loop.epil
-; EPILOG: loop.epil:
-; EPILOG:   br i1 %{{.*}}, label %loop.epil, label %end.epilog-lcssa, !prof !4
-; EPILOG: end.epilog-lcssa:
-; EPILOG:   br label %end
-; EPILOG: end:
-; EPILOG:   ret void
-; EPILOG: }
+; ------------------------------------------------------------------------------
+; A remainder loop's trip count cannot be calculated at run time when the
+; original loop is infinite, as infinity % UnrollCount is undefined, so
+; consistent remainder loop probabilities are difficult or impossible to reason
+; about.  The implementation chooses probabilities indicating that all remainder
+; loop iterations will always execute.
+;
+; RUN: sed -e s/@N@/%n/ %s > %t.ll
+; DEFINE: %{rt} = %{unroll} -unroll-runtime
+; RUN: %{rt} -unroll-runtime-epilog=true | %{fc} EPILOG
+; RUN: %{rt} -unroll-runtime-epilog=false | %{fc} PROLOG
+;
+; Unrolled loop guard, body, and latch.
+; EPILOG:     br i1 %{{.*}}, label %loop.epil.preheader, label %entry.new, !prof !0
+; EPILOG:     call void @f
+; EPILOG-NOT: br
+; EPILOG:     call void @f
+; EPILOG-NOT: br
+; EPILOG:     call void @f
+; EPILOG:     br i1 %{{.*}}, label %loop, label %end.unr-lcssa, !prof !1
+;
+; Epilogue guard, body, and latch.
+; EPILOG: br i1 %{{.*}}, label %loop.epil.preheader, label %end, !prof !1
+; EPILOG: call void @f
+; EPILOG: br i1 %{{.*}}, label %loop.epil, label %end.epilog-lcssa, !prof !4
 ;
 ; Unrolled loop guard: Unrolled loop is always entered.
 ; EPILOG: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
@@ -78,27 +102,20 @@ end:
 ;
 ; Epilogue loop latch: Epilogue loop executes both of its 2 iterations.
 ; EPILOG: !4 = !{!"branch_weights", i32 1073741824, i32 1073741824}
-
-; PROLOG: define void @test(i32 %n) {
-; PROLOG: entry:
-; PROLOG:   br i1 %{{.*}}, label %loop.prol.preheader, label %loop.prol.loopexit, !prof !0
-; PROLOG: loop.prol.preheader:
-; PROLOG:   br label %loop.prol
-; PROLOG: loop.prol:
-; PROLOG:   br i1 %{{.*}}, label %loop.prol, label %loop.prol.loopexit.unr-lcssa, !prof !1
-; PROLOG: loop.prol.loopexit.unr-lcssa:
-; PROLOG:   br label %loop.prol.loopexit
-; PROLOG: loop.prol.loopexit:
-; PROLOG:   br i1 %{{.*}}, label %end, label %entry.new, !prof !0
-; PROLOG: entry.new:
-; PROLOG:   br label %loop
-; PROLOG: loop:
-; PROLOG:   br i1 %{{.*}}, label %loop, label %end.unr-lcssa, !prof !4
-; PROLOG: end.unr-lcssa:
-; PROLOG:   br label %end
-; PROLOG: end:
-; PROLOG:   ret void
-; PROLOG: }
+;
+; Prologue guard, body, and latch.
+; PROLOG: br i1 %{{.*}}, label %loop.prol.preheader, label %loop.prol.loopexit, !prof !0
+; PROLOG: call void @f
+; PROLOG: br i1 %{{.*}}, label %loop.prol, label %loop.prol.loopexit.unr-lcssa, !prof !1
+;
+; Unrolled loop guard, body, and latch.
+; PROLOG:     br i1 %{{.*}}, label %end, label %entry.new, !prof !0
+; PROLOG:     call void @f
+; PROLOG-NOT: br
+; PROLOG:     call void @f
+; PROLOG-NOT: br
+; PROLOG:     call void @f
+; PROLOG:     br i1 %{{.*}}, label %loop, label %end.unr-lcssa, !prof !4
 ;
 ; FIXME: Branch weights still need to be fixed in the case of prologues (issue
 ; #135812), so !0 and !1 do not yet match their comments below.  When we do
@@ -114,3 +131,23 @@ end:
 ;
 ; Unrolled loop latch: Unrolled loop is infinite.
 ; PROLOG: !4 = !{!"branch_weights", i32 1, i32 0}
+
+declare void @f(i32)
+
+define void @test(i32 %n) {
+entry:
+  %max3 = call i32 @llvm.umin.i32(i32 %n, i32 3)
+  br label %loop
+
+loop:
+  %i = phi i32 [ 0, %entry ], [ %inc, %loop ]
+  call void @f(i32 %i)
+  %inc = add i32 %i, 1
+  %c = icmp slt i32 %inc, @N@
+  br i1 %c, label %loop, label %end, !prof !0
+
+end:
+  ret void
+}
+
+!0 = !{!"branch_weights", i32 1, i32 0}

>From 4a07c970cc1b17eedef96b12949c919135ecfc1c Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Thu, 19 Feb 2026 18:28:56 -0500
Subject: [PATCH 2/2] Extract many changes into descendant PRs

New commit log:

[LoopUnroll] Fix freqs for unconditional latches: N<=2

As another step in issue #135812, this patch fixes block frequencies
when LoopUnroll converts a conditional latch in an unrolled loop
iteration to unconditional.  It thus includes complete loop unrolling
(the conditional backedge becomes an unconditional loop exit), which
might be applied to the original loop or to its remainder loop.

As explained in detail in the header comments on the
fixProbContradiction function that this patch introduces, these
conversions mean LoopUnroll has proven that the original uniform latch
probability is incorrect for the original loop iterations associated
with the converted latches.  However, LoopUnroll often is able to
perform these corrections for only some iterations, leaving other
iterations with the original latch probability, and thus corrupting
the aggregate effect on the total frequency of the original loop body.

This patch ensures that the total frequency of the original loop body,
summed across all its occurrences in the unrolled loop after the
aforementioned conversions, is the same as in the original loop.
Unlike other patches in this series, this patch cannot derive the
required latch probabilities directly from the original uniform latch
probability because it has been proven incorrect for some original
loop iterations.  Instead, this patch computes entirely new
probabilities for the remaining N conditional latches in the unrolled
loop.

This patch only handles N <= 2, for which it uses simple formulas to
compute a single uniform probability across the latches.  Future
patches will handle N > 2.

This patch series does not consider the presence of non-latch loop
exits, and I do not have a solid plan for that case.  See the FIXME
comments that this patch introduces.
---
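
Note (not part of the commit log): the N <= 2 formulas described in the
commit log can be sketched in isolation as follows.  This is a minimal
standalone sketch, not the code in LoopUnroll.cpp.  The function name
uniformLatchProb and its parameter list are hypothetical.  It assumes
that IterCounts[i] is the coefficient c_i of the polynomial
FreqOne = Sum(i=0..n)(c_i * p^i), that FreqDesired is
1 / (1 - OriginalLoopProb), and that the inputs keep the quadratic's
discriminant non-negative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: return the single probability p that, applied to all
// N remaining conditional latches (N == IterCounts.size() - 1, N is 1 or 2),
// brings the total loop body frequency as close as possible to FreqDesired.
//
// Without a backedge (complete unrolling), the equation solved is
//   FreqDesired = c_0 + c_1*p (+ c_2*p^2).
// With a backedge, it is
//   FreqDesired = (c_0 + c_1*p (+ c_2*p^2)) / (1 - p^N),
// which rearranges to the same polynomial with FreqDesired added to the
// leading coefficient.
double uniformLatchProb(const std::vector<double> &IterCounts,
                        double FreqDesired, bool CompletelyUnroll) {
  assert((IterCounts.size() == 2 || IterCounts.size() == 3) &&
         "Sketch only covers one or two conditional latches");
  double Prob;
  if (IterCounts.size() == 2) {
    // Linear case: 0 = A*p + B.
    double A = IterCounts[1] + (CompletelyUnroll ? 0 : FreqDesired);
    double B = IterCounts[0] - FreqDesired;
    assert(A > 0 && "Expected iterations after last conditional latch");
    Prob = -B / A;
  } else {
    // Quadratic case: 0 = A*p^2 + B*p + C.  Take the larger root.
    double A = IterCounts[2] + (CompletelyUnroll ? 0 : FreqDesired);
    double B = IterCounts[1];
    double C = IterCounts[0] - FreqDesired;
    assert(A > 0 && "Expected iterations after last conditional latch");
    Prob = (-B + std::sqrt(B * B - 4 * A * C)) / (2 * A);
  }
  // A result outside [0, 1] means FreqDesired is unreachable, so clamp to the
  // nearest valid probability.
  Prob = std::max(Prob, 0.);
  Prob = std::min(Prob, 1.);
  return Prob;
}
```

For example, with one conditional latch, a remaining backedge,
IterCounts = {1, 1}, and FreqDesired = 3, the sketch yields p = 0.5,
and (1 + 0.5) / (1 - 0.5) = 3 as required.  A FreqDesired that is
impossibly high for a completely unrolled loop clamps to p = 1.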
 llvm/lib/Transforms/Utils/LoopUnroll.cpp      | 255 +-------
 .../branch-weights-freq/unroll-complete.ll    | 618 +-----------------
 .../branch-weights-freq/unroll-epilog.ll      |  85 +--
 .../unroll-partial-unconditional-latch.ll     | 117 +---
 4 files changed, 50 insertions(+), 1025 deletions(-)

diff --git a/llvm/lib/Transforms/Utils/LoopUnroll.cpp b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
index f216f2503e280..529cbd3f5b5da 100644
--- a/llvm/lib/Transforms/Utils/LoopUnroll.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
@@ -89,11 +89,6 @@ UnrollRuntimeEpilog("unroll-runtime-epilog", cl::init(false), cl::Hidden,
                     cl::desc("Allow runtime unrolled loops to be unrolled "
                              "with epilog instead of prolog."));
 
-static cl::opt<bool> UnrollUniformWeights(
-    "unroll-uniform-weights", cl::init(false), cl::Hidden,
-    cl::desc("If new branch weights must be found, work harder to keep them "
-             "uniform."));
-
 static cl::opt<bool>
 UnrollVerifyDomtree("unroll-verify-domtree", cl::Hidden,
                     cl::desc("Verify domtree after unrolling"),
@@ -497,10 +492,9 @@ static bool canHaveUnrollRemainder(const Loop *L) {
 // original loop iterations.
 //
 // There are often many sets of latch probabilities that can produce the
-// original total loop body frequency.  If there are many remaining conditional
-// latches and !UnrollUniformWeights, this function just quickly hacks a few of
-// their probabilities to restore the original total loop body frequency.
-// Otherwise, it tries harder to determine less arbitrary probabilities.
+// original total loop body frequency.  For now, this function computes uniform
+// probabilities when the number of remaining conditional latches is <= 2 and
+// does not handle other cases.
 static void fixProbContradiction(UnrollLoopOptions ULO,
                                  BranchProbability OriginalLoopProb,
                                  bool CompletelyUnroll,
@@ -546,8 +540,8 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
   // can adjust.  That should mean that the actual trip count is always exactly
   // the number of remaining unrolled iterations, and so OriginalLoopProb should
   // have yielded that trip count as the original loop body frequency.  Of
-  // course, OriginalLoopProb could be based on bad profile data, but there is
-  // nothing we can do about that here.
+  // course, OriginalLoopProb could be based on inaccurate profile data, but
+  // there is nothing we can do about that here.
   if (CondLatches.empty())
     return;
 
@@ -563,13 +557,6 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
   // FreqDesired is the frequency implied by the original loop probability.
   double FreqDesired = 1 / (1 - OriginalLoopProb.toDouble());
 
-  // Get the probability at CondLatches[I].
-  auto GetProb = [&](unsigned I) {
-    BranchInst *B = cast<BranchInst>(CondLatches[I]->getTerminator());
-    bool FirstTargetIsNext = B->getSuccessor(0) == CondLatchNexts[I];
-    return getBranchProbability(B, FirstTargetIsNext).toDouble();
-  };
-
   // Set the probability at CondLatches[I] to Prob.
   auto SetProb = [&](unsigned I, double Prob) {
     BranchInst *B = cast<BranchInst>(CondLatches[I]->getTerminator());
@@ -585,11 +572,10 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
       SetProb(I, Prob);
   };
 
-  // If UnrollUniformWeights or n <= 2, we choose the simplest probability model
-  // we can think of: every remaining conditional branch instruction has the
-  // same probability, Prob, of continuing to the next iteration.  This model
-  // has several helpful properties:
-  // - There is only one search parameter, Prob.
+  // If n <= 2, we choose the simplest probability model we can think of: every
+  // remaining conditional branch instruction has the same probability, Prob,
+  // of continuing to the next iteration.  This model has several helpful
+  // properties:
   // - We have no reason to think one latch branch's probability should be
   //   higher or lower than another, and so this model makes them all the same.
   //   In the worst cases, we thus avoid setting just some probabilities to 0 or
@@ -602,224 +588,51 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
   //
   //     FreqOne = Sum(i=0..n)(c_i * p^i)
   //
-  // - If the backedge has been eliminated:
-  //   - FreqOne is the total frequency of the original loop body in the
-  //     unrolled loop.
-  //   - If Prob == 1, the total frequency of the original loop body is exactly
-  //     the number of remaining loop iterations, as expected because every
-  //     remaining loop iteration always then executes.
-  // - If the backedge remains:
-  //   - Sum(i=0..inf)(FreqOne * p^(n*i)) = FreqOne / (1 - p^n) is the total
-  //     frequency of the original loop body in the unrolled loop, regardless of
-  //     whether the backedge is conditional or unconditional.
-  //   - As Prob approaches 1, the total frequency of the original loop body
-  //     approaches infinity, as expected because the loop approaches never
-  //     exiting.
+  // - If the backedge has been eliminated, FreqOne is the total frequency of
+  //   the original loop body in the unrolled loop.
+  // - If the backedge remains, Sum(i=0..inf)(FreqOne * p^(n*i)) =
+  //   FreqOne / (1 - p^n) is the total frequency of the original loop body in
+  //   the unrolled loop, regardless of whether the backedge is conditional or
+  //   unconditional.
   // - For n <= 2, we can use simple formulas to solve the above polynomial
-  //   equation exactly for p without performing a search.   For n == 2, we use
-  //   ComputeProbForQuadratic below.  For n == 1, we use ComputeProb below.
-  // - For n > 2, evaluating each point in the search space, using ComputeFreq
-  //   below, requires about as few instructions as we could hope for.  That is,
-  //   the probability is constant across the conditional branches, so the only
-  //   computation is across conditional branches and any backedge, as required
-  //   for any model for Prob.
-  // - Prob == 1 produces the maximum possible total frequency for the original
-  //   loop body, as described above.  Prob == 0 produces the minimum, 0.
-  //   Increasing or decreasing Prob monotonically increases or decreases the
-  //   frequency, respectively.  Thus, for every possible frequency, there
-  //   exists some Prob that can produce it, and we can easily use bisection to
-  //   search the problem space.
-
-  // When iterating for a solution, we stop early if we find probabilities
-  // that produce a Freq whose difference from FreqDesired is small
-  // (FreqPrec).  Otherwise, we expect to compute a solution at least that
-  // accurate (but surely far more accurate).
-  const double FreqPrec = 1e-6;
-
-  // Compute the new frequency produced by using Prob throughout CondLatches.
-  auto ComputeFreq = [&](double Prob) {
-    double ProbReaching = 1;        // p^0
-    double FreqOne = IterCounts[0]; // c_0*p^0
-    for (unsigned I = 0, E = CondLatches.size(); I < E; ++I) {
-      ProbReaching *= Prob;                        // p^(I+1)
-      FreqOne += IterCounts[I + 1] * ProbReaching; // c_(I+1)*p^(I+1)
-    }
-    double ProbReachingBackedge = CompletelyUnroll ? 0 : ProbReaching;
-    assert(FreqOne > 0 && "Expected at least one iteration before first latch");
-    if (ProbReachingBackedge == 1)
-      return std::numeric_limits<double>::infinity();
-    return FreqOne / (1 - ProbReachingBackedge);
+  //   equations exactly for p without performing a search.
+
+  // Compute the probability that, used at CondLatches[0] where
+  // CondLatches.size() == 1, gets as close as possible to FreqDesired.
+  auto ComputeProbForLinear = [&]() {
+    // The polynomial is linear (0 = A*p + B), so just solve it.
+    double A = IterCounts[1] + (CompletelyUnroll ? 0 : FreqDesired);
+    double B = IterCounts[0] - FreqDesired;
+    assert(A > 0 && "Expected iterations after last conditional latch");
+    double Prob = -B / A;
+    Prob = std::max(Prob, 0.);
+    Prob = std::min(Prob, 1.);
+    return Prob;
   };
 
   // Compute the probability that, used throughout CondLatches where
   // CondLatches.size() == 2, gets as close as possible to FreqDesired.
   auto ComputeProbForQuadratic = [&]() {
-    // The polynomial is quadratic, so just solve it.
+    // The polynomial is quadratic (0 = A*p^2 + B*p + C), so just solve it.
     double A = IterCounts[2] + (CompletelyUnroll ? 0 : FreqDesired);
     double B = IterCounts[1];
     double C = IterCounts[0] - FreqDesired;
     assert(A > 0 && "Expected iterations after last conditional latch");
     double Prob = (-B + sqrt(B * B - 4 * A * C)) / (2 * A);
-    // If it computes an invalid Prob, FreqDesired is impossibly low or high.
-    // Otherwise, Prob should produce nearly FreqDesired.
-    assert((Prob < 0 || Prob > 1 ||
-            fabs(ComputeFreq(Prob) - FreqDesired) < FreqPrec) &&
-           "Expected accurate frequency when quadratic case is possible");
     Prob = std::max(Prob, 0.);
     Prob = std::min(Prob, 1.);
     return Prob;
   };
 
-  // Compute the probability required at CondLatches[ComputeIdx] to get as close
-  // as possible to FreqDesired without replacing probabilities elsewhere in
-  // CondLatches.  Return {Prob, Freq} where 0 <= Prob <= 1 and Freq is the new
-  // frequency.
-  auto ComputeProb = [&](unsigned ComputeIdx) -> std::pair<double, double> {
-    assert(ComputeIdx < CondLatches.size());
-
-    // Accumulate the frequency from before ComputeIdx into FreqBeforeCompute,
-    // and accumulate the rest in Freq without yet multiplying the latter by any
-    // probability for ComputeIdx (i.e., treat it as 1 for now).
-    double ProbReaching = 1;     // p^0
-    double Freq = IterCounts[0]; // c_0*p^0
-    double FreqBeforeCompute;
-    for (unsigned I = 0, E = CondLatches.size(); I < E; ++I) {
-      // Get the branch probability for CondLatches[I].
-      double Prob;
-      if (I == ComputeIdx) {
-        FreqBeforeCompute = Freq;
-        Freq = 0;
-        Prob = 1;
-      } else {
-        Prob = GetProb(I);
-      }
-      ProbReaching *= Prob;                     // p^(I+1)
-      Freq += IterCounts[I + 1] * ProbReaching; // c_(I+1)*p^(I+1)
-    }
-
-    // Compute the required probability, and limit it to a valid probability (0
-    // <= p <= 1).  See the Freq formula below for how to derive the ProbCompute
-    // formula.
-    double ProbReachingBackedge = CompletelyUnroll ? 0 : ProbReaching;
-    double ProbComputeNumerator = FreqDesired - FreqBeforeCompute;
-    double ProbComputeDenominator = Freq + FreqDesired * ProbReachingBackedge;
-    double ProbCompute;
-    if (ProbComputeNumerator <= 0) {
-      // FreqBeforeCompute has already reached or surpassed FreqDesired, so add
-      // no more frequency.  It is possible that ProbComputeDenominator == 0
-      // here because some latch probability (maybe the original) was set to
-      // zero, so this check avoids setting ProbCompute=1 (in the else if below)
-      // and division by zero where the numerator <= 0 (in the else below).
-      ProbCompute = 0;
-    } else if (ProbComputeDenominator == 0) {
-      // Analytically, this case seems impossible.  It would occur if either:
-      // - Both Freq and FreqDesired are zero.  But the latter would cause
-      //   ProbComputeNumerator < 0, which we catch above, and FreqDesired
-      //   should always be >= 1 anyway.
-      // - There are no iterations after CondLatches[ComputeIdx], not even via
-      //   a backedge, so that both Freq and ProbReachingBackedge are zero.
-      //   But iterations should exist after even the last conditional latch.
-      // - Some latch probability (maybe the original) was set to zero so that
-      //   both Freq and ProbReachingBackedge are zero.  But that should not
-      //   have happened because, according to the above ProbComputeNumerator
-      //   check, we have not yet reached FreqDesired (which, if the original
-      //   latch probability is zero, is just 1 and thus always reached or
-      //   surpassed).
-      //
-      // Numerically, perhaps this case is possible.  We interpret it to mean we
-      // need more frequency (ProbComputeNumerator > 0) but have no way to get
-      // any (ProbComputeDenominator is analytically too small to distinguish it
-      // from 0 in floating point), suggesting infinite probability is needed,
-      // but 1 is the maximum valid probability and thus the best we can do.
-      //
-      // TODO: Cover this case in the test suite if you can.
-      ProbCompute = 1;
-    } else {
-      ProbCompute = ProbComputeNumerator / ProbComputeDenominator;
-      ProbCompute = std::max(ProbCompute, 0.);
-      ProbCompute = std::min(ProbCompute, 1.);
-    }
-
-    // Compute the resulting total frequency.
-    if (ProbReachingBackedge * ProbCompute == 1) {
-      // Analytically, this case seems impossible.  It requires that there is a
-      // backedge and that FreqDesired == infinity so that every conditional
-      // latch's probability had to be set to 1.  But FreqDesired == infinity
-      // means OriginalLoopProb.isOne(), which we guarded against earlier.
-      //
-      // Numerically, perhaps this case is possible.  We interpret it to mean
-      // that analytically the probability has to be so near 1 that, in floating
-      // point, the frequency is computed as infinite.
-      //
-      // TODO: Cover this case in the test suite if you can.
-      Freq = std::numeric_limits<double>::infinity();
-    } else {
-      assert(FreqBeforeCompute > 0 &&
-             "Expected at least one iteration before first latch");
-      // In this equation, if we replace the left-hand side with FreqDesired and
-      // then solve for ProbCompute, we get the ProbCompute formula above.
-      Freq = (FreqBeforeCompute + Freq * ProbCompute) /
-             (1 - ProbReachingBackedge * ProbCompute);
-    }
-    return {ProbCompute, Freq};
-  };
-
   // Determine and set branch weights.
-  //
-  // Prob < 0 and Prob > 1 cannot be represented as branch weights.  We might
-  // compute such a Prob if FreqDesired is impossible (e.g., due to bad profile
-  // data) for the maximum trip count we have determined when completely
-  // unrolling.  In that case, so just go with whichever is closest.
-  if (CondLatches.size() == 2) {
-    // The polynomial is quadratic, so just solve it.
+  if (CondLatches.size() == 1) {
+    SetAllProbs(ComputeProbForLinear());
+  } else if (CondLatches.size() == 2) {
     SetAllProbs(ComputeProbForQuadratic());
-  } else if (CondLatches.size() == 1 || !UnrollUniformWeights) {
-    // Either:
-    // - There's just one conditional latch, so just compute the probability
-    //   it requires to produce the original total frequency.
-    // - The polynomial is too complex for a simple formula and the quick and
-    //   dirty fix has been selected.  Adjust probabilities starting from the
-    //   first latch, which has the most influence on the total frequency, so
-    //   starting there should minimize the number of latches that have to be
-    //   visited.  We do have to iterate because the first latch alone might
-    //   not be enough.  For example, we might need to set all probabilities
-    //   to 1 if the frequency is the unroll factor.
-    for (unsigned I = 0; I != CondLatches.size(); ++I) {
-      double Prob, Freq;
-      std::tie(Prob, Freq) = ComputeProb(I);
-      SetProb(I, Prob);
-      if (fabs(Freq - FreqDesired) < FreqPrec)
-        break;
-    }
   } else {
-    // The polynomial is more complex, and uniform branch weights have been
-    // selected, so bisect.
-    double ProbMin, ProbMax, ProbPrev;
-    auto TryProb = [&](double Prob) {
-      ProbPrev = Prob;
-      double FreqDelta = ComputeFreq(Prob) - FreqDesired;
-      if (fabs(FreqDelta) < FreqPrec)
-        return 0;
-      if (FreqDelta < 0) {
-        ProbMin = Prob;
-        return -1;
-      }
-      ProbMax = Prob;
-      return 1;
-    };
-    // If Prob == 0 is too small and Prob == 1 is too large, bisect between
-    // them.  To place a hard upper limit on the search time, stop bisecting
-    // when Prob stops changing (ProbDelta) by much (ProbPrec).
-    if (TryProb(0.) < 0 && TryProb(1.) > 0) {
-      const double ProbPrec = 1e-12;
-      double Prob, ProbDelta;
-      do {
-        Prob = (ProbMin + ProbMax) / 2;
-        ProbDelta = Prob - ProbPrev;
-      } while (TryProb(Prob) != 0 && fabs(ProbDelta) > ProbPrec);
-    }
-    SetAllProbs(ProbPrev);
+    // FIXME: Handle CondLatches.size() > 2.
   }
+
   // FIXME: We have not considered non-latch loop exits:
   // - Their original probabilities are not considered in our calculation of
   //   FreqDesired.
diff --git a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll
index 94d259b20bf84..3d87ee185b554 100644
--- a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll
+++ b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll
@@ -24,22 +24,9 @@
 ;     @f) that appear within each unrolled iteration.
 ; - Branch weight metadata
 ;   - Checking frequencies already checks whether the branch weights have the
-;     expected effect, but we also want to check the following.
-;   - We get uniform probabilities/weights (same !prof) across the unrolled
-;     iteration latches if either:
-;     - The number of unrolled iterations <= the original loop body frequency,
-;       and then probabilities are all 1 to *try* to reach that frequency.
-;     - The original loop body frequency is 1, and then probabilities are all 0
-;       because only the first iteration is expected to execute.
-;     - The number of remaining conditional latches is <= 2, either because the
-;       number of unrolled iterations is <= 3 or because enough of the unrolled
-;       iterations' latches become unconditional.  Either way, the
-;       implementation computes uniform branch weights by solving a linear or
-;       quadratic equation.
-;     - -unroll-uniform-weights.
-;   - Otherwise, the earliest branch weights (starting with !prof !0) are
-;     adjusted as needed to produce the original loop body frequency, and the
-;     rest are left as they were in the original loop.
+;     expected effect, but we also want to check that we get uniform
+;     probabilities/weights (same !prof) across the unrolled iteration latches
+;     when expected.
 ; - llvm.loop.estimated_trip_count:
 ;   - There should be none because loops are completely unrolled.
 
@@ -71,14 +58,12 @@
 ; Check 1 max iteration:
 ; - Unroll count of >=1 should always produce complete unrolling.
 ; - That produces 0 unrolled iteration latches, so there are no branch weights
-;   to compute.  Thus, -unroll-uniform-weights has no effect.
+;   to compute.
 ;
 ; Original loop body frequency is 2 (loop weight 1), which is impossibly high.
 ;
 ;   RUN: sed -e s/@MAX@/1/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;   RUN: %{bf-fc} ORIG1210
-;   RUN: %{ur-bf} -unroll-count=1 -unroll-uniform-weights | %{fc} UR1210
-;   RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR1210
 ;   RUN: %{ur-bf} -unroll-count=1 | %{fc} UR1210
 ;   RUN: %{ur-bf} -unroll-count=2 | %{fc} UR1210
 ;
@@ -92,8 +77,6 @@
 ;
 ;   RUN: sed -e s/@MAX@/1/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;   RUN: %{bf-fc} ORIG1110
-;   RUN: %{ur-bf} -unroll-count=1 -unroll-uniform-weights | %{fc} UR1110
-;   RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR1110
 ;   RUN: %{ur-bf} -unroll-count=1 | %{fc} UR1110
 ;   RUN: %{ur-bf} -unroll-count=2 | %{fc} UR1110
 ;
@@ -107,8 +90,7 @@
 ; Check 2 max iterations:
 ; - Unroll count of >=2 should always produce complete unrolling.
 ; - That produces <=1 unrolled iteration latch, so the implementation can
-;   compute uniform weights by solving, at worst, a linear equation.  Thus,
-;   -unroll-uniform-weights has no effect.
+;   compute uniform weights by solving, at worst, a linear equation.
 ;
 ; Original loop body frequency is 3 (loop weight 2), which is impossibly high.
 ;
@@ -117,8 +99,6 @@
 ;
 ;     RUN: sed -e s/@MAX@/2/ -e s/@W@/2/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG2310
-;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2310
-;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2310
 ;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2310
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2310
 ;
@@ -140,8 +120,6 @@
 ;
 ;     RUN: sed -e s/@MAX@/2/ -e s/@W@/2/ -e s/@MIN@/2/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG2320
-;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2320
-;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2320
 ;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2320
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2320
 ;
@@ -162,8 +140,6 @@
 ;
 ;     RUN: sed -e s/@MAX@/2/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG2210
-;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2210
-;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2210
 ;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2210
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2210
 ;
@@ -183,8 +159,6 @@
 ;
 ;     RUN: sed -e s/@MAX@/2/ -e s/@W@/1/ -e s/@MIN@/2/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG2220
-;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2220
-;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2220
 ;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2220
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2220
 ;
@@ -205,8 +179,6 @@
 ;
 ;     RUN: sed -e s/@MAX@/2/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG2110
-;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2110
-;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2110
 ;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2110
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2110
 ;
@@ -226,8 +198,6 @@
 ;
 ;     RUN: sed -e s/@MAX@/2/ -e s/@W@/0/ -e s/@MIN@/2/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG2120
-;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2120
-;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2120
 ;     RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2120
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2120
 ;
@@ -245,8 +215,7 @@
 ; Check 3 max iterations:
 ; - Unroll count of >=3 should always produce complete unrolling.
 ; - That produces <=2 unrolled iteration latches, so the implementation can
-;   compute uniform weights solving, at worst, a quadratic equation.  Thus,
-;   -unroll-uniform-weights has no effect.
+;   compute uniform weights by solving, at worst, a quadratic equation.
 ;
 ; Original loop body frequency is 4 (loop weight 3), which is impossibly high.
 ;
@@ -255,8 +224,6 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/3/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3410
-;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3410
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3410
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3410
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3410
 ;
@@ -281,8 +248,6 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/3/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3430
-;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3430
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3430
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3430
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3430
 ;
@@ -304,8 +269,6 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/3/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG343x
-;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR343x
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR343x
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR343x
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR343x
 ;
@@ -332,8 +295,6 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/2/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3310
-;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3310
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3310
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3310
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3310
 ;
@@ -356,8 +317,6 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/2/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3330
-;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3330
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3330
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3330
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3330
 ;
@@ -379,8 +338,6 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/2/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG333x
-;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR333x
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR333x
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR333x
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR333x
 ;
@@ -406,8 +363,6 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3210
-;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3210
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3210
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3210
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3210
 ;
@@ -430,8 +385,6 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/1/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3230
-;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3230
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3230
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3230
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3230
 ;
@@ -453,8 +406,6 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/1/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG323x
-;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR323x
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR323x
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR323x
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR323x
 ;
@@ -479,8 +430,6 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3110
-;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3110
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3110
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3110
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3110
 ;
@@ -503,8 +452,6 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/0/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG3130
-;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3130
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3130
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3130
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3130
 ;
@@ -526,8 +473,6 @@
 ;
 ;     RUN: sed -e s/@MAX@/3/ -e s/@W@/0/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll
 ;     RUN: %{bf-fc} ORIG313x
-;     RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR313x
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR313x
 ;     RUN: %{ur-bf} -unroll-count=3 | %{fc} UR313x
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR313x
 ;
@@ -546,557 +491,6 @@
 ;     UR313x:     br label %do.end
 ;     UR313x:     !0 = !{!"branch_weights", i32 -2147483648, i32 0}
 
-; ------------------------------------------------------------------------------
-; Check 4 max iterations:
-; - Unroll count of >=4 should always produce complete unrolling.
-; - That produces <=3 unrolled iteration latches.  3 is the lowest number where
-;   the implementation cannot compute uniform weights using a simple formula.
-;   Thus, this is our first case where -unroll-uniform-weights matters.
-;
-; Original loop body frequency is 5 (loop weight 4), which is impossibly high.
-;
-;   First use a variable iteration count so that all non-final unrolled
-;   iterations' latches remain conditional.
-;
-;     RUN: sed -e s/@MAX@/4/ -e s/@W@/4/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
-;     RUN: %{bf-fc} ORIG4510
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4510
-;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4510
-;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4510
-;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4510
-;
-;     The sum of the new do.body* cannot reach the old do.body, which is
-;     impossibly high.
-;     ORIG4510: - do.body: float = 5.0,
-;     UR4510:   - do.body: float = 1.0,
-;     UR4510:   - do.body.1: float = 1.0,
-;     UR4510:   - do.body.2: float = 1.0,
-;     UR4510:   - do.body.3: float = 1.0,
-;
-;     The probabilities are maximized to try to reach the original frequency.
-;     UR4510: call void @f
-;     UR4510: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
-;     UR4510: call void @f
-;     UR4510: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !0
-;     UR4510: call void @f
-;     UR4510: br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !0
-;     UR4510: call void @f
-;     UR4510: br label %do.end
-;     UR4510: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
-;
-;   Now use a constant iteration count so that all non-final unrolled
-;   iterations' latches unconditionally continue.
-;
-;     RUN: sed -e s/@MAX@/4/ -e s/@W@/4/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
-;     RUN: %{bf-fc} ORIG4540
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4540
-;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4540
-;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4540
-;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4540
-;
-;     The new do.body contains 4 of the original loop's iterations, so multiply
-;     it by 4, which is less than the old do.body, which is impossibly high.
-;     ORIG4540: - do.body: float = 5.0,
-;     UR4540:   - do.body: float = 1.0,
-;
-;     UR4540:     call void @f
-;     UR4540-NOT: br
-;     UR4540:     call void @f
-;     UR4540-NOT: br
-;     UR4540:     call void @f
-;     UR4540-NOT: br
-;     UR4540:     call void @f
-;     UR4540:     ret void
-;
-;   Use a constant iteration count but now the loop upper bound computation can
-;   overflow.  When it does, the loop induction variable is greater than it
-;   immediately, so the initial unrolled iteration's latch remains conditional.
-;
-;     RUN: sed -e s/@MAX@/4/ -e s/@W@/4/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll
-;     RUN: %{bf-fc} ORIG454x
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR454x
-;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR454x
-;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR454x
-;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR454x
-;
-;     The new do.body.1 contains 3 of the original loop's iterations, so
-;     multiply it by 3, and add the new do.body, but that sum is less than the
-;     old do.body, which is impossibly high.
-;     ORIG454x: - do.body: float = 5.0,
-;     UR454x:   - do.body: float = 1.0,
-;     UR454x:   - do.body.1: float = 1.0,
-;
-;     The sole probability is maximized to try to reach the original frequency.
-;     UR454x:     call void @f
-;     UR454x:     br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
-;     UR454x:     call void @f
-;     UR454x-NOT: br
-;     UR454x:     call void @f
-;     UR454x-NOT: br
-;     UR454x:     call void @f
-;     UR454x:     br label %do.end
-;     UR454x:     !0 = !{!"branch_weights", i32 0, i32 -2147483648}
-;
-; Original loop body frequency is 4 (loop weight 3).
-;
-;   First use a variable iteration count so that all non-final unrolled
-;   iterations' latches remain conditional.
-;
-;     RUN: sed -e s/@MAX@/4/ -e s/@W@/3/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
-;     RUN: %{bf-fc} ORIG4410
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4410
-;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4410
-;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4410
-;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4410
-;
-;     The sum of the new do.body* is the old do.body.
-;     ORIG4410: - do.body: float = 4.0,
-;     UR4410:   - do.body: float = 1.0,
-;     UR4410:   - do.body.1: float = 1.0,
-;     UR4410:   - do.body.2: float = 1.0,
-;     UR4410:   - do.body.3: float = 1.0,
-;
-;     UR4410: call void @f
-;     UR4410: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
-;     UR4410: call void @f
-;     UR4410: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !0
-;     UR4410: call void @f
-;     UR4410: br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !0
-;     UR4410: call void @f
-;     UR4410: br label %do.end
-;     UR4410: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
-;
-;   Now use a constant iteration count so that all non-final unrolled
-;   iterations' latches unconditionally continue.
-;
-;     RUN: sed -e s/@MAX@/4/ -e s/@W@/3/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
-;     RUN: %{bf-fc} ORIG4440
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4440
-;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4440
-;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4440
-;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4440
-;
-;     The new do.body contains 4 of the original loop's iterations, so multiply
-;     it by 4 to get the old do.body.
-;     ORIG4440: - do.body: float = 4.0,
-;     UR4440:   - do.body: float = 1.0,
-;
-;     UR4440:     call void @f
-;     UR4440-NOT: br
-;     UR4440:     call void @f
-;     UR4440-NOT: br
-;     UR4440:     call void @f
-;     UR4440-NOT: br
-;     UR4440:     call void @f
-;     UR4440:     ret void
-;
-;   Use a constant iteration count but now the loop upper bound computation can
-;   overflow.  When it does, the loop induction variable is greater than it
-;   immediately, so the initial unrolled iteration's latch remains conditional.
-;
-;     RUN: sed -e s/@MAX@/4/ -e s/@W@/3/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll
-;     RUN: %{bf-fc} ORIG444x
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR444x
-;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR444x
-;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR444x
-;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR444x
-;
-;     The new do.body.1 contains 3 of the original loop's iterations, so
-;     multiply it by 3, and add the new do.body to get the old do.body.
-;     ORIG444x: - do.body: float = 4.0,
-;     UR444x:   - do.body: float = 1.0,
-;     UR444x:   - do.body.1: float = 1.0,
-;
-;     UR444x:     call void @f
-;     UR444x:     br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
-;     UR444x:     call void @f
-;     UR444x-NOT: br
-;     UR444x:     call void @f
-;     UR444x-NOT: br
-;     UR444x:     call void @f
-;     UR444x:     br label %do.end
-;     UR444x:     !0 = !{!"branch_weights", i32 0, i32 -2147483648}
-;
-; Original loop body frequency is 3 (loop weight 2).  This is our first case
-; where the new probabilities vary (unless -unroll-uniform-weights).
-;
-;   First use a variable iteration count so that all non-final unrolled
-;   iterations' latches remain conditional.
-;
-;     RUN: sed -e s/@MAX@/4/ -e s/@W@/2/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
-;     RUN: %{bf-fc} ORIG4310
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4310,UNIF4310
-;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4310,UNIF4310
-;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4310,FAST4310
-;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4310,FAST4310
-;
-;     The sum of the new do.body* is always approximately the old do.body.
-;     ORIG4310: - do.body: float = 3.0,
-;     UNIF4310: - do.body: float = 1.0,
-;     UNIF4310: - do.body.1: float = 0.81054,
-;     UNIF4310: - do.body.2: float = 0.65697,
-;     UNIF4310: - do.body.3: float = 0.5325,
-;     FAST4310: - do.body: float = 1.0,
-;     FAST4310: - do.body.1: float = 0.94737,
-;     FAST4310: - do.body.2: float = 0.63158,
-;     FAST4310: - do.body.3: float = 0.42105,
-;
-;     UR4310:        call void @f
-;     UR4310:        br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
-;     UR4310:        call void @f
-;     UR4310:        br i1 %{{.*}}, label %do.end, label %do.body.2,
-;     UNIF4310-SAME:   !prof !0
-;     FAST4310-SAME:   !prof !1
-;     UR4310:        call void @f
-;     UR4310:        br i1 %{{.*}}, label %do.end, label %do.body.3,
-;     UNIF4310-SAME:   !prof !0
-;     FAST4310-SAME:   !prof !1
-;     UR4310:        call void @f
-;     UR4310:        br label %do.end
-;     UNIF4310:      !0 = !{!"branch_weights", i32 406871040, i32 1740612608}
-;     FAST4310:      !0 = !{!"branch_weights", i32 113025456, i32 2034458192}
-;     FAST4310:      !1 = !{!"branch_weights", i32 1, i32 2}
-;
-;   Now use a constant iteration count so that all non-final unrolled
-;   iterations' latches unconditionally continue.
-;
-;     RUN: sed -e s/@MAX@/4/ -e s/@W@/2/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
-;     RUN: %{bf-fc} ORIG4340
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4340
-;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4340
-;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4340
-;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4340
-;
-;     The new do.body contains 4 of the original loop's iterations, so multiply
-;     it by 4, which is greater than the old do.body, which is impossibly low.
-;     ORIG4340: - do.body: float = 3.0,
-;     UR4340:   - do.body: float = 1.0,
-;
-;     UR4340:     call void @f
-;     UR4340-NOT: br
-;     UR4340:     call void @f
-;     UR4340-NOT: br
-;     UR4340:     call void @f
-;     UR4340-NOT: br
-;     UR4340:     call void @f
-;     UR4340:     ret void
-;
-;   Use a constant iteration count but now the loop upper bound computation can
-;   overflow.  When it does, the loop induction variable is greater than it
-;   immediately, so the initial unrolled iteration's latch remains conditional.
-;
-;     RUN: sed -e s/@MAX@/4/ -e s/@W@/2/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll
-;     RUN: %{bf-fc} ORIG434x
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR434x
-;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR434x
-;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR434x
-;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR434x
-;
-;     The new do.body.1 contains 3 of the original loop's iterations, so
-;     multiply it by 3, and add the new do.body to get the old do.body.
-;     ORIG434x: - do.body: float = 3.0,
-;     UR434x:   - do.body: float = 1.0,
-;     UR434x:   - do.body.1: float = 0.66667,
-;
-;     UR434x:     call void @f
-;     UR434x:     br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
-;     UR434x:     call void @f
-;     UR434x-NOT: br
-;     UR434x:     call void @f
-;     UR434x-NOT: br
-;     UR434x:     call void @f
-;     UR434x:     br label %do.end
-;     UR434x:     !0 = !{!"branch_weights", i32 715827884, i32 1431655764}
-;
-; Original loop body frequency is 2 (loop weight 1).
-;
-;   First use a variable iteration count so that all non-final unrolled
-;   iterations' latches remain conditional.
-;
-;     RUN: sed -e s/@MAX@/4/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
-;     RUN: %{bf-fc} ORIG4210
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4210,UNIF4210
-;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4210,UNIF4210
-;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4210,FAST4210
-;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4210,FAST4210
-;
-;     The sum of the new do.body* is always the old do.body.
-;     ORIG4210: - do.body: float = 2.0,
-;     UNIF4210: - do.body: float = 1.0,
-;     UNIF4210: - do.body.1: float = 0.54369,
-;     UNIF4210: - do.body.2: float = 0.2956,
-;     UNIF4210: - do.body.3: float = 0.16071,
-;     FAST4210: - do.body: float = 1.0,
-;     FAST4210: - do.body.1: float = 0.57143,
-;     FAST4210: - do.body.2: float = 0.28571,
-;     FAST4210: - do.body.3: float = 0.14286,
-;
-;     UR4210:        call void @f
-;     UR4210:        br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
-;     UR4210:        call void @f
-;     UR4210:        br i1 %{{.*}}, label %do.end, label %do.body.2,
-;     UNIF4210-SAME:   !prof !0
-;     FAST4210-SAME:   !prof !1
-;     UR4210:        call void @f
-;     UR4210:        br i1 %{{.*}}, label %do.end, label %do.body.3,
-;     UNIF4210-SAME:   !prof !0
-;     FAST4210-SAME:   !prof !1
-;     UR4210:        call void @f
-;     UR4210:        br label %do.end
-;     UNIF4210:      !0 = !{!"branch_weights", i32 979920896, i32 1167562752}
-;     FAST4210:      !0 = !{!"branch_weights", i32 920350135, i32 1227133513}
-;     FAST4210:      !1 = !{!"branch_weights", i32 1, i32 1}
-;
-;   Now use a constant iteration count so that all non-final unrolled
-;   iterations' latches unconditionally continue.
-;
-;     RUN: sed -e s/@MAX@/4/ -e s/@W@/1/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
-;     RUN: %{bf-fc} ORIG4240
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4240
-;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4240
-;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4240
-;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4240
-;
-;     The new do.body contains 4 of the original loop's iterations, so multiply
-;     it by 4, which is greater than the old do.body, which is impossibly low.
-;     ORIG4240: - do.body: float = 2.0,
-;     UR4240:   - do.body: float = 1.0,
-;
-;     UR4240:     call void @f
-;     UR4240-NOT: br
-;     UR4240:     call void @f
-;     UR4240-NOT: br
-;     UR4240:     call void @f
-;     UR4240-NOT: br
-;     UR4240:     call void @f
-;     UR4240:     ret void
-;
-;   Use a constant iteration count but now the loop upper bound computation can
-;   overflow.  When it does, the loop induction variable is greater than it
-;   immediately, so the initial unrolled iteration's latch remains conditional.
-;
-;     RUN: sed -e s/@MAX@/4/ -e s/@W@/1/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll
-;     RUN: %{bf-fc} ORIG424x
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR424x
-;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR424x
-;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR424x
-;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR424x
-;
-;     The new do.body.1 contains 3 of the original loop's iterations, so
-;     multiply it by 3, and add the new do.body to get the old do.body.
-;     ORIG424x: - do.body: float = 2.0,
-;     UR424x:   - do.body: float = 1.0,
-;     UR424x:   - do.body.1: float = 0.33333,
-;
-;     UR424x:     call void @f
-;     UR424x:     br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
-;     UR424x:     call void @f
-;     UR424x-NOT: br
-;     UR424x:     call void @f
-;     UR424x-NOT: br
-;     UR424x:     call void @f
-;     UR424x:     br label %do.end
-;     UR424x:     !0 = !{!"branch_weights", i32 1431655765, i32 715827883}
-;
-; Original loop body frequency is 1 (loop weight 0).
-;
-;   First use a variable iteration count so that all non-final unrolled
-;   iterations' latches remain conditional.
-;
-;     RUN: sed -e s/@MAX@/4/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
-;     RUN: %{bf-fc} ORIG4110
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4110
-;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4110
-;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4110
-;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4110
-;
-;     The sum of the new do.body* is approximately the old do.body.
-;     ORIG4110: - do.body: float = 1.0,
-;     UR4110:   - do.body: float = 1.0,
-;     UR4110:   - do.body.1: float = 0.0{{(0000[0-9]*)?}},
-;     UR4110:   - do.body.2: float = 0.0{{(0000[0-9]*)?}},
-;     UR4110:   - do.body.3: float = 0.0{{(0000[0-9]*)?}},
-;
-;     UR4110: call void @f
-;     UR4110: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
-;     UR4110: call void @f
-;     UR4110: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !0
-;     UR4110: call void @f
-;     UR4110: br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !0
-;     UR4110: call void @f
-;     UR4110: br label %do.end
-;     UR4110: !0 = !{!"branch_weights", i32 1, i32 0}
-;
-;   Now use a constant iteration count so that all non-final unrolled
-;   iterations' latches unconditionally continue.
-;
-;     RUN: sed -e s/@MAX@/4/ -e s/@W@/0/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
-;     RUN: %{bf-fc} ORIG4140
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4140
-;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4140
-;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4140
-;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4140
-;
-;     The new do.body contains 4 of the original loop's iterations, so multiply
-;     it by 4, which is greater than the old do.body, which is impossibly low.
-;     ORIG4140: - do.body: float = 1.0,
-;     UR4140:   - do.body: float = 1.0,
-;
-;     UR4140:     call void @f
-;     UR4140-NOT: br
-;     UR4140:     call void @f
-;     UR4140-NOT: br
-;     UR4140:     call void @f
-;     UR4140-NOT: br
-;     UR4140:     call void @f
-;     UR4140:     ret void
-;
-;   Use a constant iteration count but now the loop upper bound computation can
-;   overflow.  When it does, the loop induction variable is greater than it
-;   immediately, so the initial unrolled iteration's latch remains conditional.
-;
-;     RUN: sed -e s/@MAX@/4/ -e s/@W@/0/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll
-;     RUN: %{bf-fc} ORIG414x
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR414x
-;     RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR414x
-;     RUN: %{ur-bf} -unroll-count=4 | %{fc} UR414x
-;     RUN: %{ur-bf} -unroll-count=5 | %{fc} UR414x
-;
-;     The new do.body.1 contains 3 of the original loop's iterations, so
-;     multiply it by 3, and add the new do.body to get approximately the old
-;     do.body.
-;     ORIG414x: - do.body: float = 1.0,
-;     UR414x:   - do.body: float = 1.0,
-;     UR414x:   - do.body.1: float = 0.0{{(0000[0-9]*)?}},
-;
-;     UR414x:     call void @f
-;     UR414x:     br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
-;     UR414x:     call void @f
-;     UR414x-NOT: br
-;     UR414x:     call void @f
-;     UR414x-NOT: br
-;     UR414x:     call void @f
-;     UR414x:     br label %do.end
-;     UR414x:     !0 = !{!"branch_weights", i32 -2147483648, i32 0}
-
-; ------------------------------------------------------------------------------
-; Check 5 max iterations:
-; - Unroll count of >=5 should always produce complete unrolling.
-; - That produces <=4 unrolled iteration latches.  When at least 3 remain
-;   conditional, the implementation cannot compute uniform weights using a
-;   simple formula, so -unroll-uniform-weights matters.
-;
-; Original loop body frequency is 5 (loop weight 4).
-;
-;   RUN: sed -e s/@MAX@/5/ -e s/@W@/4/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
-;   RUN: %{bf-fc} ORIG5510
-;   RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR5510,UNIF5510
-;   RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} UR5510,UNIF5510
-;   RUN: %{ur-bf} -unroll-count=5 | %{fc} UR5510,FAST5510
-;   RUN: %{ur-bf} -unroll-count=6 | %{fc} UR5510,FAST5510
-;
-;   The sum of the new do.body* is the old do.body.
-;   ORIG5510: - do.body: float = 5.0,
-;   UR5510:   - do.body: float = 1.0,
-;   UR5510:   - do.body.1: float = 1.0,
-;   UR5510:   - do.body.2: float = 1.0,
-;   UR5510:   - do.body.3: float = 1.0,
-;   UR5510:   - do.body.4: float = 1.0,
-;
-;   All continue probabilities are approximately 1, but somehow there is less
-;   precision in the calculation of the last case.
-;   UR5510:        call void @f
-;   UR5510:        br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
-;   UR5510:        call void @f
-;   UR5510:        br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !0
-;   UR5510:        call void @f
-;   UR5510:        br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !0
-;   UR5510:        call void @f
-;   UR5510:        br i1 %{{.*}}, label %do.end, label %do.body.4,
-;   UNIF5510-SAME:   !prof !0
-;   FAST5510-SAME:   !prof !1
-;   UR5510:        call void @f
-;   UR5510:        br label %do.end
-;   UNIF5510: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
-;   FAST5510: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
-;   FAST5510: !1 = !{!"branch_weights", i32 10, i32 2147483638}
-;
-; Original loop body frequency is 4 (loop weight 3).
-;
-;   RUN: sed -e s/@MAX@/5/ -e s/@W@/3/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
-;   RUN: %{bf-fc} ORIG5410
-;   RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR5410,UNIF5410
-;   RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} UR5410,UNIF5410
-;   RUN: %{ur-bf} -unroll-count=5 | %{fc} UR5410,FAST5410
-;   RUN: %{ur-bf} -unroll-count=6 | %{fc} UR5410,FAST5410
-;
-;   The sum of the new do.body* is always the old do.body.
-;   ORIG5410: - do.body: float = 4.0,
-;   UNIF5410: - do.body: float = 1.0,
-;   UNIF5410: - do.body.1: float = 0.88818,
-;   UNIF5410: - do.body.2: float = 0.78886,
-;   UNIF5410: - do.body.3: float = 0.70065,
-;   UNIF5410: - do.body.4: float = 0.62231,
-;   FAST5410: - do.body: float = 1.0,
-;   FAST5410: - do.body.1: float = 1.0,
-;   FAST5410: - do.body.2: float = 0.86486,
-;   FAST5410: - do.body.3: float = 0.64865,
-;   FAST5410: - do.body.4: float = 0.48649,
-;
-;   This is our first case where, when not using -unroll-uniform-weights, the
-;   implementation must adjust multiple probabilities to something other than
-;   the original latch probability but does not just set all probabilities to
-;   the limit of 1 or 0.
-;   UR5410:        call void @f
-;   UR5410:        br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
-;   UR5410:        call void @f
-;   UR5410:        br i1 %{{.*}}, label %do.end, label %do.body.2,
-;   UNIF5410-SAME:   !prof !0
-;   FAST5410-SAME:   !prof !1
-;   UR5410:        call void @f
-;   UR5410:        br i1 %{{.*}}, label %do.end, label %do.body.3,
-;   UNIF5410-SAME:   !prof !0
-;   FAST5410-SAME:   !prof !2
-;   UR5410:        call void @f
-;   UR5410:        br i1 %{{.*}}, label %do.end, label %do.body.4,
-;   UNIF5410-SAME:   !prof !0
-;   FAST5410-SAME:   !prof !2
-;   UR5410:        call void @f
-;   UR5410:        br label %do.end
-;   UNIF5410: !0 = !{!"branch_weights", i32 240132096, i32 1907351552}
-;   FAST5410: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
-;   FAST5410: !1 = !{!"branch_weights", i32 290200493, i32 1857283155}
-;   FAST5410: !2 = !{!"branch_weights", i32 1, i32 3}
-;
-; Original loop body frequency is 1 (loop weight 0).
-;
-;   RUN: sed -e s/@MAX@/5/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
-;   RUN: %{bf-fc} ORIG5110
-;   RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR5110
-;   RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} UR5110
-;   RUN: %{ur-bf} -unroll-count=5 | %{fc} UR5110
-;   RUN: %{ur-bf} -unroll-count=6 | %{fc} UR5110
-;
-;   The sum of the new do.body* is approximately the old do.body.
-;   ORIG5110: - do.body: float = 1.0,
-;   UR5110:   - do.body: float = 1.0,
-;   UR5110:   - do.body.1: float = 0.0{{(0000[0-9]*)?}},
-;   UR5110:   - do.body.2: float = 0.0{{(0000[0-9]*)?}},
-;   UR5110:   - do.body.3: float = 0.0{{(0000[0-9]*)?}},
-;   UR5110:   - do.body.4: float = 0.0{{(0000[0-9]*)?}},
-;
-;   UR5110: call void @f
-;   UR5110: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
-;   UR5110: call void @f
-;   UR5110: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !0
-;   UR5110: call void @f
-;   UR5110: br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !0
-;   UR5110: call void @f
-;   UR5110: br i1 %{{.*}}, label %do.end, label %do.body.4, !prof !0
-;   UR5110: call void @f
-;   UR5110: br label %do.end
-;   UR5110: !0 = !{!"branch_weights", i32 1, i32 0}
-
 declare void @f(i32)
 
 define void @test(i32 %x, i32 %n) {
diff --git a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-epilog.ll b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-epilog.ll
index c8ed8ef82a55f..09ecaebcf1f45 100644
--- a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-epilog.ll
+++ b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-epilog.ll
@@ -28,23 +28,14 @@
 ;     loop's estimated trip count or the branch weights on the unrolled loop
 ;     guard, unrolled loop latch, or epilogue loop guard.
 ;   - We get uniform probabilities/weights (same !prof) across the epilogue
-;     iteration latches if either:
-;     - Every iteration's latch remains conditional, so their original
-;       probabilities are not contradicted.
-;     - The number of remaining conditional latches is <= 2, so the
-;       implementation computes uniform branch weights by solving a linear or
-;       quadratic equation.
-;   - Otherwise, the earliest branch weights (starting with !prof !0) are
-;     adjusted as needed to produce the original loop body frequency, and the
-;     rest are left as they would be in the epilogue loop if it were not
-;     unrolled.
+;     iteration latches when expected.
 ; - llvm.loop.estimated_trip_count
-;   - For the unrolled and epilogue loops, must be the number of iterations
+;   - For the unrolled and epilogue loops, it must be the number of iterations
 ;     required for the original loop body to reach its original estimated trip
 ;     count, which is its original frequency, 11, because there is no prior
 ;     llvm.loop.estimated_trip_count.
-;   - Must not be blindly duplicated between the unrolled and epilogue loops.
-;   - Must not be blindly computed from any new latch branch weights.
+;   - It must not be blindly duplicated between the unrolled and epilogue loops.
+;   - It must not be blindly computed from any new latch branch weights.
 
 ; ------------------------------------------------------------------------------
 ; Verify that the test code produces the original loop body frequency we expect.
@@ -147,74 +138,6 @@
 ; - It has no llvm.loop.estimated_trip_count.
 ; UR4-EUR: !6 = !{!"branch_weights", i32 1265493781, i32 881989867}
 
-; ------------------------------------------------------------------------------
-; Check -unroll-count=8.
-;
-; RUN: %{ur-bf} -unroll-count=8 | %{fc} UR8,UR8-ELP
-; RUN: %{ur-bf} -unroll-count=8 -unroll-remainder | \
-; RUN:   %{fc} UR8,UR8-EUR
-;
-; Multiply do.body by 8 and add do.body.epil* for either ELP or EUR to get the
-; original loop body frequency, 11.
-; UR8:     - do.body: float = 0.96188,
-; UR8-ELP: - do.body.epil: float = 3.3049,
-; UR8-EUR: - do.body.epil: float = 0.91256,
-; UR8-EUR: - do.body.epil.1: float = 0.7716,
-; UR8-EUR: - do.body.epil.2: float = 0.55854,
-; UR8-EUR: - do.body.epil.3: float = 0.40432,
-; UR8-EUR: - do.body.epil.4: float = 0.29268,
-; UR8-EUR: - do.body.epil.5: float = 0.21186,
-; UR8-EUR: - do.body.epil.6: float = 0.15336,
-;
-; Unrolled loop guard, body, and latch.
-; UR8: br i1 %{{.*}}, label %do.body.epil.preheader, label %entry.new, !prof !0
-; UR8-COUNT-8: call void @f
-; UR8: br i1 %{{.*}}, label %do.end.unr-lcssa, label %do.body, !prof !1, !llvm.loop !2
-;
-; Epilogue guard.
-; UR8: br i1 %{{.*}}, label %do.body.epil.preheader, label %do.end, !prof !5
-;
-; Non-unrolled epilogue loop.
-; UR8-ELP: call void @f
-; UR8-ELP: br i1 %{{.*}}, label %do.body.epil, label %do.end.epilog-lcssa, !prof !6, !llvm.loop !7
-;
-; Completely unrolled epilogue loop.
-; UR8-EUR: call void @f
-; UR8-EUR: br i1 %{{.*}}, label %do.body.epil.1, label %do.end.epilog-lcssa, !prof !6
-; UR8-EUR: call void @f
-; UR8-EUR: br i1 %{{.*}}, label %do.body.epil.2, label %do.end.epilog-lcssa, !prof !7
-; UR8-EUR: call void @f
-; UR8-EUR: br i1 %{{.*}}, label %do.body.epil.3, label %do.end.epilog-lcssa, !prof !7
-; UR8-EUR: call void @f
-; UR8-EUR: br i1 %{{.*}}, label %do.body.epil.4, label %do.end.epilog-lcssa, !prof !7
-; UR8-EUR: call void @f
-; UR8-EUR: br i1 %{{.*}}, label %do.body.epil.5, label %do.end.epilog-lcssa, !prof !7
-; UR8-EUR: call void @f
-; UR8-EUR: br i1 %{{.*}}, label %do.body.epil.6, label %do.end.epilog-lcssa, !prof !7
-; UR8-EUR: call void @f
-;
-; Unrolled loop metadata.
-; UR8: !0 = !{!"branch_weights", i32 1045484980, i32 1101998668}
-; UR8: !1 = !{!"branch_weights", i32 1145666677, i32 1001816971}
-; UR8: !2 = distinct !{!2, !3, !4}
-; UR8: !3 = !{!"llvm.loop.estimated_trip_count", i32 1}
-; UR8: !4 = !{!"llvm.loop.unroll.disable"}
-; UR8: !5 = !{!"branch_weights", i32 1781544591, i32 365939057}
-;
-; Non-unrolled epilogue loop metadata.
-; UR8-ELP: !6 = !{!"branch_weights", i32 1554520665, i32 592962983}
-; UR8-ELP: !7 = distinct !{!7, !8, !4}
-; UR8-ELP: !8 = !{!"llvm.loop.estimated_trip_count", i32 3}
-;
-; Completely unrolled epilogue loop metadata.  Because it loses its backedge:
-; - The remaining conditional latches' branch weights must be adjusted relative
-;   to the non-unrolled case.  There are many, so the implementation does not
-;   compute uniform branch weights.  Adjusting the first is sufficient, so the
-;   second is the same as the non-unrolled epilogue branch weights.
-; - It has no llvm.loop.estimated_trip_count.
-; UR8-EUR: !6 = !{!"branch_weights", i32 1815773828, i32 331709820}
-; UR8-EUR: !7 = !{!"branch_weights", i32 1554520665, i32 592962983}
-
 ; ------------------------------------------------------------------------------
 ; Check -unroll-count=10.
 ;
diff --git a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll
index f6dcdc49a4407..09b2097d13582 100644
--- a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll
+++ b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll
@@ -23,24 +23,13 @@
 ;     @f) that appear within each unrolled iteration.
 ; - Branch weight metadata
 ;   - Checking frequencies already checks whether the branch weights have the
-;     expected effect, but we also want to check the following.
-;   - We get uniform probabilities/weights (same !prof) across the unrolled
-;     iteration latches if either:
-;     - Every iteration's latch remains conditional, so their original
-;       probabilities are not contradicted.
-;     - The original loop body frequency is 1, and then probabilities are all 0
-;       because only the first iteration is expected to execute.
-;     - The number of remaining conditional latches is <= 2, so the
-;       implementation computes uniform branch weights by solving a linear or
-;       quadratic equation.
-;     - -unroll-uniform-weights.
-;   - Otherwise, the earliest branch weights (starting with !prof !0) are
-;     adjusted as needed to produce the original loop body frequency, and the
-;     rest are left as they were in the original loop.
+;     expected effect, but we also want to check that we get uniform
+;     probabilities/weights (same !prof) across the unrolled iteration latches
+;     when expected.
 ; - llvm.loop.estimated_trip_count
-;   - Must be the number of iterations of the unrolled loop required for the
+;   - It must be the number of iterations of the unrolled loop required for the
 ;     original loop body to reach its original frequency.
-;   - Must not be blindly computed from any new latch branch weights.
+;   - It must not be blindly computed from any new latch branch weights.
 
 ; ------------------------------------------------------------------------------
 ; Define LIT substitutions.
@@ -78,7 +67,6 @@
 ; their original probabilities are not contradicted.  That is, the original loop
 ; latch's branch weights remain on all unrolled iterations' latches.
 ;
-;   RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} MULT3
 ;   RUN: %{ur-bf} -unroll-count=3 | %{fc} MULT3
 ;
 ;   Sums to approximately the original loop body frequency, 10.
@@ -103,9 +91,7 @@
 ;
 ;   -unroll-count=2, so there is 1 remaining conditional latch, so the
 ;   implementation can compute uniform weights by solving a linear equation.
-;   Thus, -unroll-uniform-weights has no effect.
 ;
-;     RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} MULT2
 ;     RUN: %{ur-bf} -unroll-count=2 | %{fc} MULT2
 ;
 ;     Multiply by 2 to get the original loop body frequency, 10.
@@ -125,9 +111,7 @@
 ;
 ;   -unroll-count=4, so there are 2 remaining conditional latches, so the
 ;   implementation can compute uniform weights using the quadratic formula.
-;   Thus, -unroll-uniform-weights has no effect.
 ;
-;     RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} MULT4
 ;     RUN: %{ur-bf} -unroll-count=4 | %{fc} MULT4
 ;
 ;     Multiply by 2 and sum to get the original loop body frequency, 10.
@@ -149,56 +133,6 @@
 ;     MULT4: !1 = distinct !{!1, !2, !3}
 ;     MULT4: !2 = !{!"llvm.loop.estimated_trip_count", i32 3}
 ;     MULT4: !3 = !{!"llvm.loop.unroll.disable"}
-;
-;   -unroll-count=6, so there are 3 remaining conditional latches, the lowest
-;   number where the implementation cannot compute uniform weights using a
-;   simple formula.  Thus, this is our first case where -unroll-uniform-weights
-;   matters.
-;
-;     RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} MULT6,MUNIF6
-;     RUN: %{ur-bf} -unroll-count=6 | %{fc} MULT6,MFAST6
-;
-;     For either MUNIF or MFAST, multiply by 2 and sum to get the original loop
-;     body frequency, 10.
-;     MUNIF6: - do.body: float = 2.0492,
-;     MUNIF6: - do.body.2: float = 1.6393,
-;     MUNIF6: - do.body.4: float = 1.3115,
-;     MFAST6: - do.body: float = 2.1956,
-;     MFAST6: - do.body.2: float = 1.476,
-;     MFAST6: - do.body.4: float = 1.3284,
-;
-;     MULT6:       call void @f
-;     MULT6-NOT:   br
-;     MULT6:       call void @f
-;     MULT6:       br i1 %{{.*}}, label %do.body.2, label %do.end, !prof !0
-;     MULT6:       call void @f
-;     MULT6-NOT:   br
-;     MULT6:       call void @f
-;     MULT6:       br i1 %{{.*}}, label %do.body.4, label %do.end,
-;     MUNIF6-SAME:   !prof !0
-;     MFAST6-SAME:   !prof !1
-;     MULT6:       call void @f
-;     MULT6-NOT:   br
-;     MULT6:       call void @f
-;     MULT6:       br i1 %{{.*}}, label %do.body, label %do.end,
-;     MUNIF6-SAME:   !prof !0, !llvm.loop !1
-;     MFAST6-SAME:   !prof !1, !llvm.loop !2
-;
-;     MUNIF6 is like applying -unroll-count=3 to MULT2 without converting any
-;     additional conditional latches to unconditional, so (approximately)
-;     MULT2's branch weights make sense.
-;     MUNIF6: !0 = !{!"branch_weights", i32 1717986944, i32 429496704}
-;     MUNIF6: !1 = distinct !{!1, !2, !3}
-;     MUNIF6: !2 = !{!"llvm.loop.estimated_trip_count", i32 2}
-;     MUNIF6: !3 = !{!"llvm.loop.unroll.disable"}
-;
-;     There are 3 conditional latches remaining, so MFAST6 adjusts the first and
-;     leaves the second two with the original loop's branch weights.
-;     MFAST6: !0 = !{!"branch_weights", i32 1443686486, i32 703797162}
-;     MFAST6: !1 = !{!"branch_weights", i32 9, i32 1}
-;     MFAST6: !2 = distinct !{!2, !3, !4}
-;     MFAST6: !3 = !{!"llvm.loop.estimated_trip_count", i32 2}
-;     MFAST6: !4 = !{!"llvm.loop.unroll.disable"}
 
 ; ------------------------------------------------------------------------------
 ; Check case when the original loop's number of iterations is a run-time
@@ -219,7 +153,6 @@
 ; implementation tries to compute uniform weights by solving a linear equation
 ; but ultimately sets the latch's probability to zero.
 ;
-;   RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} LOW2
 ;   RUN: %{ur-bf} -unroll-count=2 | %{fc} LOW2
 ;
 ;   Multiply by 2, but the result is greater than the original loop body
@@ -240,7 +173,6 @@
 ; implementation tries to compute uniform weights using the quadratic formula
 ; but ultimately sets both latches' probabilities to zero.
 ;
-;   RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} LOW4
 ;   RUN: %{ur-bf} -unroll-count=4 | %{fc} LOW4
 ;
 ;   Multiply by 2 and sum, but the result is greater than the original loop body
@@ -261,41 +193,6 @@
 ;   LOW4: !1 = distinct !{!1, !2, !3}
 ;   LOW4: !2 = !{!"llvm.loop.estimated_trip_count", i32 1}
 ;   LOW4: !3 = !{!"llvm.loop.unroll.disable"}
-;
-; -unroll-count=6, so there are 3 remaining conditional latches.  The
-; implementation cannot compute uniform weights using a simple formula, and
-; ultimately it must set all those latches' probabilities to zero.  If not
-; -unroll-uniform-weights, then the implementation will face a new stumbling
-; block starting at the second latch: reaching the remaining iterations already
-; has a zero probability due to the zero probability set at the first latch, so
-; the required probability could accidentally be computed as negative infinity.
-;
-;   RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} LOW6
-;   RUN: %{ur-bf} -unroll-count=6 | %{fc} LOW6
-;
-;   Multiply by 2 and sum, but the result is greater than the original loop body
-;   frequency, 1, which is impossibly low.
-;   LOW6: - do.body: float = 1.0,
-;   LOW6: - do.body.2: float = 0.0{{(0000[0-9]*)?}},
-;   LOW6: - do.body.4: float = 0.0{{(0000[0-9]*)?}},
-;
-;   LOW6:     call void @f
-;   LOW6-NOT: br
-;   LOW6:     call void @f
-;   LOW6:     br i1 %{{.*}}, label %do.body.2, label %do.end, !prof !0
-;   LOW6:     call void @f
-;   LOW6-NOT: br
-;   LOW6:     call void @f
-;   LOW6:     br i1 %{{.*}}, label %do.body.4, label %do.end, !prof !0
-;   LOW6:     call void @f
-;   LOW6-NOT: br
-;   LOW6:     call void @f
-;   LOW6:     br i1 %{{.*}}, label %do.body, label %do.end, !prof !0, !llvm.loop !1
-;
-;   LOW6: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
-;   LOW6: !1 = distinct !{!1, !2, !3}
-;   LOW6: !2 = !{!"llvm.loop.estimated_trip_count", i32 1}
-;   LOW6: !3 = !{!"llvm.loop.unroll.disable"}
 
 ; ------------------------------------------------------------------------------
 ; Check cases when the original loop's number of iterations is a constant 10 and
@@ -306,7 +203,7 @@
 ; Because we test only partial unrolling, there is always exactly one unrolled
 ; iteration that can possibly exit, so only its latch can remain conditional.
 ; Because there is only one, its branch weights can be computed with a simple
-; formula, and -unroll-uniform-weights does not matter.
+; formula.
 ;
 ; Check the original loop body frequency.
 ;
@@ -315,7 +212,6 @@
 ;
 ; Check when the unrolled loop's backedge remains conditional.
 ;
-;   RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} CONST2
 ;   RUN: %{ur-bf} -unroll-count=2 | %{fc} CONST2
 ;
 ;   Multiply by 2 to get the original loop body frequency, 10.
@@ -334,7 +230,6 @@
 ;
 ; Check when the unrolled loop's backedge unconditionally continues.
 ;
-;   RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} CONST4
 ;   RUN: %{ur-bf} -unroll-count=4 | %{fc} CONST4
 ;
 ;   Multiply by 2 and sum to get the original loop body frequency, 10.
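
The test comments above repeatedly say to "multiply by 2 and sum" the reported block frequencies to recover the original loop body frequency. The following is a minimal sketch of that arithmetic, not the LoopUnroll implementation. It assumes a loop entry frequency of 1, a partially unrolled loop whose last conditional latch is the backedge, and two copies of the original body per conditional latch. The function names are hypothetical. The probabilities are taken from the MUNIF6 and MFAST6 branch weights in the check lines removed by this diff.

```python
from math import prod

def unrolled_body_freqs(latch_probs):
    """Frequency of the block reached at each stage of the unrolled loop.

    latch_probs[i] is the probability that conditional latch i continues
    rather than exits; the last latch is the unrolled loop's backedge.
    Assumes the loop is entered with frequency 1.
    """
    # The header executes once per trip around the unrolled loop.
    header = 1.0 / (1.0 - prod(latch_probs))
    freqs = [header]
    # Each later stage is reached only if the preceding latch continues.
    for p in latch_probs[:-1]:
        freqs.append(freqs[-1] * p)
    return freqs

def original_body_freq(latch_probs, bodies_per_latch):
    """Total frequency of the original body across all unrolled copies."""
    return bodies_per_latch * sum(unrolled_body_freqs(latch_probs))

# Branch weights quoted in the test sum to 2**31, so weight / 2**31 is the
# probability of taking the continue edge.
munif6 = [1717986944 / 2**31] * 3         # uniform weights on all 3 latches
mfast6 = [1443686486 / 2**31, 0.9, 0.9]   # first adjusted, rest original 9:1

print([round(f, 4) for f in unrolled_body_freqs(munif6)])
print([round(f, 4) for f in unrolled_body_freqs(mfast6)])
print(original_body_freq(munif6, 2), original_body_freq(mfast6, 2))
```

Under these assumptions both weight assignments reproduce the per-block frequencies in the MUNIF6 and MFAST6 check lines, and both total approximately 10, the original loop body frequency.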


