[llvm-branch-commits] [llvm] [LoopUnroll] Fix freqs for unconditional latches: N>2, uniform (PR #182405)
Joel E. Denny via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Thu Feb 26 15:54:59 PST 2026
https://github.com/jdenny-ornl updated https://github.com/llvm/llvm-project/pull/182405
>From d3054a4e08388bda0bb58a0557b871dd6c6cb9df Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" <jdenny.ornl at gmail.com>
Date: Thu, 19 Feb 2026 17:53:59 -0500
Subject: [PATCH] [LoopUnroll] Fix freqs for unconditional latches: N>2,
uniform
This patch introduces the command-line option
`-unroll-uniform-weights`. When computing probabilities for the
remaining N conditional latches in the unrolled loop after converting
some iterations' latches to unconditional, LoopUnroll now supports the
following three strategies:
- A. If N <= 2, use a simple formula to compute a single uniform
probability across those latches.
- B. Otherwise, if `-unroll-uniform-weights` is not specified, apply
the original loop's probability to all N latches and then, as
needed, adjust as few of them as possible.
- C. Otherwise, bisect the range [0,1] to find a single uniform
probability across all N latches. This patch implements this
strategy.
An issue with C is that it could impact compiler performance, so this
patch makes it opt-in. Its appeal over B is that it treats all
latches the same given that we have no evidence showing that any latch
should have a higher or lower probability than any other. A has
neither problem, but I do not know how to apply it for N > 2. More
experience or feedback from others might determine that some
strategies are not worthwhile to maintain.
---
llvm/lib/Transforms/Utils/LoopUnroll.cpp | 112 +++++++-
.../branch-weights-freq/unroll-complete.ll | 267 +++++++++++++-----
.../unroll-partial-unconditional-latch.ll | 70 +++--
3 files changed, 336 insertions(+), 113 deletions(-)
diff --git a/llvm/lib/Transforms/Utils/LoopUnroll.cpp b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
index 404e254c8a66f..1028b0f026ea3 100644
--- a/llvm/lib/Transforms/Utils/LoopUnroll.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
@@ -89,6 +89,11 @@ UnrollRuntimeEpilog("unroll-runtime-epilog", cl::init(false), cl::Hidden,
cl::desc("Allow runtime unrolled loops to be unrolled "
"with epilog instead of prolog."));
+static cl::opt<bool> UnrollUniformWeights(
+ "unroll-uniform-weights", cl::init(false), cl::Hidden,
+ cl::desc("If new branch weights must be found, work harder to keep them "
+ "uniform."));
+
static cl::opt<bool>
UnrollVerifyDomtree("unroll-verify-domtree", cl::Hidden,
cl::desc("Verify domtree after unrolling"),
@@ -493,9 +498,9 @@ static bool canHaveUnrollRemainder(const Loop *L) {
//
// There are often many sets of latch probabilities that can produce the
// original total loop body frequency. If there are many remaining conditional
-// latches, this function just quickly hacks a few of their probabilities to
-// restore the original total loop body frequency. Otherwise, it determines
-// less arbitrary probabilities.
+// latches and !UnrollUniformWeights, this function just quickly hacks a few of
+// their probabilities to restore the original total loop body frequency.
+// Otherwise, it tries harder to determine less arbitrary probabilities.
static void fixProbContradiction(UnrollLoopOptions ULO,
BranchProbability OriginalLoopProb,
bool CompletelyUnroll,
@@ -580,10 +585,11 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
SetProb(I, Prob);
};
- // If n <= 2, we choose the simplest probability model we can think of: every
- // remaining conditional branch instruction has the same probability, Prob,
- // of continuing to the next iteration. This model has several helpful
- // properties:
+ // If UnrollUniformWeights or n <= 2, we choose the simplest probability model
+ // we can think of: every remaining conditional branch instruction has the
+ // same probability, Prob, of continuing to the next iteration. This model
+ // has several helpful properties:
+ // - There is only one search parameter, Prob.
// - We have no reason to think one latch branch's probability should be
// higher or lower than another, and so this model makes them all the same.
// In the worst cases, we thus avoid setting just some probabilities to 0 or
@@ -596,14 +602,32 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
//
// FreqOne = Sum(i=0..n)(c_i * p^i)
//
- // - If the backedge has been eliminated, FreqOne is the total frequency of
- // the original loop body in the unrolled loop.
- // - If the backedge remains, Sum(i=0..inf)(FreqOne * p^(n*i)) =
- // FreqOne / (1 - p^n) is the total frequency of the original loop body in
- // the unrolled loop, regardless of whether the backedge is conditional or
- // unconditional.
+ // - If the backedge has been eliminated:
+ // - FreqOne is the total frequency of the original loop body in the
+ // unrolled loop.
+ // - If Prob == 1, the total frequency of the original loop body is exactly
+ // the number of remaining loop iterations, as expected because every
+ // remaining loop iteration always then executes.
+ // - If the backedge remains:
+ // - Sum(i=0..inf)(FreqOne * p^(n*i)) = FreqOne / (1 - p^n) is the total
+ // frequency of the original loop body in the unrolled loop, regardless of
+ // whether the backedge is conditional or unconditional.
+ // - As Prob approaches 1, the total frequency of the original loop body
+ // approaches infinity, as expected because the loop approaches never
+ // exiting.
// - For n <= 2, we can use simple formulas to solve the above polynomial
// equations exactly for p without performing a search.
+ // - For n > 2, evaluating each point in the search space, using ComputeFreq
+ // below, requires about as few instructions as we could hope for. That is,
+ // the probability is constant across the conditional branches, so the only
+ // computation is across conditional branches and any backedge, as required
+ // for any model for Prob.
+ // - Prob == 1 produces the maximum possible total frequency for the original
+ // loop body, as described above. Prob == 0 produces the minimum, 0.
+ // Increasing or decreasing Prob monotonically increases or decreases the
+ // frequency, respectively. Thus, for every possible frequency, there
+ // exists some Prob that can produce it, and we can easily use bisection to
+ // search the problem space.
// When iterating for a solution, we stop early if we find probabilities
// that produce a Freq whose difference from FreqDesired is small
@@ -611,6 +635,21 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
// accurate (but surely far more accurate).
const double FreqPrec = 1e-6;
+ // Compute the new frequency produced by using Prob throughout CondLatches.
+ auto ComputeFreq = [&](double Prob) {
+ double ProbReaching = 1; // p^0
+ double FreqOne = IterCounts[0]; // c_0*p^0
+ for (unsigned I = 0, E = CondLatches.size(); I < E; ++I) {
+ ProbReaching *= Prob; // p^(I+1)
+ FreqOne += IterCounts[I + 1] * ProbReaching; // c_(I+1)*p^(I+1)
+ }
+ double ProbReachingBackedge = CompletelyUnroll ? 0 : ProbReaching;
+ assert(FreqOne > 0 && "Expected at least one iteration before first latch");
+ if (ProbReachingBackedge == 1)
+ return std::numeric_limits<double>::infinity();
+ return FreqOne / (1 - ProbReachingBackedge);
+ };
+
// Compute the probability that, used at CondLaches[0] where
// CondLatches.size() == 1, gets as close as possible to FreqDesired.
auto ComputeProbForLinear = [&]() {
@@ -619,6 +658,11 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
double B = IterCounts[0] - FreqDesired;
assert(A > 0 && "Expected iterations after last conditional latch");
double Prob = -B / A;
+ // If it computes an invalid Prob, FreqDesired is impossibly low or high.
+ // Otherwise, Prob should produce nearly FreqDesired.
+ assert((Prob < 0 || Prob > 1 ||
+ fabs(ComputeFreq(Prob) - FreqDesired) < FreqPrec) &&
+ "Expected accurate frequency when linear case is possible");
Prob = std::max(Prob, 0.);
Prob = std::min(Prob, 1.);
return Prob;
@@ -633,6 +677,11 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
double C = IterCounts[0] - FreqDesired;
assert(A > 0 && "Expected iterations after last conditional latch");
double Prob = (-B + sqrt(B * B - 4 * A * C)) / (2 * A);
+ // If it computes an invalid Prob, FreqDesired is impossibly low or high.
+ // Otherwise, Prob should produce nearly FreqDesired.
+ assert((Prob < 0 || Prob > 1 ||
+ fabs(ComputeFreq(Prob) - FreqDesired) < FreqPrec) &&
+ "Expected accurate frequency when quadratic case is possible");
Prob = std::max(Prob, 0.);
Prob = std::min(Prob, 1.);
return Prob;
@@ -733,12 +782,17 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
};
// Determine and set branch weights.
+ //
+ // Prob < 0 and Prob > 1 cannot be represented as branch weights. We might
+ // compute such a Prob if FreqDesired is impossible (e.g., due to inaccurate
+ // profile data) for the maximum trip count we have determined when completely
+ // unrolling. In that case, so just go with whichever is closest.
if (CondLatches.size() == 1) {
SetAllProbs(ComputeProbForLinear());
} else if (CondLatches.size() == 2) {
SetAllProbs(ComputeProbForQuadratic());
- } else {
- // The polynomial is too complex for a simple formula, so the quick and
+ } else if (!UnrollUniformWeights) {
+ // The polynomial is too complex for a simple formula, and the quick and
// dirty fix has been selected. Adjust probabilities starting from the
// first latch, which has the most influence on the total frequency, so
// starting there should minimize the number of latches that have to be
@@ -752,6 +806,34 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
if (fabs(Freq - FreqDesired) < FreqPrec)
break;
}
+ } else {
+ // The polynomial is too complex for a simple formula, and uniform branch
+ // weights have been selected, so bisect.
+ double ProbMin, ProbMax, ProbPrev;
+ auto TryProb = [&](double Prob) {
+ ProbPrev = Prob;
+ double FreqDelta = ComputeFreq(Prob) - FreqDesired;
+ if (fabs(FreqDelta) < FreqPrec)
+ return 0;
+ if (FreqDelta < 0) {
+ ProbMin = Prob;
+ return -1;
+ }
+ ProbMax = Prob;
+ return 1;
+ };
+ // If Prob == 0 is too small and Prob == 1 is too large, bisect between
+ // them. To place a hard upper limit on the search time, stop bisecting
+ // when Prob stops changing (ProbDelta) by much (ProbPrec).
+ if (TryProb(0.) < 0 && TryProb(1.) > 0) {
+ const double ProbPrec = 1e-12;
+ double Prob, ProbDelta;
+ do {
+ Prob = (ProbMin + ProbMax) / 2;
+ ProbDelta = Prob - ProbPrev;
+ } while (TryProb(Prob) != 0 && fabs(ProbDelta) > ProbPrec);
+ }
+ SetAllProbs(ProbPrev);
}
// FIXME: We have not considered non-latch loop exits:
diff --git a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll
index 353e74be9fbd1..69da20802a0ae 100644
--- a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll
+++ b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll
@@ -58,12 +58,14 @@
; Check 1 max iteration:
; - Unroll count of >=1 should always produce complete unrolling.
; - That produces 0 unrolled iteration latches, so there are no branch weights
-; to compute.
+; to compute. Thus, -unroll-uniform-weights has no effect.
;
; Original loop body frequency is 2 (loop weight 1), which is impossibly high.
;
; RUN: sed -e s/@MAX@/1/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG1210
+; RUN: %{ur-bf} -unroll-count=1 -unroll-uniform-weights | %{fc} UR1210
+; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR1210
; RUN: %{ur-bf} -unroll-count=1 | %{fc} UR1210
; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR1210
;
@@ -77,6 +79,8 @@
;
; RUN: sed -e s/@MAX@/1/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG1110
+; RUN: %{ur-bf} -unroll-count=1 -unroll-uniform-weights | %{fc} UR1110
+; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR1110
; RUN: %{ur-bf} -unroll-count=1 | %{fc} UR1110
; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR1110
;
@@ -90,7 +94,8 @@
; Check 2 max iterations:
; - Unroll count of >=2 should always produce complete unrolling.
; - That produces <=1 unrolled iteration latch, so the implementation can
-; compute uniform weights by solving, at worst, a linear equation.
+; compute uniform weights by solving, at worst, a linear equation. Thus,
+; -unroll-uniform-weights has no effect.
;
; Original loop body frequency is 3 (loop weight 2), which is impossibly high.
;
@@ -99,6 +104,8 @@
;
; RUN: sed -e s/@MAX@/2/ -e s/@W@/2/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG2310
+; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2310
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2310
; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2310
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2310
;
@@ -120,6 +127,8 @@
;
; RUN: sed -e s/@MAX@/2/ -e s/@W@/2/ -e s/@MIN@/2/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG2320
+; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2320
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2320
; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2320
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2320
;
@@ -140,6 +149,8 @@
;
; RUN: sed -e s/@MAX@/2/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG2210
+; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2210
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2210
; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2210
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2210
;
@@ -159,6 +170,8 @@
;
; RUN: sed -e s/@MAX@/2/ -e s/@W@/1/ -e s/@MIN@/2/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG2220
+; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2220
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2220
; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2220
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2220
;
@@ -179,6 +192,8 @@
;
; RUN: sed -e s/@MAX@/2/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG2110
+; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2110
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2110
; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2110
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2110
;
@@ -198,6 +213,8 @@
;
; RUN: sed -e s/@MAX@/2/ -e s/@W@/0/ -e s/@MIN@/2/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG2120
+; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2120
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2120
; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2120
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2120
;
@@ -215,7 +232,8 @@
; Check 3 max iterations:
; - Unroll count of >=3 should always produce complete unrolling.
; - That produces <=2 unrolled iteration latches, so the implementation can
-; compute uniform weights solving, at worst, a quadratic equation.
+; compute uniform weights solving, at worst, a quadratic equation. Thus,
+; -unroll-uniform-weights has no effect.
;
; Original loop body frequency is 4 (loop weight 3), which is impossibly high.
;
@@ -224,6 +242,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/3/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG3410
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3410
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3410
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3410
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3410
;
@@ -248,6 +268,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/3/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG3430
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3430
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3430
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3430
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3430
;
@@ -269,6 +291,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/3/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll
; RUN: %{bf-fc} ORIG343x
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR343x
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR343x
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR343x
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR343x
;
@@ -295,6 +319,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/2/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG3310
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3310
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3310
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3310
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3310
;
@@ -317,6 +343,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/2/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG3330
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3330
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3330
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3330
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3330
;
@@ -338,6 +366,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/2/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll
; RUN: %{bf-fc} ORIG333x
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR333x
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR333x
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR333x
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR333x
;
@@ -363,6 +393,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG3210
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3210
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3210
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3210
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3210
;
@@ -385,6 +417,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/1/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG3230
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3230
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3230
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3230
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3230
;
@@ -406,6 +440,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/1/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll
; RUN: %{bf-fc} ORIG323x
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR323x
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR323x
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR323x
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR323x
;
@@ -430,6 +466,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG3110
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3110
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3110
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3110
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3110
;
@@ -452,6 +490,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/0/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG3130
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3130
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3130
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3130
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3130
;
@@ -473,6 +513,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/0/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll
; RUN: %{bf-fc} ORIG313x
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR313x
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR313x
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR313x
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR313x
;
@@ -496,6 +538,7 @@
; - Unroll count of >=4 should always produce complete unrolling.
; - That produces <=3 unrolled iteration latches. 3 is the lowest number where
; the implementation cannot compute uniform weights using a simple formula.
+; Thus, this is our first case where -unroll-uniform-weights matters.
;
; Original loop body frequency is 5 (loop weight 4), which is impossibly high.
;
@@ -504,6 +547,8 @@
;
; RUN: sed -e s/@MAX@/4/ -e s/@W@/4/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG4510
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4510
+; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4510
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4510
; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4510
;
@@ -531,6 +576,8 @@
;
; RUN: sed -e s/@MAX@/4/ -e s/@W@/4/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG4540
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4540
+; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4540
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4540
; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4540
;
@@ -554,6 +601,8 @@
;
; RUN: sed -e s/@MAX@/4/ -e s/@W@/4/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll
; RUN: %{bf-fc} ORIG454x
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR454x
+; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR454x
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR454x
; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR454x
;
@@ -582,6 +631,8 @@
;
; RUN: sed -e s/@MAX@/4/ -e s/@W@/3/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG4410
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4410
+; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4410
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4410
; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4410
;
@@ -607,6 +658,8 @@
;
; RUN: sed -e s/@MAX@/4/ -e s/@W@/3/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG4440
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4440
+; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4440
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4440
; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4440
;
@@ -630,6 +683,8 @@
;
; RUN: sed -e s/@MAX@/4/ -e s/@W@/3/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll
; RUN: %{bf-fc} ORIG444x
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR444x
+; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR444x
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR444x
; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR444x
;
@@ -650,39 +705,52 @@
; UR444x: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
;
; Original loop body frequency is 3 (loop weight 2). This is our first case
-; where the new probabilities vary.
+; where the new probabilities vary (unless -unroll-uniform-weights).
;
; First use a variable iteration count so that all non-final unrolled
; iterations' latches remain conditional.
;
; RUN: sed -e s/@MAX@/4/ -e s/@W@/2/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG4310
-; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4310
-; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4310
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4310,UNIF4310
+; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4310,UNIF4310
+; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4310,FAST4310
+; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4310,FAST4310
;
; The sum of the new do.body* is always approximately the old do.body.
; ORIG4310: - do.body: float = 3.0,
-; UR4310: - do.body: float = 1.0,
-; UR4310: - do.body.1: float = 0.94737,
-; UR4310: - do.body.2: float = 0.63158,
-; UR4310: - do.body.3: float = 0.42105,
-;
-; UR4310: call void @f
-; UR4310: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
-; UR4310: call void @f
-; UR4310: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !1
-; UR4310: call void @f
-; UR4310: br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !1
-; UR4310: call void @f
-; UR4310: br label %do.end
-; UR4310: !0 = !{!"branch_weights", i32 113025456, i32 2034458192}
-; UR4310: !1 = !{!"branch_weights", i32 1, i32 2}
+; UNIF4310: - do.body: float = 1.0,
+; UNIF4310: - do.body.1: float = 0.81054,
+; UNIF4310: - do.body.2: float = 0.65697,
+; UNIF4310: - do.body.3: float = 0.5325,
+; FAST4310: - do.body: float = 1.0,
+; FAST4310: - do.body.1: float = 0.94737,
+; FAST4310: - do.body.2: float = 0.63158,
+; FAST4310: - do.body.3: float = 0.42105,
+;
+; UR4310: call void @f
+; UR4310: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+; UR4310: call void @f
+; UR4310: br i1 %{{.*}}, label %do.end, label %do.body.2,
+; UNIF4310-SAME: !prof !0
+; FAST4310-SAME: !prof !1
+; UR4310: call void @f
+; UR4310: br i1 %{{.*}}, label %do.end, label %do.body.3,
+; UNIF4310-SAME: !prof !0
+; FAST4310-SAME: !prof !1
+; UR4310: call void @f
+; UR4310: br label %do.end
+; UNIF4310: !0 = !{!"branch_weights", i32 406871040, i32 1740612608}
+; FAST4310: !0 = !{!"branch_weights", i32 113025456, i32 2034458192}
+; FAST4310: !1 = !{!"branch_weights", i32 1, i32 2}
;
; Now use a constant iteration count so that all non-final unrolled
; iterations' latches unconditionally continue.
;
; RUN: sed -e s/@MAX@/4/ -e s/@W@/2/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG4340
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4340
+; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4340
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4340
; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4340
;
@@ -706,6 +774,8 @@
;
; RUN: sed -e s/@MAX@/4/ -e s/@W@/2/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll
; RUN: %{bf-fc} ORIG434x
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR434x
+; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR434x
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR434x
; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR434x
;
@@ -732,32 +802,45 @@
;
; RUN: sed -e s/@MAX@/4/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG4210
-; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4210
-; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4210
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4210,UNIF4210
+; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4210,UNIF4210
+; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4210,FAST4210
+; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4210,FAST4210
;
; The sum of the new do.body* is always the old do.body.
; ORIG4210: - do.body: float = 2.0,
-; UR4210: - do.body: float = 1.0,
-; UR4210: - do.body.1: float = 0.57143,
-; UR4210: - do.body.2: float = 0.28571,
-; UR4210: - do.body.3: float = 0.14286,
-;
-; UR4210: call void @f
-; UR4210: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
-; UR4210: call void @f
-; UR4210: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !1
-; UR4210: call void @f
-; UR4210: br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !1
-; UR4210: call void @f
-; UR4210: br label %do.end
-; UR4210: !0 = !{!"branch_weights", i32 920350135, i32 1227133513}
-; UR4210: !1 = !{!"branch_weights", i32 1, i32 1}
+; UNIF4210: - do.body: float = 1.0,
+; UNIF4210: - do.body.1: float = 0.54369,
+; UNIF4210: - do.body.2: float = 0.2956,
+; UNIF4210: - do.body.3: float = 0.16071,
+; FAST4210: - do.body: float = 1.0,
+; FAST4210: - do.body.1: float = 0.57143,
+; FAST4210: - do.body.2: float = 0.28571,
+; FAST4210: - do.body.3: float = 0.14286,
+;
+; UR4210: call void @f
+; UR4210: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+; UR4210: call void @f
+; UR4210: br i1 %{{.*}}, label %do.end, label %do.body.2,
+; UNIF4210-SAME: !prof !0
+; FAST4210-SAME: !prof !1
+; UR4210: call void @f
+; UR4210: br i1 %{{.*}}, label %do.end, label %do.body.3,
+; UNIF4210-SAME: !prof !0
+; FAST4210-SAME: !prof !1
+; UR4210: call void @f
+; UR4210: br label %do.end
+; UNIF4210: !0 = !{!"branch_weights", i32 979920896, i32 1167562752}
+; FAST4210: !0 = !{!"branch_weights", i32 920350135, i32 1227133513}
+; FAST4210: !1 = !{!"branch_weights", i32 1, i32 1}
;
; Now use a constant iteration count so that all non-final unrolled
; iterations' latches unconditionally continue.
;
; RUN: sed -e s/@MAX@/4/ -e s/@W@/1/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG4240
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4240
+; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4240
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4240
; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4240
;
@@ -781,6 +864,8 @@
;
; RUN: sed -e s/@MAX@/4/ -e s/@W@/1/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll
; RUN: %{bf-fc} ORIG424x
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR424x
+; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR424x
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR424x
; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR424x
;
@@ -807,6 +892,8 @@
;
; RUN: sed -e s/@MAX@/4/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG4110
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4110
+; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4110
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4110
; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4110
;
@@ -832,6 +919,8 @@
;
; RUN: sed -e s/@MAX@/4/ -e s/@W@/0/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG4140
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4140
+; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4140
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4140
; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4140
;
@@ -855,6 +944,8 @@
;
; RUN: sed -e s/@MAX@/4/ -e s/@W@/0/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll
; RUN: %{bf-fc} ORIG414x
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR414x
+; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR414x
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR414x
; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR414x
;
@@ -880,14 +971,16 @@
; - Unroll count of >=5 should always produce complete unrolling.
; - That produces <=4 unrolled iteration latches. When at least 3 remain
; conditional, the implementation cannot compute uniform weights using a
-; simple formula.
+; simple formula, so -unroll-uniform-weights matters.
;
; Original loop body frequency is 5 (loop weight 4).
;
; RUN: sed -e s/@MAX@/5/ -e s/@W@/4/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG5510
-; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR5510
-; RUN: %{ur-bf} -unroll-count=6 | %{fc} UR5510
+; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR5510,UNIF5510
+; RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} UR5510,UNIF5510
+; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR5510,FAST5510
+; RUN: %{ur-bf} -unroll-count=6 | %{fc} UR5510,FAST5510
;
; The sum of the new do.body* is the old do.body.
; ORIG5510: - do.body: float = 5.0,
@@ -899,55 +992,75 @@
;
; All continue probabilities are approximately 1, but somehow there is less
; precision in the calculation of the last case.
-; UR5510: call void @f
-; UR5510: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
-; UR5510: call void @f
-; UR5510: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !0
-; UR5510: call void @f
-; UR5510: br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !0
-; UR5510: call void @f
-; UR5510: br i1 %{{.*}}, label %do.end, label %do.body.4, !prof !1
-; UR5510: call void @f
-; UR5510: br label %do.end
-; UR5510: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
-; UR5510: !1 = !{!"branch_weights", i32 10, i32 2147483638}
+; UR5510: call void @f
+; UR5510: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+; UR5510: call void @f
+; UR5510: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !0
+; UR5510: call void @f
+; UR5510: br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !0
+; UR5510: call void @f
+; UR5510: br i1 %{{.*}}, label %do.end, label %do.body.4,
+; UNIF5510-SAME: !prof !0
+; FAST5510-SAME: !prof !1
+; UR5510: call void @f
+; UR5510: br label %do.end
+; UNIF5510: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
+; FAST5510: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
+; FAST5510: !1 = !{!"branch_weights", i32 10, i32 2147483638}
;
; Original loop body frequency is 4 (loop weight 3).
;
; RUN: sed -e s/@MAX@/5/ -e s/@W@/3/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG5410
-; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR5410
-; RUN: %{ur-bf} -unroll-count=6 | %{fc} UR5410
+; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR5410,UNIF5410
+; RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} UR5410,UNIF5410
+; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR5410,FAST5410
+; RUN: %{ur-bf} -unroll-count=6 | %{fc} UR5410,FAST5410
;
; The sum of the new do.body* is always the old do.body.
; ORIG5410: - do.body: float = 4.0,
-; UR5410: - do.body: float = 1.0,
-; UR5410: - do.body.1: float = 1.0,
-; UR5410: - do.body.2: float = 0.86486,
-; UR5410: - do.body.3: float = 0.64865,
-; UR5410: - do.body.4: float = 0.48649,
-;
-; This is our first case where the implementation must adjust multiple
-; probabilities to something other than the original latch probability but
-; does not just set all probabilities to the limit of 1 or 0.
-; UR5410: call void @f
-; UR5410: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
-; UR5410: call void @f
-; UR5410: br i1 %{{.*}}, label %do.end, label %do.body.2, !prof !1
-; UR5410: call void @f
-; UR5410: br i1 %{{.*}}, label %do.end, label %do.body.3, !prof !2
-; UR5410: call void @f
-; UR5410: br i1 %{{.*}}, label %do.end, label %do.body.4, !prof !2
-; UR5410: call void @f
-; UR5410: br label %do.end
-; UR5410: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
-; UR5410: !1 = !{!"branch_weights", i32 290200493, i32 1857283155}
-; UR5410: !2 = !{!"branch_weights", i32 1, i32 3}
+; UNIF5410: - do.body: float = 1.0,
+; UNIF5410: - do.body.1: float = 0.88818,
+; UNIF5410: - do.body.2: float = 0.78886,
+; UNIF5410: - do.body.3: float = 0.70065,
+; UNIF5410: - do.body.4: float = 0.62231,
+; FAST5410: - do.body: float = 1.0,
+; FAST5410: - do.body.1: float = 1.0,
+; FAST5410: - do.body.2: float = 0.86486,
+; FAST5410: - do.body.3: float = 0.64865,
+; FAST5410: - do.body.4: float = 0.48649,
+;
+; This is our first case where, when not using -unroll-uniform-weights, the
+; implementation must adjust multiple probabilities to something other than
+; the original latch probability but does not just set all probabilities to
+; the limit of 1 or 0.
+; UR5410: call void @f
+; UR5410: br i1 %{{.*}}, label %do.end, label %do.body.1, !prof !0
+; UR5410: call void @f
+; UR5410: br i1 %{{.*}}, label %do.end, label %do.body.2,
+; UNIF5410-SAME: !prof !0
+; FAST5410-SAME: !prof !1
+; UR5410: call void @f
+; UR5410: br i1 %{{.*}}, label %do.end, label %do.body.3,
+; UNIF5410-SAME: !prof !0
+; FAST5410-SAME: !prof !2
+; UR5410: call void @f
+; UR5410: br i1 %{{.*}}, label %do.end, label %do.body.4,
+; UNIF5410-SAME: !prof !0
+; FAST5410-SAME: !prof !2
+; UR5410: call void @f
+; UR5410: br label %do.end
+; UNIF5410: !0 = !{!"branch_weights", i32 240132096, i32 1907351552}
+; FAST5410: !0 = !{!"branch_weights", i32 0, i32 -2147483648}
+; FAST5410: !1 = !{!"branch_weights", i32 290200493, i32 1857283155}
+; FAST5410: !2 = !{!"branch_weights", i32 1, i32 3}
;
; Original loop body frequency is 1 (loop weight 0).
;
; RUN: sed -e s/@MAX@/5/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG5110
+; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR5110
+; RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} UR5110
; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR5110
; RUN: %{ur-bf} -unroll-count=6 | %{fc} UR5110
;
diff --git a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll
index a87c9390ee780..84f92e519a1c3 100644
--- a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll
+++ b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll
@@ -67,6 +67,7 @@
; their original probabilities are not contradicted. That is, the original loop
; latch's branch weights remain on all unrolled iterations' latches.
;
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} MULT3
; RUN: %{ur-bf} -unroll-count=3 | %{fc} MULT3
;
; Sums to approximately the original loop body frequency, 10.
@@ -91,7 +92,9 @@
;
; -unroll-count=2, so there is 1 remaining conditional latch, so the
; implementation can compute uniform weights by solving a linear equation.
+; Thus, -unroll-uniform-weights has no effect.
;
+; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} MULT2
; RUN: %{ur-bf} -unroll-count=2 | %{fc} MULT2
;
; Multiply by 2 to get the original loop body frequency, 10.
@@ -111,7 +114,9 @@
;
; -unroll-count=4, so there are 2 remaining conditional latches, so the
; implementation can compute uniform weights using the quadratic formula.
+; Thus, -unroll-uniform-weights has no effect.
;
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} MULT4
; RUN: %{ur-bf} -unroll-count=4 | %{fc} MULT4
;
; Multiply by 2 and sum to get the original loop body frequency, 10.
@@ -136,14 +141,20 @@
;
; -unroll-count=6, so there are 3 remaining conditional latches, the lowest
; number where the implementation cannot compute uniform weights using a
-; simple formula.
+; simple formula. Thus, this is our first case where -unroll-uniform-weights
+; matters.
;
-; RUN: %{ur-bf} -unroll-count=6 | %{fc} MULT6
+; RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} MULT6,MUNIF6
+; RUN: %{ur-bf} -unroll-count=6 | %{fc} MULT6,MFAST6
;
-; Multiply by 2 and sum to get the original loop body frequency, 10.
-; MULT6: - do.body: float = 2.1956,
-; MULT6: - do.body.2: float = 1.476,
-; MULT6: - do.body.4: float = 1.3284,
+; For either MUNIF or MFAST, multiply by 2 and sum to get the original loop
+; body frequency, 10.
+; MUNIF6: - do.body: float = 2.0492,
+; MUNIF6: - do.body.2: float = 1.6393,
+; MUNIF6: - do.body.4: float = 1.3115,
+; MFAST6: - do.body: float = 2.1956,
+; MFAST6: - do.body.2: float = 1.476,
+; MFAST6: - do.body.4: float = 1.3284,
;
; MULT6: call void @f
; MULT6-NOT: br
@@ -152,19 +163,31 @@
; MULT6: call void @f
; MULT6-NOT: br
; MULT6: call void @f
-; MULT6: br i1 %{{.*}}, label %do.body.4, label %do.end, !prof !1
+; MULT6: br i1 %{{.*}}, label %do.body.4, label %do.end,
+; MUNIF6-SAME: !prof !0
+; MFAST6-SAME: !prof !1
; MULT6: call void @f
; MULT6-NOT: br
; MULT6: call void @f
-; MULT6: br i1 %{{.*}}, label %do.body, label %do.end, !prof !1, !llvm.loop !2
-;
-; There are 3 conditional latches remaining, so it adjusts the first and
+; MULT6: br i1 %{{.*}}, label %do.body, label %do.end,
+; MUNIF6-SAME: !prof !0, !llvm.loop !1
+; MFAST6-SAME: !prof !1, !llvm.loop !2
+;
+; MUNIF6 is like applying -unroll-count=3 to MULT2 without converting any
+; additional conditional latches to unconditional, so (approximately)
+; MULT2's branch weights make sense.
+; MUNIF6: !0 = !{!"branch_weights", i32 1717986944, i32 429496704}
+; MUNIF6: !1 = distinct !{!1, !2, !3}
+; MUNIF6: !2 = !{!"llvm.loop.estimated_trip_count", i32 2}
+; MUNIF6: !3 = !{!"llvm.loop.unroll.disable"}
+;
+; There are 3 conditional latches remaining, so MFAST6 adjusts the first and
; leaves the second two with the original loop's branch weights.
-; MULT6: !0 = !{!"branch_weights", i32 1443686486, i32 703797162}
-; MULT6: !1 = !{!"branch_weights", i32 9, i32 1}
-; MULT6: !2 = distinct !{!2, !3, !4}
-; MULT6: !3 = !{!"llvm.loop.estimated_trip_count", i32 2}
-; MULT6: !4 = !{!"llvm.loop.unroll.disable"}
+; MFAST6: !0 = !{!"branch_weights", i32 1443686486, i32 703797162}
+; MFAST6: !1 = !{!"branch_weights", i32 9, i32 1}
+; MFAST6: !2 = distinct !{!2, !3, !4}
+; MFAST6: !3 = !{!"llvm.loop.estimated_trip_count", i32 2}
+; MFAST6: !4 = !{!"llvm.loop.unroll.disable"}
; ------------------------------------------------------------------------------
; Check case when the original loop's number of iterations is a run-time
@@ -185,6 +208,7 @@
; implementation tries to compute uniform weights by solving a linear equation
; but ultimately sets the latch's probability to zero.
;
+; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} LOW2
; RUN: %{ur-bf} -unroll-count=2 | %{fc} LOW2
;
; Multiply by 2, but the result is greater than the original loop body
@@ -205,6 +229,7 @@
; implementation tries to compute uniform weights using the quadratic formula
; but ultimately sets both latches' probabilities to zero.
;
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} LOW4
; RUN: %{ur-bf} -unroll-count=4 | %{fc} LOW4
;
; Multiply by 2 and sum, but the result is greater than the original loop body
@@ -228,12 +253,13 @@
;
; -unroll-count=6, so there are 3 remaining conditional latches. The
; implementation cannot compute uniform weights using a simple formula, and
-; ultimately it must set all those latches' probabilities to zero. The
-; implementation will face a new stumbling block starting at the second latch:
-; reaching the remaining iterations already has a zero probability due to the
-; zero probability set at the first latch, so the required probability could
-; accidentally be computed as negative infinity.
+; ultimately it must set all those latches' probabilities to zero. If not
+; -unroll-uniform-weights, then the implementation will face a new stumbling
+; block starting at the second latch: reaching the remaining iterations already
+; has a zero probability due to the zero probability set at the first latch, so
+; the required probability could accidentally be computed as negative infinity.
;
+; RUN: %{ur-bf} -unroll-count=6 -unroll-uniform-weights | %{fc} LOW6
; RUN: %{ur-bf} -unroll-count=6 | %{fc} LOW6
;
; Multiply by 2 and sum, but the result is greater than the original loop body
@@ -269,7 +295,7 @@
; Because we test only partial unrolling, there is always exactly one unrolled
; iteration that can possibly exit, so only its latch can remain conditional.
; Because there is only one, its branch weights can be computed with a simple
-; formula.
+; formula, and -unroll-uniform-weights does not matter.
;
; Check the original loop body frequency.
;
@@ -278,6 +304,7 @@
;
; Check when the unrolled loop's backedge remains conditional.
;
+; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} CONST2
; RUN: %{ur-bf} -unroll-count=2 | %{fc} CONST2
;
; Multiply by 2 to get the original loop body frequency, 10.
@@ -296,6 +323,7 @@
;
; Check when the unrolled loop's backedge unconditionally continues.
;
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} CONST4
; RUN: %{ur-bf} -unroll-count=4 | %{fc} CONST4
;
; Multiply by 2 and sum to get the original loop body frequency, 10.
More information about the llvm-branch-commits
mailing list