[llvm-branch-commits] [llvm] [LoopUnroll] Fix freqs for unconditional latches: N>2, uniform (PR #182405)
via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Thu Feb 19 15:56:06 PST 2026
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-llvm-transforms
Author: Joel E. Denny (jdenny-ornl)
<details>
<summary>Changes</summary>
This patch introduces the command-line option `-unroll-uniform-weights`. When computing probabilities for the remaining N conditional latches in the unrolled loop after converting some iterations' latches to unconditional, LoopUnroll now supports the following three strategies:
- A. If N <= 2, use a simple formula to compute a single uniform probability across those latches.
- B. Otherwise, if `-unroll-uniform-weights` is not specified, apply the original loop's probability to all N latches and then, as needed, adjust as few of them as possible.
- C. Otherwise, bisect the range [0,1] to find a single uniform probability across all N latches. This patch implements this strategy.
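
To make strategy A concrete, the closed forms can be worked out from the frequency model stated in the patch's comments, `FreqOne = Sum(i=0..n)(c_i * p^i)`. This is a sketch only: it covers the complete-unrolling case (backedge eliminated), writes `c_i` for `IterCounts[i]` and `F_d` for `FreqDesired`, and assumes the coefficients map directly onto the linear and quadratic formulas. The exact coefficient expressions in `LoopUnroll.cpp` are not visible in the truncated diff.

```latex
\begin{align*}
F(p) &= \sum_{i=0}^{N} c_i\,p^i
  && \text{(backedge eliminated)} \\
F(p) &= \frac{\sum_{i=0}^{N} c_i\,p^i}{1 - p^N}
  && \text{(backedge remains)} \\[4pt]
N = 1:\quad c_0 + c_1 p = F_d
  &\;\Longrightarrow\; p = \frac{F_d - c_0}{c_1} \\
N = 2:\quad c_2 p^2 + c_1 p + (c_0 - F_d) = 0
  &\;\Longrightarrow\; p = \frac{-c_1 + \sqrt{c_1^2 - 4\,c_2\,(c_0 - F_d)}}{2\,c_2}
\end{align*}
```

In both cases the patch clamps the result to [0, 1], because a value outside that range means `F_d` is impossibly low or high for the available iterations.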
A drawback of C is that the bisection could increase compile time, so this patch makes it opt-in. Its appeal over B is that it treats all latches the same, given that we have no evidence that any latch should have a higher or lower probability than any other. A has neither drawback (it needs no search and it keeps the latches uniform), but I do not know how to apply it for N > 2. More experience or feedback from others might determine that some strategies are not worthwhile to maintain.
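
As a rough illustration of strategy C, the following standalone sketch bisects [0,1] for a single uniform probability. It is not the code from the patch: `computeFreq` and `bisectUniformProb` are hypothetical free functions modeled on the `ComputeFreq` and `TryProb` lambdas in the diff, and `iterCounts` stands in for `IterCounts` as a plain vector of doubles. The default precisions mirror `FreqPrec` and `ProbPrec` from the diff.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Frequency of the original loop body when every remaining conditional latch
// continues with probability prob. iterCounts[i] is the number of iterations
// reached only after passing i conditional latches (hypothetical stand-in for
// IterCounts in the patch). Assumes iterCounts is non-empty.
double computeFreq(const std::vector<double> &iterCounts, double prob,
                   bool completelyUnroll) {
  double probReaching = 1.0;       // p^0
  double freqOne = iterCounts[0];  // c_0 * p^0
  for (std::size_t i = 1; i < iterCounts.size(); ++i) {
    probReaching *= prob;                     // p^i
    freqOne += iterCounts[i] * probReaching;  // c_i * p^i
  }
  // With no backedge, one pass through the unrolled body is the whole story.
  double probBackedge = completelyUnroll ? 0.0 : probReaching;
  if (probBackedge == 1.0)
    return std::numeric_limits<double>::infinity();
  return freqOne / (1.0 - probBackedge);
}

// Bisect for a uniform probability whose frequency is within freqPrec of
// freqDesired. Falls back to 0 or 1 when freqDesired is out of reach.
double bisectUniformProb(const std::vector<double> &iterCounts,
                         double freqDesired, bool completelyUnroll,
                         double freqPrec = 1e-6, double probPrec = 1e-12) {
  double lo = 0.0, hi = 1.0;
  // If prob == 0 is already enough (or too much), use it.
  if (computeFreq(iterCounts, lo, completelyUnroll) - freqDesired > -freqPrec)
    return lo;
  // If prob == 1 is still not enough (or just enough), use it.
  if (computeFreq(iterCounts, hi, completelyUnroll) - freqDesired < freqPrec)
    return hi;
  double prev = hi;
  for (;;) {
    double mid = (lo + hi) / 2;
    double delta = computeFreq(iterCounts, mid, completelyUnroll) - freqDesired;
    // Stop when the frequency is close enough, or when the probability has
    // stopped changing by much, which bounds the number of iterations.
    if (std::fabs(delta) < freqPrec || std::fabs(mid - prev) <= probPrec)
      return mid;
    if (delta < 0)
      lo = mid;  // Frequency too low: raise the probability.
    else
      hi = mid;  // Frequency too high: lower the probability.
    prev = mid;
  }
}
```

Because the frequency increases monotonically with the probability, each step halves the interval, so the `probPrec` cutoff caps the search at roughly 40 evaluations of `computeFreq`.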
---
Patch is 42.78 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/182405.diff
3 Files Affected:
- (modified) llvm/lib/Transforms/Utils/LoopUnroll.cpp (+97-15)
- (modified) llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll (+190-77)
- (modified) llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-partial-unconditional-latch.ll (+49-21)
``````````diff
diff --git a/llvm/lib/Transforms/Utils/LoopUnroll.cpp b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
index 404e254c8a66f..1028b0f026ea3 100644
--- a/llvm/lib/Transforms/Utils/LoopUnroll.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
@@ -89,6 +89,11 @@ UnrollRuntimeEpilog("unroll-runtime-epilog", cl::init(false), cl::Hidden,
cl::desc("Allow runtime unrolled loops to be unrolled "
"with epilog instead of prolog."));
+static cl::opt<bool> UnrollUniformWeights(
+ "unroll-uniform-weights", cl::init(false), cl::Hidden,
+ cl::desc("If new branch weights must be found, work harder to keep them "
+ "uniform."));
+
static cl::opt<bool>
UnrollVerifyDomtree("unroll-verify-domtree", cl::Hidden,
cl::desc("Verify domtree after unrolling"),
@@ -493,9 +498,9 @@ static bool canHaveUnrollRemainder(const Loop *L) {
//
// There are often many sets of latch probabilities that can produce the
// original total loop body frequency. If there are many remaining conditional
-// latches, this function just quickly hacks a few of their probabilities to
-// restore the original total loop body frequency. Otherwise, it determines
-// less arbitrary probabilities.
+// latches and !UnrollUniformWeights, this function just quickly hacks a few of
+// their probabilities to restore the original total loop body frequency.
+// Otherwise, it tries harder to determine less arbitrary probabilities.
static void fixProbContradiction(UnrollLoopOptions ULO,
BranchProbability OriginalLoopProb,
bool CompletelyUnroll,
@@ -580,10 +585,11 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
SetProb(I, Prob);
};
- // If n <= 2, we choose the simplest probability model we can think of: every
- // remaining conditional branch instruction has the same probability, Prob,
- // of continuing to the next iteration. This model has several helpful
- // properties:
+ // If UnrollUniformWeights or n <= 2, we choose the simplest probability model
+ // we can think of: every remaining conditional branch instruction has the
+ // same probability, Prob, of continuing to the next iteration. This model
+ // has several helpful properties:
+ // - There is only one search parameter, Prob.
// - We have no reason to think one latch branch's probability should be
// higher or lower than another, and so this model makes them all the same.
// In the worst cases, we thus avoid setting just some probabilities to 0 or
@@ -596,14 +602,32 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
//
// FreqOne = Sum(i=0..n)(c_i * p^i)
//
- // - If the backedge has been eliminated, FreqOne is the total frequency of
- // the original loop body in the unrolled loop.
- // - If the backedge remains, Sum(i=0..inf)(FreqOne * p^(n*i)) =
- // FreqOne / (1 - p^n) is the total frequency of the original loop body in
- // the unrolled loop, regardless of whether the backedge is conditional or
- // unconditional.
+ // - If the backedge has been eliminated:
+ // - FreqOne is the total frequency of the original loop body in the
+ // unrolled loop.
+ // - If Prob == 1, the total frequency of the original loop body is exactly
+ // the number of remaining loop iterations, as expected because every
+ // remaining loop iteration always then executes.
+ // - If the backedge remains:
+ // - Sum(i=0..inf)(FreqOne * p^(n*i)) = FreqOne / (1 - p^n) is the total
+ // frequency of the original loop body in the unrolled loop, regardless of
+ // whether the backedge is conditional or unconditional.
+ // - As Prob approaches 1, the total frequency of the original loop body
+ // approaches infinity, as expected because the loop approaches never
+ // exiting.
// - For n <= 2, we can use simple formulas to solve the above polynomial
// equations exactly for p without performing a search.
+ // - For n > 2, evaluating each point in the search space, using ComputeFreq
+ // below, requires about as few instructions as we could hope for. That is,
+ // the probability is constant across the conditional branches, so the only
+ // computation is across conditional branches and any backedge, as required
+ // for any model for Prob.
+ // - Prob == 1 produces the maximum possible total frequency for the original
+ // loop body, as described above. Prob == 0 produces the minimum, 0.
+ // Increasing or decreasing Prob monotonically increases or decreases the
+ // frequency, respectively. Thus, for every possible frequency, there
+ // exists some Prob that can produce it, and we can easily use bisection to
+ // search the problem space.
// When iterating for a solution, we stop early if we find probabilities
// that produce a Freq whose difference from FreqDesired is small
@@ -611,6 +635,21 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
// accurate (but surely far more accurate).
const double FreqPrec = 1e-6;
+ // Compute the new frequency produced by using Prob throughout CondLatches.
+ auto ComputeFreq = [&](double Prob) {
+ double ProbReaching = 1; // p^0
+ double FreqOne = IterCounts[0]; // c_0*p^0
+ for (unsigned I = 0, E = CondLatches.size(); I < E; ++I) {
+ ProbReaching *= Prob; // p^(I+1)
+ FreqOne += IterCounts[I + 1] * ProbReaching; // c_(I+1)*p^(I+1)
+ }
+ double ProbReachingBackedge = CompletelyUnroll ? 0 : ProbReaching;
+ assert(FreqOne > 0 && "Expected at least one iteration before first latch");
+ if (ProbReachingBackedge == 1)
+ return std::numeric_limits<double>::infinity();
+ return FreqOne / (1 - ProbReachingBackedge);
+ };
+
// Compute the probability that, used at CondLaches[0] where
// CondLatches.size() == 1, gets as close as possible to FreqDesired.
auto ComputeProbForLinear = [&]() {
@@ -619,6 +658,11 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
double B = IterCounts[0] - FreqDesired;
assert(A > 0 && "Expected iterations after last conditional latch");
double Prob = -B / A;
+ // If it computes an invalid Prob, FreqDesired is impossibly low or high.
+ // Otherwise, Prob should produce nearly FreqDesired.
+ assert((Prob < 0 || Prob > 1 ||
+ fabs(ComputeFreq(Prob) - FreqDesired) < FreqPrec) &&
+ "Expected accurate frequency when linear case is possible");
Prob = std::max(Prob, 0.);
Prob = std::min(Prob, 1.);
return Prob;
@@ -633,6 +677,11 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
double C = IterCounts[0] - FreqDesired;
assert(A > 0 && "Expected iterations after last conditional latch");
double Prob = (-B + sqrt(B * B - 4 * A * C)) / (2 * A);
+ // If it computes an invalid Prob, FreqDesired is impossibly low or high.
+ // Otherwise, Prob should produce nearly FreqDesired.
+ assert((Prob < 0 || Prob > 1 ||
+ fabs(ComputeFreq(Prob) - FreqDesired) < FreqPrec) &&
+ "Expected accurate frequency when quadratic case is possible");
Prob = std::max(Prob, 0.);
Prob = std::min(Prob, 1.);
return Prob;
@@ -733,12 +782,17 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
};
// Determine and set branch weights.
+ //
+ // Prob < 0 and Prob > 1 cannot be represented as branch weights. We might
+ // compute such a Prob if FreqDesired is impossible (e.g., due to inaccurate
+ // profile data) for the maximum trip count we have determined when completely
+ // unrolling. In that case, just go with whichever is closest.
if (CondLatches.size() == 1) {
SetAllProbs(ComputeProbForLinear());
} else if (CondLatches.size() == 2) {
SetAllProbs(ComputeProbForQuadratic());
- } else {
- // The polynomial is too complex for a simple formula, so the quick and
+ } else if (!UnrollUniformWeights) {
+ // The polynomial is too complex for a simple formula, and the quick and
// dirty fix has been selected. Adjust probabilities starting from the
// first latch, which has the most influence on the total frequency, so
// starting there should minimize the number of latches that have to be
@@ -752,6 +806,34 @@ static void fixProbContradiction(UnrollLoopOptions ULO,
if (fabs(Freq - FreqDesired) < FreqPrec)
break;
}
+ } else {
+ // The polynomial is too complex for a simple formula, and uniform branch
+ // weights have been selected, so bisect.
+ double ProbMin, ProbMax, ProbPrev;
+ auto TryProb = [&](double Prob) {
+ ProbPrev = Prob;
+ double FreqDelta = ComputeFreq(Prob) - FreqDesired;
+ if (fabs(FreqDelta) < FreqPrec)
+ return 0;
+ if (FreqDelta < 0) {
+ ProbMin = Prob;
+ return -1;
+ }
+ ProbMax = Prob;
+ return 1;
+ };
+ // If Prob == 0 is too small and Prob == 1 is too large, bisect between
+ // them. To place a hard upper limit on the search time, stop bisecting
+ // when Prob stops changing (ProbDelta) by much (ProbPrec).
+ if (TryProb(0.) < 0 && TryProb(1.) > 0) {
+ const double ProbPrec = 1e-12;
+ double Prob, ProbDelta;
+ do {
+ Prob = (ProbMin + ProbMax) / 2;
+ ProbDelta = Prob - ProbPrev;
+ } while (TryProb(Prob) != 0 && fabs(ProbDelta) > ProbPrec);
+ }
+ SetAllProbs(ProbPrev);
}
// FIXME: We have not considered non-latch loop exits:
diff --git a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll
index 353e74be9fbd1..69da20802a0ae 100644
--- a/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll
+++ b/llvm/test/Transforms/LoopUnroll/branch-weights-freq/unroll-complete.ll
@@ -58,12 +58,14 @@
; Check 1 max iteration:
; - Unroll count of >=1 should always produce complete unrolling.
; - That produces 0 unrolled iteration latches, so there are no branch weights
-; to compute.
+; to compute. Thus, -unroll-uniform-weights has no effect.
;
; Original loop body frequency is 2 (loop weight 1), which is impossibly high.
;
; RUN: sed -e s/@MAX@/1/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG1210
+; RUN: %{ur-bf} -unroll-count=1 -unroll-uniform-weights | %{fc} UR1210
+; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR1210
; RUN: %{ur-bf} -unroll-count=1 | %{fc} UR1210
; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR1210
;
@@ -77,6 +79,8 @@
;
; RUN: sed -e s/@MAX@/1/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG1110
+; RUN: %{ur-bf} -unroll-count=1 -unroll-uniform-weights | %{fc} UR1110
+; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR1110
; RUN: %{ur-bf} -unroll-count=1 | %{fc} UR1110
; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR1110
;
@@ -90,7 +94,8 @@
; Check 2 max iterations:
; - Unroll count of >=2 should always produce complete unrolling.
; - That produces <=1 unrolled iteration latch, so the implementation can
-; compute uniform weights by solving, at worst, a linear equation.
+; compute uniform weights by solving, at worst, a linear equation. Thus,
+; -unroll-uniform-weights has no effect.
;
; Original loop body frequency is 3 (loop weight 2), which is impossibly high.
;
@@ -99,6 +104,8 @@
;
; RUN: sed -e s/@MAX@/2/ -e s/@W@/2/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG2310
+; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2310
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2310
; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2310
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2310
;
@@ -120,6 +127,8 @@
;
; RUN: sed -e s/@MAX@/2/ -e s/@W@/2/ -e s/@MIN@/2/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG2320
+; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2320
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2320
; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2320
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2320
;
@@ -140,6 +149,8 @@
;
; RUN: sed -e s/@MAX@/2/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG2210
+; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2210
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2210
; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2210
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2210
;
@@ -159,6 +170,8 @@
;
; RUN: sed -e s/@MAX@/2/ -e s/@W@/1/ -e s/@MIN@/2/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG2220
+; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2220
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2220
; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2220
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2220
;
@@ -179,6 +192,8 @@
;
; RUN: sed -e s/@MAX@/2/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG2110
+; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2110
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2110
; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2110
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2110
;
@@ -198,6 +213,8 @@
;
; RUN: sed -e s/@MAX@/2/ -e s/@W@/0/ -e s/@MIN@/2/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG2120
+; RUN: %{ur-bf} -unroll-count=2 -unroll-uniform-weights | %{fc} UR2120
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR2120
; RUN: %{ur-bf} -unroll-count=2 | %{fc} UR2120
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR2120
;
@@ -215,7 +232,8 @@
; Check 3 max iterations:
; - Unroll count of >=3 should always produce complete unrolling.
; - That produces <=2 unrolled iteration latches, so the implementation can
-; compute uniform weights solving, at worst, a quadratic equation.
+; compute uniform weights solving, at worst, a quadratic equation. Thus,
+; -unroll-uniform-weights has no effect.
;
; Original loop body frequency is 4 (loop weight 3), which is impossibly high.
;
@@ -224,6 +242,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/3/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG3410
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3410
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3410
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3410
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3410
;
@@ -248,6 +268,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/3/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG3430
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3430
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3430
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3430
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3430
;
@@ -269,6 +291,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/3/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll
; RUN: %{bf-fc} ORIG343x
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR343x
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR343x
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR343x
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR343x
;
@@ -295,6 +319,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/2/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG3310
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3310
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3310
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3310
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3310
;
@@ -317,6 +343,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/2/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG3330
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3330
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3330
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3330
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3330
;
@@ -338,6 +366,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/2/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll
; RUN: %{bf-fc} ORIG333x
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR333x
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR333x
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR333x
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR333x
;
@@ -363,6 +393,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/1/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG3210
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3210
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3210
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3210
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3210
;
@@ -385,6 +417,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/1/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG3230
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3230
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3230
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3230
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3230
;
@@ -406,6 +440,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/1/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll
; RUN: %{bf-fc} ORIG323x
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR323x
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR323x
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR323x
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR323x
;
@@ -430,6 +466,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/0/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG3110
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3110
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3110
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3110
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3110
;
@@ -452,6 +490,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/0/ -e s/@MIN@/3/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG3130
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR3130
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR3130
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR3130
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR3130
;
@@ -473,6 +513,8 @@
;
; RUN: sed -e s/@MAX@/3/ -e s/@W@/0/ -e s/@MIN@/3/ -e s/@I_0@/%x/ %s > %t.ll
; RUN: %{bf-fc} ORIG313x
+; RUN: %{ur-bf} -unroll-count=3 -unroll-uniform-weights | %{fc} UR313x
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR313x
; RUN: %{ur-bf} -unroll-count=3 | %{fc} UR313x
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR313x
;
@@ -496,6 +538,7 @@
; - Unroll count of >=4 should always produce complete unrolling.
; - That produces <=3 unrolled iteration latches. 3 is the lowest number where
; the implementation cannot compute uniform weights using a simple formula.
+; Thus, this is our first case where -unroll-uniform-weights matters.
;
; Original loop body frequency is 5 (loop weight 4), which is impossibly high.
;
@@ -504,6 +547,8 @@
;
; RUN: sed -e s/@MAX@/4/ -e s/@W@/4/ -e s/@MIN@/1/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG4510
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4510
+; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4510
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4510
; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4510
;
@@ -531,6 +576,8 @@
;
; RUN: sed -e s/@MAX@/4/ -e s/@W@/4/ -e s/@MIN@/4/ -e s/@I_0@/0/ %s > %t.ll
; RUN: %{bf-fc} ORIG4540
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR4540
+; RUN: %{ur-bf} -unroll-count=5 -unroll-uniform-weights | %{fc} UR4540
; RUN: %{ur-bf} -unroll-count=4 | %{fc} UR4540
; RUN: %{ur-bf} -unroll-count=5 | %{fc} UR4540
;
@@ -554,6 +601,8 @@
;
; RUN: sed -e s/@MAX@/4/ -e s/@W@/4/ -e s/@MIN@/4/ -e s/@I_0@/%x/ %s > %t.ll
; RUN: %{bf-fc} ORIG454x
+; RUN: %{ur-bf} -unroll-count=4 -unroll-uniform-weights | %{fc} UR454x
+; RUN: %{ur-bf} -unroll-count=5 -unroll-unifo...
[truncated]
``````````
</details>
https://github.com/llvm/llvm-project/pull/182405