[clang] [analyzer] Don't assume third iteration in loops (PR #119388)

DonĂ¡t Nagy via cfe-commits cfe-commits at lists.llvm.org
Thu Jan 2 06:44:28 PST 2025


https://github.com/NagyDonat updated https://github.com/llvm/llvm-project/pull/119388

>From cb9b5ef4d8b0ed8e67276947525d327c547424fc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= <donat.nagy at ericsson.com>
Date: Thu, 28 Nov 2024 17:22:58 +0100
Subject: [PATCH 1/9] [analyzer] Don't assume third iteration in loops

This commit ensures that if the loop condition is opaque (the analyzer
cannot determine whether it's true or false) and there were at least two
iterations, then the analyzer doesn't make the unjustified assumption
that it can enter yet another iteration.

Note that the presence of a loop suggests that the developer thought
that two iterations can happen (otherwise an `if` would've been
sufficient), but it does not imply that the developer expected three or
four iterations -- and in fact there are many false positives where a
loop iterates over a two-element (or three-element) data structure, but
the analyzer cannot understand the loop condition and blindly assumes
that there may be three or more iterations. (In particular, analyzing
the FFMPEG project produces 100+ such false positives.)

Moreover, this provides some performance improvements in the sense that
the analyzer won't waste time on traversing the execution paths with 3
or 4 iterations in a loop (which are very similar to the paths with 2
iterations) and therefore will be able to traverse more branches
elsewhere on the `ExplodedGraph`.

This logic is disabled if the user enables the widen-loops analyzer
option (which is disabled by default), because the "simulate one final
iteration after the invalidation" execution path would be suppressed by
the "exit the loop if the loop condition is opaque and there were at
least two iterations" logic. If we want to support loop widening, we
would need to create a follow-up commit which ensures that it "plays
nicely" with this logic.
---
 .../Core/PathSensitive/CoreEngine.h           |  8 +++
 .../Core/PathSensitive/ExprEngine.h           | 18 +++---
 clang/lib/StaticAnalyzer/Core/CoreEngine.cpp  | 27 ++++++++-
 clang/lib/StaticAnalyzer/Core/ExprEngine.cpp  | 55 ++++++++++++++++---
 clang/test/Analysis/loop-unrolling.cpp        | 35 +++++++-----
 clang/test/Analysis/misc-ps-region-store.m    | 31 ++++++++---
 6 files changed, 133 insertions(+), 41 deletions(-)

diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/CoreEngine.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/CoreEngine.h
index a6d05a3ac67b4e..80b79fd4e928f8 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/CoreEngine.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/CoreEngine.h
@@ -126,6 +126,14 @@ class CoreEngine {
   ExplodedNode *generateCallExitBeginNode(ExplodedNode *N,
                                           const ReturnStmt *RS);
 
+  /// Helper function called by `HandleBranch()`. If the currently handled
+  /// branch corresponds to a loop, this returns the number of already
+  /// completed iterations in that loop, otherwise the return value is
+  /// `std::nullopt`. Note that this counts _all_ earlier iterations, including
+  /// ones that were performed within an earlier iteration of an outer loop.
+  std::optional<unsigned> getCompletedIterationCount(const CFGBlock *B,
+                                                     ExplodedNode *Pred) const;
+
 public:
   /// Construct a CoreEngine object to analyze the provided CFG.
   CoreEngine(ExprEngine &exprengine,
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
index 8c7493e27fcaa6..20c446e33ef9a5 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
@@ -321,14 +321,14 @@ class ExprEngine {
                                NodeBuilderWithSinks &nodeBuilder,
                                ExplodedNode *Pred);
 
-  /// ProcessBranch - Called by CoreEngine.  Used to generate successor
-  ///  nodes by processing the 'effects' of a branch condition.
-  void processBranch(const Stmt *Condition,
-                     NodeBuilderContext& BuilderCtx,
-                     ExplodedNode *Pred,
-                     ExplodedNodeSet &Dst,
-                     const CFGBlock *DstT,
-                     const CFGBlock *DstF);
+  /// ProcessBranch - Called by CoreEngine. Used to generate successor nodes by
+  /// processing the 'effects' of a branch condition. If the branch condition
+  /// is a loop condition, IterationsCompletedInLoop is the number of completed
+  /// iterations (otherwise it's std::nullopt).
+  void processBranch(const Stmt *Condition, NodeBuilderContext &BuilderCtx,
+                     ExplodedNode *Pred, ExplodedNodeSet &Dst,
+                     const CFGBlock *DstT, const CFGBlock *DstF,
+                     std::optional<unsigned> IterationsCompletedInLoop);
 
   /// Called by CoreEngine.
   /// Used to generate successor nodes for temporary destructors depending
@@ -588,6 +588,8 @@ class ExprEngine {
   void evalEagerlyAssumeBifurcation(ExplodedNodeSet &Dst, ExplodedNodeSet &Src,
                                     const Expr *Ex);
 
+  bool didEagerlyAssumeBifurcateAt(ProgramStateRef State, const Expr *Ex) const;
+
   static std::pair<const ProgramPointTag *, const ProgramPointTag *>
   getEagerlyAssumeBifurcationTags();
 
diff --git a/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp b/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
index 67b7d30853d9de..775a22e18c6199 100644
--- a/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
@@ -444,7 +444,8 @@ void CoreEngine::HandleBranch(const Stmt *Cond, const Stmt *Term,
   NodeBuilderContext Ctx(*this, B, Pred);
   ExplodedNodeSet Dst;
   ExprEng.processBranch(Cond, Ctx, Pred, Dst, *(B->succ_begin()),
-                       *(B->succ_begin() + 1));
+                        *(B->succ_begin() + 1),
+                        getCompletedIterationCount(B, Pred));
   // Enqueue the new frontier onto the worklist.
   enqueue(Dst);
 }
@@ -591,6 +592,30 @@ ExplodedNode *CoreEngine::generateCallExitBeginNode(ExplodedNode *N,
   return isNew ? Node : nullptr;
 }
 
+std::optional<unsigned>
+CoreEngine::getCompletedIterationCount(const CFGBlock *B,
+                                       ExplodedNode *Pred) const {
+  const LocationContext *LC = Pred->getLocationContext();
+  BlockCounter Counter = WList->getBlockCounter();
+  unsigned BlockCount =
+      Counter.getNumVisited(LC->getStackFrame(), B->getBlockID());
+
+  const Stmt *Term = B->getTerminatorStmt();
+  if (isa<ForStmt, WhileStmt, CXXForRangeStmt>(Term)) {
+    assert(BlockCount >= 1 &&
+           "Block count of currently analyzed block must be >= 1");
+    return BlockCount - 1;
+  }
+  if (isa<DoStmt>(Term)) {
+    // In a do-while loop one iteration happens before the first evaluation of
+    // the loop condition, so we don't subtract one.
+    return BlockCount;
+  }
+  // ObjCForCollectionStmt is skipped intentionally because the current
+  // application of the iteration counts is not relevant for it.
+  return std::nullopt;
+}
+
 void CoreEngine::enqueue(ExplodedNodeSet &Set) {
   for (const auto I : Set)
     WList->enqueue(I);
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index b46cd9fe86fc11..db31206a13c36b 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -2753,12 +2753,10 @@ assumeCondition(const Stmt *Condition, ExplodedNode *N) {
   return State->assume(V);
 }
 
-void ExprEngine::processBranch(const Stmt *Condition,
-                               NodeBuilderContext& BldCtx,
-                               ExplodedNode *Pred,
-                               ExplodedNodeSet &Dst,
-                               const CFGBlock *DstT,
-                               const CFGBlock *DstF) {
+void ExprEngine::processBranch(
+    const Stmt *Condition, NodeBuilderContext &BldCtx, ExplodedNode *Pred,
+    ExplodedNodeSet &Dst, const CFGBlock *DstT, const CFGBlock *DstF,
+    std::optional<unsigned> IterationsCompletedInLoop) {
   assert((!Condition || !isa<CXXBindTemporaryExpr>(Condition)) &&
          "CXXBindTemporaryExprs are handled by processBindTemporary.");
   const LocationContext *LCtx = Pred->getLocationContext();
@@ -2801,8 +2799,35 @@ void ExprEngine::processBranch(const Stmt *Condition,
     if (StTrue && StFalse)
       assert(!isa<ObjCForCollectionStmt>(Condition));
 
-    if (StTrue)
-      Builder.generateNode(StTrue, true, PredN);
+    if (StTrue) {
+      // If we are processing a loop condition where two iterations have
+      // already been completed and the the false branch is also feasible, then
+      // don't assume a third iteration, because it is a redundant execution
+      // path (unlikely to be different from earlier loop exits) and can cause
+      // false positives if e.g. the loop iterates over a two-element structure
+      // with an opaque condition.
+      //
+      // The iteration count "2" is hardcoded because it's the natural limit:
+      // * the fact that the programmer wrote a loop (and not just an `if`)
+      //   implies that they thought that the loop body may be executed twice;
+      // * however, there are situations where the programmer knows that there
+      //   are at most two iterations, but writes a loop that appears to be
+      //   generic, because there is no special syntax for "loop with at most
+      //   two iterations". (This pattern is common in FFMPEG and appears in
+      //   many other projects as well.)
+      bool CompletedTwoIterations = IterationsCompletedInLoop.value_or(0) >= 2;
+      bool FalseAlsoFeasible =
+          StFalse ||
+          didEagerlyAssumeBifurcateAt(PrevState, dyn_cast<Expr>(Condition));
+      bool SkipTrueBranch = CompletedTwoIterations && FalseAlsoFeasible;
+
+      // FIXME: This "don't assume third iteration" heuristic partially
+      // conflicts with the widen-loop analysis option (which is off by
+      // default). If we intend to support and stabilize the loop widening,
+      // we'll need to ensure that it 'plays nicely' with this logic.
+      if (!SkipTrueBranch || AMgr.options.ShouldWidenLoops)
+        Builder.generateNode(StTrue, true, PredN);
+    }
 
     if (StFalse)
       Builder.generateNode(StFalse, false, PredN);
@@ -3724,6 +3749,10 @@ ExprEngine::getEagerlyAssumeBifurcationTags() {
   return std::make_pair(&TrueTag, &FalseTag);
 }
 
+/// The last expression where EagerlyAssume produced two transitions (i.e. it
+/// activated and the true and false case were both feasible).
+REGISTER_TRAIT_WITH_PROGRAMSTATE(LastEagerlyAssumeBifurcationAt, const Expr *)
+
 void ExprEngine::evalEagerlyAssumeBifurcation(ExplodedNodeSet &Dst,
                                               ExplodedNodeSet &Src,
                                               const Expr *Ex) {
@@ -3746,6 +3775,11 @@ void ExprEngine::evalEagerlyAssumeBifurcation(ExplodedNodeSet &Dst,
 
       auto [StateTrue, StateFalse] = State->assume(*SEV);
 
+      if (StateTrue && StateFalse) {
+        StateTrue = StateTrue->set<LastEagerlyAssumeBifurcationAt>(Ex);
+        StateFalse = StateFalse->set<LastEagerlyAssumeBifurcationAt>(Ex);
+      }
+
       // First assume that the condition is true.
       if (StateTrue) {
         SVal Val = svalBuilder.makeIntVal(1U, Ex->getType());
@@ -3763,6 +3797,11 @@ void ExprEngine::evalEagerlyAssumeBifurcation(ExplodedNodeSet &Dst,
   }
 }
 
+bool ExprEngine::didEagerlyAssumeBifurcateAt(ProgramStateRef State,
+                                             const Expr *Ex) const {
+  return Ex && State->get<LastEagerlyAssumeBifurcationAt>() == Ex;
+}
+
 void ExprEngine::VisitGCCAsmStmt(const GCCAsmStmt *A, ExplodedNode *Pred,
                                  ExplodedNodeSet &Dst) {
   StmtNodeBuilder Bldr(Pred, Dst, *currBldrCtx);
diff --git a/clang/test/Analysis/loop-unrolling.cpp b/clang/test/Analysis/loop-unrolling.cpp
index 66a828abfb5133..bf05a7739ce48c 100644
--- a/clang/test/Analysis/loop-unrolling.cpp
+++ b/clang/test/Analysis/loop-unrolling.cpp
@@ -63,7 +63,7 @@ int simple_no_unroll1() {
   int a[9];
   int k = 42;
   for (int i = 0; i < 9; i++) {
-    clang_analyzer_numTimesReached(); // expected-warning {{4}}
+    clang_analyzer_numTimesReached(); // expected-warning {{2}}
     a[i] = 42;
     foo(i);
   }
@@ -76,7 +76,7 @@ int simple_no_unroll2() {
   int k = 42;
   int i;
   for (i = 0; i < 9; i++) {
-    clang_analyzer_numTimesReached(); // expected-warning {{4}}
+    clang_analyzer_numTimesReached(); // expected-warning {{2}}
     a[i] = 42;
     i += getNum();
   }
@@ -309,9 +309,9 @@ int nested_inner_unrolled() {
   int k = 42;
   int j = 0;
   for (int i = 0; i < getNum(); i++) {
-    clang_analyzer_numTimesReached(); // expected-warning {{4}}
+    clang_analyzer_numTimesReached(); // expected-warning {{2}}
     for (j = 0; j < 8; ++j) {
-      clang_analyzer_numTimesReached(); // expected-warning {{32}}
+      clang_analyzer_numTimesReached(); // expected-warning {{16}}
       a[j] = 22;
     }
     a[i] = 42;
@@ -346,11 +346,7 @@ int simple_known_bound_loop() {
 
 int simple_unknown_bound_loop() {
   for (int i = 2; i < getNum(); i++) {
-#ifdef DFS
-    clang_analyzer_numTimesReached(); // expected-warning {{16}}
-#else
     clang_analyzer_numTimesReached(); // expected-warning {{8}}
-#endif
   }
   return 0;
 }
@@ -368,11 +364,7 @@ int nested_inlined_unroll1() {
 int nested_inlined_no_unroll1() {
   int k;
   for (int i = 0; i < 9; i++) {
-#ifdef DFS
-    clang_analyzer_numTimesReached(); // expected-warning {{18}}
-#else
-    clang_analyzer_numTimesReached(); // expected-warning {{14}}
-#endif
+    clang_analyzer_numTimesReached(); // expected-warning {{10}}
     k = simple_unknown_bound_loop();  // reevaluation without inlining, splits the state as well
   }
   int a = 22 / k; // no-warning
@@ -475,9 +467,13 @@ int num_steps_over_limit2() {
 
 int num_steps_on_limit3() {
   for (int i = 0; i < getNum(); i++) {
-    clang_analyzer_numTimesReached(); // expected-warning {{4}}
+    clang_analyzer_numTimesReached(); // expected-warning {{2}}
     for (int j = 0; j < 32; j++) {
-      clang_analyzer_numTimesReached(); // expected-warning {{128}}
+      // Here the loop unrollig logic calculates with four potential iterations
+      // in the outer loop where it cannot determine the iteration count in
+      // advance; but after two loops the analyzer conservatively assumes that
+      // the (still opaque) loop condition is false.
+      clang_analyzer_numTimesReached(); // expected-warning {{64}}
     }
   }
   return 0;
@@ -493,6 +489,15 @@ int num_steps_over_limit3() {
   return 0;
 }
 
+int num_steps_on_limit4() {
+  for (int i = 0; i < 4; i++) {
+    clang_analyzer_numTimesReached(); // expected-warning {{4}}
+    for (int j = 0; j < 32; j++) {
+      clang_analyzer_numTimesReached(); // expected-warning {{128}}
+    }
+  }
+  return 0;
+}
 
 void pr34943() {
   for (int i = 0; i < 6L; ++i) {
diff --git a/clang/test/Analysis/misc-ps-region-store.m b/clang/test/Analysis/misc-ps-region-store.m
index 668b5ffd7001a6..b0a04b836efb99 100644
--- a/clang/test/Analysis/misc-ps-region-store.m
+++ b/clang/test/Analysis/misc-ps-region-store.m
@@ -910,13 +910,13 @@ void pr6302(id x, Class y) {
 
 //===----------------------------------------------------------------------===//
 // Specially handle global variables that are declared constant.  In the
-// example below, this forces the loop to take exactly 2 iterations.
+// example below, this forces the loop to take exactly 1 iterations.
 //===----------------------------------------------------------------------===//
 
-const int pr6288_L_N = 2;
+const int pr6288_L_N = 1;
 void pr6288_(void) {
-  int x[2];
-  int *px[2];
+  int x[1];
+  int *px[1];
   int i;
   for (i = 0; i < pr6288_L_N; i++)
     px[i] = &x[i];
@@ -924,8 +924,8 @@ void pr6288_(void) {
 }
 
 void pr6288_pos(int z) {
-  int x[2];
-  int *px[2];
+  int x[1];
+  int *px[1];
   int i;
   for (i = 0; i < z; i++)
     px[i] = &x[i]; // expected-warning{{Access out-of-bound array element (buffer overflow)}}
@@ -933,15 +933,28 @@ void pr6288_pos(int z) {
 }
 
 void pr6288_b(void) {
-  const int L_N = 2;
-  int x[2];
-  int *px[2];
+  const int L_N = 1;
+  int x[1];
+  int *px[1];
   int i;
   for (i = 0; i < L_N; i++)
     px[i] = &x[i];
   *(px[0]) = 0; // no-warning
 }
 
+void pr6288_no_third_iter(int z) {
+  int x[2];
+  int *px[2];
+  int i;
+  // If the loop condition is opaque, we assume that there may be two
+  // iterations (becasuse otherwise the loop could be replaced by an if); but
+  // we do not assume that there may be a third iteration. Therefore,
+  // unlike 'pr6288_pos', this testcase does not produce an out-of-bounds error.
+  for (i = 0; i < z; i++)
+    px[i] = &x[i];
+  *(px[0]) = 0; // expected-warning{{Dereference of undefined pointer value}}
+}
+
 // A bug in RemoveDeadBindings was causing instance variable bindings to get
 // prematurely pruned from the state.
 @interface Rdar7817800 {

>From db451194747293d5e4a32468cdaee6f244322ffb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= <donat.nagy at ericsson.com>
Date: Fri, 13 Dec 2024 17:03:02 +0100
Subject: [PATCH 2/9] Apply grammar suggestions

Co-authored-by: Balazs Benics <benicsbalazs at gmail.com>
---
 clang/lib/StaticAnalyzer/Core/ExprEngine.cpp | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index db31206a13c36b..f8b13618c2e2dd 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -2801,17 +2801,17 @@ void ExprEngine::processBranch(
 
     if (StTrue) {
       // If we are processing a loop condition where two iterations have
-      // already been completed and the the false branch is also feasible, then
-      // don't assume a third iteration, because it is a redundant execution
+      // already been completed and the false branch is also feasible, then
+      // don't assume a third iteration because it is a redundant execution
       // path (unlikely to be different from earlier loop exits) and can cause
       // false positives if e.g. the loop iterates over a two-element structure
       // with an opaque condition.
       //
       // The iteration count "2" is hardcoded because it's the natural limit:
       // * the fact that the programmer wrote a loop (and not just an `if`)
-      //   implies that they thought that the loop body may be executed twice;
+      //   implies that they thought that the loop body might be executed twice;
       // * however, there are situations where the programmer knows that there
-      //   are at most two iterations, but writes a loop that appears to be
+      //   are at most two iterations but writes a loop that appears to be
       //   generic, because there is no special syntax for "loop with at most
       //   two iterations". (This pattern is common in FFMPEG and appears in
       //   many other projects as well.)
@@ -2824,7 +2824,7 @@ void ExprEngine::processBranch(
       // FIXME: This "don't assume third iteration" heuristic partially
       // conflicts with the widen-loop analysis option (which is off by
       // default). If we intend to support and stabilize the loop widening,
-      // we'll need to ensure that it 'plays nicely' with this logic.
+      // we must ensure that it 'plays nicely' with this logic.
       if (!SkipTrueBranch || AMgr.options.ShouldWidenLoops)
         Builder.generateNode(StTrue, true, PredN);
     }
@@ -3750,7 +3750,7 @@ ExprEngine::getEagerlyAssumeBifurcationTags() {
 }
 
 /// The last expression where EagerlyAssume produced two transitions (i.e. it
-/// activated and the true and false case were both feasible).
+/// activated and the true and false cases were both feasible).
 REGISTER_TRAIT_WITH_PROGRAMSTATE(LastEagerlyAssumeBifurcationAt, const Expr *)
 
 void ExprEngine::evalEagerlyAssumeBifurcation(ExplodedNodeSet &Dst,

>From 8791294da64063b5ee5c85b2beb37812934f06e9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= <donat.nagy at ericsson.com>
Date: Fri, 20 Dec 2024 16:25:13 +0100
Subject: [PATCH 3/9] Add dedicated tests

---
 clang/test/Analysis/loop-assumptions.c | 205 +++++++++++++++++++++++++
 1 file changed, 205 insertions(+)
 create mode 100644 clang/test/Analysis/loop-assumptions.c

diff --git a/clang/test/Analysis/loop-assumptions.c b/clang/test/Analysis/loop-assumptions.c
new file mode 100644
index 00000000000000..064ce2bf381fc6
--- /dev/null
+++ b/clang/test/Analysis/loop-assumptions.c
@@ -0,0 +1,205 @@
+// RUN: %clang_analyze_cc1 -analyzer-checker=debug.ExprInspection \
+// RUN:     -verify=expected,eagerlyassume %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=debug.ExprInspection \
+// RUN:     -analyzer-config eagerly-assume=false \
+// RUN:     -verify=expected,noeagerlyassume %s
+
+// These tests validate the logic within `ExprEngine::processBranch` which
+// ensures that in loops with opaque conditions we don't assume execution paths
+// if the code does not imply that they are possible.
+
+void clang_analyzer_numTimesReached(void);
+void clang_analyzer_warnIfReached(void);
+void clang_analyzer_dump(int);
+
+void clearCondition(void) {
+  // If the analyzer can definitely determine the value of the loop condition,
+  // then this corrective logic doesn't activate and the engine executes
+  // `-analyzer-max-loop` iterations (by default, 4).
+  for (int i = 0; i < 10; i++)
+    clang_analyzer_numTimesReached(); // expected-warning {{4}}
+
+  clang_analyzer_warnIfReached(); // unreachable
+}
+
+void opaqueCondition(int arg) {
+  // If the loop condition is opaque, don't assume more than two iterations,
+  // because the presence of a loop does not imply that the programmer thought
+  // that more than two iterations are possible. (It _does_ imply that two
+  // iterations may be possible at least in some cases, because otherwise an
+  // `if` would've been enough.)
+  for (int i = 0; i < arg; i++)
+    clang_analyzer_numTimesReached(); // expected-warning {{2}}
+
+  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}} 
+}
+
+int check(void);
+
+void opaqueConditionCall(int arg) {
+  // Same situation as `opaqueCondition()` but with a `while ()` loop. This
+  // is also an example for a situation where the programmer cannot easily
+  // insert an assertion to guide the analyzer and rule out more than two
+  // iterations (so the analyzer needs to proactively avoid those unjustified
+  // branches).
+  while (check())
+    clang_analyzer_numTimesReached(); // expected-warning {{2}}
+
+  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}} 
+}
+
+void opaqueConditionDoWhile(int arg) {
+  // Same situation as `opaqueCondition()` but with a `do {} while ()` loop.
+  // This is tested separately because this loop type is a special case in the
+  // iteration count calculation.
+  int i = 0;
+  do {
+    clang_analyzer_numTimesReached(); // expected-warning {{2}}
+  } while (i++ < arg);
+
+  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}} 
+}
+
+void dontRememberOldBifurcation(int arg) {
+  // In this (slightly contrived) test case the analyzer performs an assumption
+  // at the first iteration of the loop, but does not make any new assumptions
+  // in the subsequent iterations, so the analyzer should continue evaluating
+  // the loop.
+  // FIXME: This is mishandled in `eagerly-assume` mode (which is enabled by
+  // default), because `didEagerlyAssumeBifurcateAt()` returns true for the
+  // loop condition -- referring to the bifurcation which happened on an early
+  // iteration.
+
+  // NOTE: The variable `i` is introduced to ensure that the iterations of the
+  // loop change the state -- otherwise the analyzer stops iterating because it
+  // returns to the same `ExplodedNode`.
+  int i = 0;
+  while (arg > 3) {
+    clang_analyzer_numTimesReached(); // noeagerlyassume-warning {{4}} eagerlyassume-warning {{2}}
+    i++;
+  }
+
+  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}} 
+}
+
+void dontAssumeFourthIterartion(int arg) {
+  if (arg == 2)
+    return;
+
+  // In this function the analyzer cannot leave the loop after exactly two
+  // iterations (because it knows that `arg != 2` at that point), so it
+  // performs a third iteration, but it does not assume that a fourth iteration
+  // is also possible.
+  // FIXME: This test case is also affected by the bug described in
+  // `dontRememberOldBifurcation()`.
+  for (int i = 0; i < arg; i++)
+    clang_analyzer_numTimesReached(); // noeagerlyassume-warning {{3}} eagerlyassume-warning {{2}}
+
+  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}} 
+}
+
+#define TRUE 1
+void shortCircuitInLoopCondition(int arg) {
+  // When the loop condition expression contains short-circuiting operators, it
+  // performs "inner" bifurcations for those operators and only considers the
+  // last (rightmost) operand as the branch condition that is associated with
+  // the loop itself (as its loop condition).
+  // This means that assumptions taken in the left-hand side of a short-circuiting
+  // operator are not recognized as "opaque" loop condition, so the loop in
+  // this test case is allowed to finish four iterations.
+  // FIXME: This corner case is responsible for at least one out-of-bounds
+  // false positive on the ffmpeg codebase. Eventually we should properly
+  // recognize the full syntactical loop condition expression as "the loop
+  // condition", but this will be complicated to implement.
+  for (int i = 0; i < arg && TRUE; i++) {
+    clang_analyzer_numTimesReached(); // expected-warning {{4}}
+  }
+  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}} 
+}
+
+void shortCircuitInLoopConditionRHS(int arg) {
+  // Unlike `shortCircuitInLoopCondition()`, this case is handled properly
+  // because the analyzer thinks that the right hand side of the `&&` is the
+  // loop condition.
+  for (int i = 0; TRUE && i < arg; i++) {
+    clang_analyzer_numTimesReached(); // expected-warning {{2}}
+  }
+  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}} 
+}
+
+void eagerlyAssumeInSubexpression(int arg) {
+  // The `EagerlyAssume` logic is another complication that can "split the
+  // state" within the loop condition, but before the `processBranch()` call
+  // which is (in theory) responsible for evaluating the loop condition.
+  // The current implementation partially compensates this by noticing the
+  // cases where the loop condition is targeted by `EagerlyAssume`, but does
+  // not handle the (fortunately rare) case when `EagerlyAssume` hits a
+  // sub-expression of the loop condition (as in this contrived test case).
+  // FIXME: I don't know a real-world example for this inconsistency, but it
+  // would be good to eliminate it eventually.
+  for (int i = 0; (i >= arg) - 1; i++) {
+    clang_analyzer_numTimesReached(); // eagerlyassume-warning {{4}} noeagerlyassume-warning {{2}}
+  }
+  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}} 
+}
+
+void calledTwice(int arg, int isFirstCall) {
+  // This function is called twice (with two different unknown 'arg' values) to
+  // check the iteration count handling in this situation.
+  for (int i = 0; i < arg; i++) {
+    if (isFirstCall) {
+      clang_analyzer_numTimesReached(); // expected-warning {{2}}
+    } else {
+      clang_analyzer_numTimesReached(); // expected-warning {{2}}
+    }
+  }
+}
+
+void caller(int arg, int arg2) {
+  // Entry point for `calledTwice()`.
+  calledTwice(arg, 1);
+  calledTwice(arg2, 0);
+}
+
+void innerLoopClearCondition(void) {
+  // A "control group" test case for the behavior of an inner loop. Notice that
+  // although the (default) value of `-analyzer-max-loop` is 4, we only see 3 iterations
+  // of the inner loop, because `-analyzer-max-loop` limits the number of
+  // evaluations of _the loop condition of the inner loop_ and in addition to
+  // the 3 evaluations before the 3 iterations, there is also a step where it
+  // evaluates to false (in the first iteration of the outer loop).
+  for (int outer = 0; outer < 2; outer++) {
+    int limit = 0;
+    if (outer)
+      limit = 10;
+    clang_analyzer_dump(limit); // expected-warning {{0}} expected-warning {{10}}
+    for (int i = 0; i < limit; i++) {
+      clang_analyzer_numTimesReached(); // expected-warning {{3}}
+    }
+  }
+}
+
+void innerLoopOpaqueCondition(int arg) {
+  // In this test case the engine doesn't assume a second iteration within the
+  // inner loop (in the second iteration of the outer loop, when the limit is
+  // opaque) because `CoreEngine::getCompletedIterationCount()` is based on the
+  // `BlockCount` values queried from the `BlockCounter` which count _all_
+  // evaluations of a given `CFGBlock` (in our case, the loop condition) and
+  // not just the evaluations within the current iteration of the outer loop.
+  // FIXME: This inaccurate iteration count could in theory cause some false
+  // negatives, although I think this would be unusual in practice, as the
+  // small default value of `-analyzer-max-loop` means that this is only
+  // relevant if the analyzer can deduce that the inner loop performs 0 or 1
+  // iterations within the first iteration of the outer loop (and then the
+  // condition of the inner loop is opaque within the second iteration of the
+  // outer loop).
+  for (int outer = 0; outer < 2; outer++) {
+    int limit = 0;
+    if (outer)
+      limit = arg;
+    clang_analyzer_dump(limit); // expected-warning {{0}} expected-warning {{reg_$}}
+    for (int i = 0; i < limit; i++) {
+      clang_analyzer_numTimesReached(); // expected-warning {{1}}
+    }
+  }
+}

>From 41e94d77b3afed7cbfa30c1498fc03995eb27717 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= <donat.nagy at ericsson.com>
Date: Fri, 20 Dec 2024 16:29:58 +0100
Subject: [PATCH 4/9] Don't apply obsolete EagerlyAssume info

---
 clang/lib/StaticAnalyzer/Core/ExprEngine.cpp | 14 ++++++++------
 clang/test/Analysis/loop-assumptions.c       | 13 +++++--------
 2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index f8b13618c2e2dd..d36ce8493d3ea3 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -3749,9 +3749,10 @@ ExprEngine::getEagerlyAssumeBifurcationTags() {
   return std::make_pair(&TrueTag, &FalseTag);
 }
 
-/// The last expression where EagerlyAssume produced two transitions (i.e. it
-/// activated and the true and false cases were both feasible).
-REGISTER_TRAIT_WITH_PROGRAMSTATE(LastEagerlyAssumeBifurcationAt, const Expr *)
+/// If the last EagerlyAssume attempt was successful (i.e. the true and false
+/// cases were both feasible), this state trait stores the expression where it
+/// happened; otherwise this holds nullptr.
+REGISTER_TRAIT_WITH_PROGRAMSTATE(LastEagerlyAssumeExprIfSuccessful, const Expr *)
 
 void ExprEngine::evalEagerlyAssumeBifurcation(ExplodedNodeSet &Dst,
                                               ExplodedNodeSet &Src,
@@ -3768,6 +3769,7 @@ void ExprEngine::evalEagerlyAssumeBifurcation(ExplodedNodeSet &Dst,
     }
 
     ProgramStateRef State = Pred->getState();
+    State = State->set<LastEagerlyAssumeExprIfSuccessful>(nullptr);
     SVal V = State->getSVal(Ex, Pred->getLocationContext());
     std::optional<nonloc::SymbolVal> SEV = V.getAs<nonloc::SymbolVal>();
     if (SEV && SEV->isExpression()) {
@@ -3776,8 +3778,8 @@ void ExprEngine::evalEagerlyAssumeBifurcation(ExplodedNodeSet &Dst,
       auto [StateTrue, StateFalse] = State->assume(*SEV);
 
       if (StateTrue && StateFalse) {
-        StateTrue = StateTrue->set<LastEagerlyAssumeBifurcationAt>(Ex);
-        StateFalse = StateFalse->set<LastEagerlyAssumeBifurcationAt>(Ex);
+        StateTrue = StateTrue->set<LastEagerlyAssumeExprIfSuccessful>(Ex);
+        StateFalse = StateFalse->set<LastEagerlyAssumeExprIfSuccessful>(Ex);
       }
 
       // First assume that the condition is true.
@@ -3799,7 +3801,7 @@ void ExprEngine::evalEagerlyAssumeBifurcation(ExplodedNodeSet &Dst,
 
 bool ExprEngine::didEagerlyAssumeBifurcateAt(ProgramStateRef State,
                                              const Expr *Ex) const {
-  return Ex && State->get<LastEagerlyAssumeBifurcationAt>() == Ex;
+  return Ex && State->get<LastEagerlyAssumeExprIfSuccessful>() == Ex;
 }
 
 void ExprEngine::VisitGCCAsmStmt(const GCCAsmStmt *A, ExplodedNode *Pred,
diff --git a/clang/test/Analysis/loop-assumptions.c b/clang/test/Analysis/loop-assumptions.c
index 064ce2bf381fc6..92b4c6d7fff1bd 100644
--- a/clang/test/Analysis/loop-assumptions.c
+++ b/clang/test/Analysis/loop-assumptions.c
@@ -65,17 +65,16 @@ void dontRememberOldBifurcation(int arg) {
   // at the first iteration of the loop, but does not make any new assumptions
   // in the subsequent iterations, so the analyzer should continue evaluating
   // the loop.
-  // FIXME: This is mishandled in `eagerly-assume` mode (which is enabled by
-  // default), because `didEagerlyAssumeBifurcateAt()` returns true for the
-  // loop condition -- referring to the bifurcation which happened on an early
-  // iteration.
+  // Previously this was mishandled in `eagerly-assume` mode (which is enabled
+  // by default), because the code remembered that there was a bifurcation on
+  // the first iteration of the loop and didn't realize that this is obsolete.
 
   // NOTE: The variable `i` is introduced to ensure that the iterations of the
   // loop change the state -- otherwise the analyzer stops iterating because it
   // returns to the same `ExplodedNode`.
   int i = 0;
   while (arg > 3) {
-    clang_analyzer_numTimesReached(); // noeagerlyassume-warning {{4}} eagerlyassume-warning {{2}}
+    clang_analyzer_numTimesReached(); // expected-warning {{4}}
     i++;
   }
 
@@ -90,10 +89,8 @@ void dontAssumeFourthIterartion(int arg) {
   // iterations (because it knows that `arg != 2` at that point), so it
   // performs a third iteration, but it does not assume that a fourth iteration
   // is also possible.
-  // FIXME: This test case is also affected by the bug described in
-  // `dontRememberOldBifurcation()`.
   for (int i = 0; i < arg; i++)
-    clang_analyzer_numTimesReached(); // noeagerlyassume-warning {{3}} eagerlyassume-warning {{2}}
+    clang_analyzer_numTimesReached(); // expected-warning {{3}}
 
   clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}} 
 }

>From 739fb0f845e813a2b174af1cf632056751d1db80 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= <donat.nagy at ericsson.com>
Date: Fri, 20 Dec 2024 16:30:58 +0100
Subject: [PATCH 5/9] Satisfy git-clang-format

---
 clang/lib/StaticAnalyzer/Core/ExprEngine.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index d36ce8493d3ea3..b1c2a4be7d7426 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -3752,7 +3752,8 @@ ExprEngine::getEagerlyAssumeBifurcationTags() {
 /// If the last EagerlyAssume attempt was successful (i.e. the true and false
 /// cases were both feasible), this state trait stores the expression where it
 /// happened; otherwise this holds nullptr.
-REGISTER_TRAIT_WITH_PROGRAMSTATE(LastEagerlyAssumeExprIfSuccessful, const Expr *)
+REGISTER_TRAIT_WITH_PROGRAMSTATE(LastEagerlyAssumeExprIfSuccessful,
+                                 const Expr *)
 
 void ExprEngine::evalEagerlyAssumeBifurcation(ExplodedNodeSet &Dst,
                                               ExplodedNodeSet &Src,

>From 1e28c6fd67ae5d347f3a0fe9ba96832c0a481a4a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= <donat.nagy at ericsson.com>
Date: Mon, 23 Dec 2024 10:19:33 +0100
Subject: [PATCH 6/9] Eliminate end-of-line spaces from test file

---
 clang/test/Analysis/loop-assumptions.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/clang/test/Analysis/loop-assumptions.c b/clang/test/Analysis/loop-assumptions.c
index 92b4c6d7fff1bd..1b5993a0af847e 100644
--- a/clang/test/Analysis/loop-assumptions.c
+++ b/clang/test/Analysis/loop-assumptions.c
@@ -31,7 +31,7 @@ void opaqueCondition(int arg) {
   for (int i = 0; i < arg; i++)
     clang_analyzer_numTimesReached(); // expected-warning {{2}}
 
-  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}} 
+  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}
 }
 
 int check(void);
@@ -45,7 +45,7 @@ void opaqueConditionCall(int arg) {
   while (check())
     clang_analyzer_numTimesReached(); // expected-warning {{2}}
 
-  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}} 
+  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}
 }
 
 void opaqueConditionDoWhile(int arg) {
@@ -57,7 +57,7 @@ void opaqueConditionDoWhile(int arg) {
     clang_analyzer_numTimesReached(); // expected-warning {{2}}
   } while (i++ < arg);
 
-  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}} 
+  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}
 }
 
 void dontRememberOldBifurcation(int arg) {
@@ -78,7 +78,7 @@ void dontRememberOldBifurcation(int arg) {
     i++;
   }
 
-  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}} 
+  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}
 }
 
 void dontAssumeFourthIterartion(int arg) {
@@ -92,7 +92,7 @@ void dontAssumeFourthIterartion(int arg) {
   for (int i = 0; i < arg; i++)
     clang_analyzer_numTimesReached(); // expected-warning {{3}}
 
-  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}} 
+  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}
 }
 
 #define TRUE 1
@@ -111,7 +111,7 @@ void shortCircuitInLoopCondition(int arg) {
   for (int i = 0; i < arg && TRUE; i++) {
     clang_analyzer_numTimesReached(); // expected-warning {{4}}
   }
-  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}} 
+  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}
 }
 
 void shortCircuitInLoopConditionRHS(int arg) {
@@ -121,7 +121,7 @@ void shortCircuitInLoopConditionRHS(int arg) {
   for (int i = 0; TRUE && i < arg; i++) {
     clang_analyzer_numTimesReached(); // expected-warning {{2}}
   }
-  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}} 
+  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}
 }
 
 void eagerlyAssumeInSubexpression(int arg) {
@@ -137,7 +137,7 @@ void eagerlyAssumeInSubexpression(int arg) {
   for (int i = 0; (i >= arg) - 1; i++) {
     clang_analyzer_numTimesReached(); // eagerlyassume-warning {{4}} noeagerlyassume-warning {{2}}
   }
-  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}} 
+  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}
 }
 
 void calledTwice(int arg, int isFirstCall) {

>From 78d1f32747bd64aa959a99e15a6aa54d74f5bd4b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= <donat.nagy at ericsson.com>
Date: Mon, 23 Dec 2024 16:06:24 +0100
Subject: [PATCH 7/9] Add a test to clarify that we only deal with loop
 conditions

---
 clang/test/Analysis/loop-assumptions.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/clang/test/Analysis/loop-assumptions.c b/clang/test/Analysis/loop-assumptions.c
index 1b5993a0af847e..eb0ffdce722e0c 100644
--- a/clang/test/Analysis/loop-assumptions.c
+++ b/clang/test/Analysis/loop-assumptions.c
@@ -200,3 +200,20 @@ void innerLoopOpaqueCondition(int arg) {
     }
   }
 }
+
+void onlyLoopConditions(int arg) {
+  // This "don't assume third iteration" logic only examines the conditions of
+  // loop statements and does not affect the analysis of code that implements
+  // similar behavior with different language features like if + break, goto,
+  // recursive functions, ...
+  int i = 0;
+  while (1) {
+    clang_analyzer_numTimesReached(); // expected-warning {{4}}
+
+    // This is not a loop condition.
+    if (i++ > arg)
+      break;
+  }
+
+  clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}}
+}

>From 31f0d4d3e9b3562824c48324f80a70f0884835f5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= <donat.nagy at ericsson.com>
Date: Thu, 2 Jan 2025 15:11:46 +0100
Subject: [PATCH 8/9] Minor grammar fix

Co-authored-by: Balazs Benics <benicsbalazs at gmail.com>
---
 clang/test/Analysis/misc-ps-region-store.m | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/clang/test/Analysis/misc-ps-region-store.m b/clang/test/Analysis/misc-ps-region-store.m
index b0a04b836efb99..a882e7eb0dc909 100644
--- a/clang/test/Analysis/misc-ps-region-store.m
+++ b/clang/test/Analysis/misc-ps-region-store.m
@@ -910,7 +910,7 @@ void pr6302(id x, Class y) {
 
 //===----------------------------------------------------------------------===//
 // Specially handle global variables that are declared constant.  In the
-// example below, this forces the loop to take exactly 1 iterations.
+// example below, this forces the loop to take exactly 1 iteration.
 //===----------------------------------------------------------------------===//
 
 const int pr6288_L_N = 1;

>From 0a143882a262570a83b9e0fdda1ad8e9ba518e0a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= <donat.nagy at ericsson.com>
Date: Thu, 2 Jan 2025 15:44:05 +0100
Subject: [PATCH 9/9] Add a paragraph to the release notes

---
 clang/docs/ReleaseNotes.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 08887d5ba985f5..bae35521ddc56e 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -1007,6 +1007,13 @@ New features
 Crash and bug fixes
 ^^^^^^^^^^^^^^^^^^^
 
+- In loops where the loop condition is opaque (i.e. the analyzer cannot
+  determine whether it's true or false), the analyzer will no longer assume
+  execution paths that perform more that two iterations. These unjustified
+  assumptions caused false positive reports (e.g. 100+ out-of-bounds reports in
+  the FFMPEG codebase) in loops where the programmer intended only two or three
+  steps but the analyzer wasn't able to understand that the loop is limited.
+
 Improvements
 ^^^^^^^^^^^^
 



More information about the cfe-commits mailing list