[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

DonĂ¡t Nagy via cfe-commits cfe-commits at lists.llvm.org
Tue Oct 1 06:14:48 PDT 2024


https://github.com/NagyDonat updated https://github.com/llvm/llvm-project/pull/109804

>From 23b27377e556085054621f27d97059618b416695 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= <donat.nagy at ericsson.com>
Date: Mon, 23 Sep 2024 15:42:20 +0200
Subject: [PATCH 1/6] [analyzer] Suppress out of bounds reports after weak loop
 assumptions

The checker alpha.security.ArrayBoundV2 produced lots of false positives
in situations where loop modeling of the engine fed it with unfounded
assumptions.

This commit introduces a heuristic that discards ArrayBoundV2 reports
when the execution path introduces an assumption that is questionable.
More precisely, two kinds of assumptions are categorized as "weak":
(1) When the analyzer assumes that the first evaluation of the loop
    condition returns false and the loop body is completely skipped.
(2) When the analyzer assumes that the loop condition is true in a
    situation where it already executed (at least) two iterations.
For examples and more explanation, see the new tests.

The actual implementation uses some approximations (it uses the
BlockCount instead of the iteration count) because that seems to be
"good enough" for this heuristical suppression.

Note that I used minor state updates instead of bug reporter visitors
because the number of finished iterations is not visible in the visitor
which "walks backwards in time".

As a very minor unrelated change, this commit removes the "Bin" part
from the method name "evalEagerlyAssumeBinOpBifurcation" because this
method is also used for the unary logical not operator.
---
 .../Core/PathSensitive/ExprEngine.h           |  43 ++++++--
 .../Checkers/ArrayBoundCheckerV2.cpp          |   5 +
 clang/lib/StaticAnalyzer/Core/CoreEngine.cpp  |  25 ++++-
 clang/lib/StaticAnalyzer/Core/ExprEngine.cpp  |  83 +++++++++++---
 clang/test/Analysis/loop-unrolling.cpp        |   6 +-
 clang/test/Analysis/out-of-bounds.c           | 101 ++++++++++++++++++
 6 files changed, 233 insertions(+), 30 deletions(-)

diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
index 04eacd1df7ffe2..03e4fec2b49015 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
@@ -121,6 +121,25 @@ struct EvalCallOptions {
   EvalCallOptions() {}
 };
 
+/// Simple control flow statements like `if` only produce a single state split,
+/// so the fact that they are included in the source code implies that both
+/// branches are possible (at least under some conditions) and the analyzer can
+/// freely assume either of them. (This is not entirely true, because there may
+/// be unmarked logical correlations between `if` statements, but is a good
+/// enough heuristic and the analyzer strongly relies on it.)
+/// On the other hand, in a loop the state may be split repeatedly at each
+/// evaluation of the loop condition, and this can lead to following "weak"
+/// assumptions even though the code does not imply that they're valid and the
+/// programmer intended to cover them.
+/// This function is called to mark the `State` when the engine makes an
+/// assumption which is weak. Checkers may use this heuristical mark to discard
+/// result and reduce the amount of false positives.
+ProgramStateRef recordWeakLoopAssumption(ProgramStateRef State);
+
+/// Returns true if `recordWeakLoopAssumption()` was called on the execution
+/// path which produced `State`.
+bool seenWeakLoopAssumption(ProgramStateRef State);
+
 class ExprEngine {
   void anchor();
 
@@ -323,12 +342,13 @@ class ExprEngine {
 
   /// ProcessBranch - Called by CoreEngine.  Used to generate successor
   ///  nodes by processing the 'effects' of a branch condition.
-  void processBranch(const Stmt *Condition,
-                     NodeBuilderContext& BuilderCtx,
-                     ExplodedNode *Pred,
-                     ExplodedNodeSet &Dst,
-                     const CFGBlock *DstT,
-                     const CFGBlock *DstF);
+  /// If the branch condition is a loop condition, IterationsFinishedInLoop is
+  /// the number of already finished iterations (0, 1, 2...); otherwise it's
+  /// std::nullopt.
+  void processBranch(const Stmt *Condition, NodeBuilderContext &BuilderCtx,
+                     ExplodedNode *Pred, ExplodedNodeSet &Dst,
+                     const CFGBlock *DstT, const CFGBlock *DstF,
+                     std::optional<unsigned> IterationsFinishedInLoop);
 
   /// Called by CoreEngine.
   /// Used to generate successor nodes for temporary destructors depending
@@ -583,11 +603,12 @@ class ExprEngine {
                                 ExplodedNode *Pred,
                                 ExplodedNodeSet &Dst);
 
-  /// evalEagerlyAssumeBinOpBifurcation - Given the nodes in 'Src', eagerly assume symbolic
-  ///  expressions of the form 'x != 0' and generate new nodes (stored in Dst)
-  ///  with those assumptions.
-  void evalEagerlyAssumeBinOpBifurcation(ExplodedNodeSet &Dst, ExplodedNodeSet &Src,
-                         const Expr *Ex);
+  /// evalEagerlyAssumeOpBifurcation - Given the nodes in 'Src', eagerly assume
+  /// symbolic
+  ///  expressions of the form 'x != 0' or '!x' and generate new nodes (stored
+  ///  in Dst) with those assumptions.
+  void evalEagerlyAssumeOpBifurcation(ExplodedNodeSet &Dst,
+                                      ExplodedNodeSet &Src, const Expr *Ex);
 
   static std::pair<const ProgramPointTag *, const ProgramPointTag *>
     geteagerlyAssumeBinOpBifurcationTags();
diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index 3f837564cf47c4..da9ae1c749a3db 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -697,6 +697,11 @@ void ArrayBoundCheckerV2::reportOOB(CheckerContext &C,
                                     ProgramStateRef ErrorState, Messages Msgs,
                                     NonLoc Offset, std::optional<NonLoc> Extent,
                                     bool IsTaintBug /*=false*/) const {
+  // Suppress results found through execution paths where in some loop the
+  // analyzer arbitrarily assumed either that the loop is skipped (0 iterations)
+  // or that 3 or more iterations are executed.
+  if (seenWeakLoopAssumption(ErrorState))
+    return;
 
   ExplodedNode *ErrorNode = C.generateErrorNode(ErrorState);
   if (!ErrorNode)
diff --git a/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp b/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
index 8605fa149e4f52..b5e6b3c1bcb471 100644
--- a/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
@@ -441,10 +441,33 @@ void CoreEngine::HandleCallEnter(const CallEnter &CE, ExplodedNode *Pred) {
 void CoreEngine::HandleBranch(const Stmt *Cond, const Stmt *Term,
                                 const CFGBlock * B, ExplodedNode *Pred) {
   assert(B->succ_size() == 2);
+
+  const LocationContext *LC = Pred->getLocationContext();
+  BlockCounter Counter = WList->getBlockCounter();
+  unsigned BlockCount =
+      Counter.getNumVisited(LC->getStackFrame(), B->getBlockID());
+  std::optional<unsigned> IterationsFinishedInLoop = std::nullopt;
+  if (isa<ForStmt, WhileStmt, CXXForRangeStmt>(Term)) {
+    // FIXME: This code approximates the number of finished iteration based on
+    // the block count, i.e. the number of evaluations of the terminator block
+    // on the current execution path (which includes the current evaluation, so
+    // is always at least 1). This is probably acceptable for the
+    // checker-specific false positive suppression that currently uses this
+    // value, but it would be better to calcuate an accurate count of
+    // iterations.
+    assert(BlockCount >= 1);
+    IterationsFinishedInLoop = BlockCount - 1;
+  } else if (isa<DoStmt>(Term)) {
+    // FIXME: The fixme note in the previous branch also applies here.
+    // In a do-while loop one iteration happens before the first evaluation of
+    // the loop condition, so we don't subtract one from the block count.
+    IterationsFinishedInLoop = BlockCount;
+  }
+
   NodeBuilderContext Ctx(*this, B, Pred);
   ExplodedNodeSet Dst;
   ExprEng.processBranch(Cond, Ctx, Pred, Dst, *(B->succ_begin()),
-                       *(B->succ_begin() + 1));
+                        *(B->succ_begin() + 1), IterationsFinishedInLoop);
   // Enqueue the new frontier onto the worklist.
   enqueue(Dst);
 }
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index fdabba46992b08..db52b777a02a89 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -212,6 +212,20 @@ typedef llvm::ImmutableMap<const LocationContext *, unsigned>
 REGISTER_TRAIT_WITH_PROGRAMSTATE(PendingArrayDestruction,
                                  PendingArrayDestructionMap)
 
+// This trait is used to heuristically filter out results produced from
+// execution paths that took "weak" assumptions within a loop.
+REGISTER_TRAIT_WITH_PROGRAMSTATE(SeenWeakLoopAssumption, bool)
+
+ProgramStateRef clang::ento::recordWeakLoopAssumption(ProgramStateRef State) {
+  return State->set<SeenWeakLoopAssumption>(true);
+}
+
+bool clang::ento::seenWeakLoopAssumption(ProgramStateRef State) {
+  return State->get<SeenWeakLoopAssumption>();
+}
+
+REGISTER_TRAIT_WITH_PROGRAMSTATE(LastEagerlyAssumeAssumptionAt, const Expr *)
+
 //===----------------------------------------------------------------------===//
 // Engine construction and deletion.
 //===----------------------------------------------------------------------===//
@@ -2128,7 +2142,7 @@ void ExprEngine::Visit(const Stmt *S, ExplodedNode *Pred,
           (B->isRelationalOp() || B->isEqualityOp())) {
         ExplodedNodeSet Tmp;
         VisitBinaryOperator(cast<BinaryOperator>(S), Pred, Tmp);
-        evalEagerlyAssumeBinOpBifurcation(Dst, Tmp, cast<Expr>(S));
+        evalEagerlyAssumeOpBifurcation(Dst, Tmp, cast<Expr>(S));
       }
       else
         VisitBinaryOperator(cast<BinaryOperator>(S), Pred, Dst);
@@ -2401,7 +2415,7 @@ void ExprEngine::Visit(const Stmt *S, ExplodedNode *Pred,
       if (AMgr.options.ShouldEagerlyAssume && (U->getOpcode() == UO_LNot)) {
         ExplodedNodeSet Tmp;
         VisitUnaryOperator(U, Pred, Tmp);
-        evalEagerlyAssumeBinOpBifurcation(Dst, Tmp, U);
+        evalEagerlyAssumeOpBifurcation(Dst, Tmp, U);
       }
       else
         VisitUnaryOperator(U, Pred, Dst);
@@ -2761,12 +2775,10 @@ assumeCondition(const Stmt *Condition, ExplodedNode *N) {
   return State->assume(V);
 }
 
-void ExprEngine::processBranch(const Stmt *Condition,
-                               NodeBuilderContext& BldCtx,
-                               ExplodedNode *Pred,
-                               ExplodedNodeSet &Dst,
-                               const CFGBlock *DstT,
-                               const CFGBlock *DstF) {
+void ExprEngine::processBranch(
+    const Stmt *Condition, NodeBuilderContext &BldCtx, ExplodedNode *Pred,
+    ExplodedNodeSet &Dst, const CFGBlock *DstT, const CFGBlock *DstF,
+    std::optional<unsigned> IterationsFinishedInLoop) {
   assert((!Condition || !isa<CXXBindTemporaryExpr>(Condition)) &&
          "CXXBindTemporaryExprs are handled by processBindTemporary.");
   const LocationContext *LCtx = Pred->getLocationContext();
@@ -2808,6 +2820,9 @@ void ExprEngine::processBranch(const Stmt *Condition,
       std::tie(StTrue, StFalse) = *KnownCondValueAssumption;
     else {
       assert(!isa<ObjCForCollectionStmt>(Condition));
+      // TODO: instead of this shortcut perhaps it would be better to "rejoin"
+      // the common execution path with
+      // StTrue = StFalse = PrevState;
       builder.generateNode(PrevState, true, PredN);
       builder.generateNode(PrevState, false, PredN);
       continue;
@@ -2815,20 +2830,53 @@ void ExprEngine::processBranch(const Stmt *Condition,
     if (StTrue && StFalse)
       assert(!isa<ObjCForCollectionStmt>(Condition));
 
+    const Expr *EagerlyAssumeExpr =
+        PrevState->get<LastEagerlyAssumeAssumptionAt>();
+    const Expr *ConditionExpr = dyn_cast<Expr>(Condition);
+    if (ConditionExpr)
+      ConditionExpr = ConditionExpr->IgnoreParenCasts();
+    bool DidEagerlyAssume = EagerlyAssumeExpr == ConditionExpr;
+    bool BothFeasible = (DidEagerlyAssume || (StTrue && StFalse)) &&
+                        builder.isFeasible(true) && builder.isFeasible(false);
+
     // Process the true branch.
     if (builder.isFeasible(true)) {
-      if (StTrue)
+      if (StTrue) {
+        if (BothFeasible && IterationsFinishedInLoop &&
+            *IterationsFinishedInLoop >= 2) {
+          // When programmers write a loop, they imply that at least two
+          // iterations are possible (otherwise they would just write an `if`),
+          // but the third iteration is not implied: there are situations where
+          // the programmer knows that there won't be a third iteration (e.g.
+          // they iterate over a structure that has <= 2 elements) but this is
+          // not marked in the source code.
+          // Checkers may use this heuristic mark to discard results found on
+          // branches that contain this "weak" assumption.
+          StTrue = recordWeakLoopAssumption(StTrue);
+        }
         builder.generateNode(StTrue, true, PredN);
-      else
+      } else {
         builder.markInfeasible(true);
+      }
     }
 
     // Process the false branch.
     if (builder.isFeasible(false)) {
-      if (StFalse)
+      if (StFalse) {
+        if (BothFeasible && IterationsFinishedInLoop &&
+            *IterationsFinishedInLoop == 0) {
+          // There are many situations where the programmers know that there
+          // will be at least one iteration in a loop (e.g. a structure is not
+          // empty) but the analyzer cannot deduce this and reports false
+          // positives after skipping the loop.
+          // Checkers may use this heuristic mark to discard results found on
+          // branches that contain this "weak" assumption.
+          StFalse = recordWeakLoopAssumption(StFalse);
+        }
         builder.generateNode(StFalse, false, PredN);
-      else
+      } else {
         builder.markInfeasible(false);
+      }
     }
   }
   currBldrCtx = nullptr;
@@ -3752,9 +3800,9 @@ ExprEngine::geteagerlyAssumeBinOpBifurcationTags() {
                         &eagerlyAssumeBinOpBifurcationFalse);
 }
 
-void ExprEngine::evalEagerlyAssumeBinOpBifurcation(ExplodedNodeSet &Dst,
-                                                   ExplodedNodeSet &Src,
-                                                   const Expr *Ex) {
+void ExprEngine::evalEagerlyAssumeOpBifurcation(ExplodedNodeSet &Dst,
+                                                ExplodedNodeSet &Src,
+                                                const Expr *Ex) {
   StmtNodeBuilder Bldr(Src, Dst, *currBldrCtx);
 
   for (const auto Pred : Src) {
@@ -3776,6 +3824,11 @@ void ExprEngine::evalEagerlyAssumeBinOpBifurcation(ExplodedNodeSet &Dst,
       ProgramStateRef StateTrue, StateFalse;
       std::tie(StateTrue, StateFalse) = state->assume(*SEV);
 
+      if (StateTrue && StateFalse) {
+        StateTrue = StateTrue->set<LastEagerlyAssumeAssumptionAt>(Ex);
+        StateFalse = StateFalse->set<LastEagerlyAssumeAssumptionAt>(Ex);
+      }
+
       // First assume that the condition is true.
       if (StateTrue) {
         SVal Val = svalBuilder.makeIntVal(1U, Ex->getType());
diff --git a/clang/test/Analysis/loop-unrolling.cpp b/clang/test/Analysis/loop-unrolling.cpp
index 66a828abfb5133..1d58ba171c0856 100644
--- a/clang/test/Analysis/loop-unrolling.cpp
+++ b/clang/test/Analysis/loop-unrolling.cpp
@@ -349,7 +349,7 @@ int simple_unknown_bound_loop() {
 #ifdef DFS
     clang_analyzer_numTimesReached(); // expected-warning {{16}}
 #else
-    clang_analyzer_numTimesReached(); // expected-warning {{8}}
+    clang_analyzer_numTimesReached(); // expected-warning {{10}}
 #endif
   }
   return 0;
@@ -369,9 +369,9 @@ int nested_inlined_no_unroll1() {
   int k;
   for (int i = 0; i < 9; i++) {
 #ifdef DFS
-    clang_analyzer_numTimesReached(); // expected-warning {{18}}
+    clang_analyzer_numTimesReached(); // expected-warning {{20}}
 #else
-    clang_analyzer_numTimesReached(); // expected-warning {{14}}
+    clang_analyzer_numTimesReached(); // expected-warning {{18}}
 #endif
     k = simple_unknown_bound_loop();  // reevaluation without inlining, splits the state as well
   }
diff --git a/clang/test/Analysis/out-of-bounds.c b/clang/test/Analysis/out-of-bounds.c
index 1f771c2b3bd138..6380e72543bb0c 100644
--- a/clang/test/Analysis/out-of-bounds.c
+++ b/clang/test/Analysis/out-of-bounds.c
@@ -1,4 +1,9 @@
 // RUN: %clang_analyze_cc1 -Wno-array-bounds -analyzer-checker=core,alpha.security.ArrayBoundV2,debug.ExprInspection -verify %s
+// RUN: %clang_analyze_cc1 -Wno-array-bounds -analyzer-checker=core,alpha.security.ArrayBoundV2,debug.ExprInspection -analyzer-config eagerly-assume=false -verify %s
+
+// Note that eagerly-assume=false is tested separately because the
+// WeakLoopAssumption suppression heuristic uses different code paths to
+// achieve the same result with and without eagerly-assume.
 
 void clang_analyzer_eval(int);
 
@@ -194,3 +199,99 @@ char test_comparison_with_extent_symbol(struct incomplete *p) {
   return ((char *)p)[-1]; // no-warning
 }
 
+// WeakLoopAssumption suppression
+///////////////////////////////////////////////////////////////////////
+
+int GlobalArray[100];
+int loop_suppress_after_zero_iterations(unsigned len) {
+  for (unsigned i = 0; i < len; i++)
+    if (GlobalArray[i] > 0)
+      return GlobalArray[i];
+  // Previously this would have produced an overflow warning because splitting
+  // the state on the loop condition introduced an execution path where the
+  // analyzer thinks that len == 0.
+  // There are very many situations where the programmer knows that an argument
+  // is positive, but this is not indicated in the source code, so we must
+  // avoid reporting errors (especially out of bounds errors) on these
+  // branches, because otherwise we'd get prohibitively many false positives.
+  return GlobalArray[len - 1]; // no-warning
+}
+
+void loop_report_in_second_iteration(int len) {
+  int buf[1] = {0};
+  for (int i = 0; i < len; i++) {
+    // When a programmer writes a loop, we may assume that they intended at
+    // least two iterations.
+    buf[i] = 1; // expected-warning{{Out of bound access to memory}}
+  }
+}
+
+void loop_suppress_in_third_iteration(int len) {
+  int buf[2] = {0};
+  for (int i = 0; i < len; i++) {
+    // We should suppress array bounds errors on the third and later iterations
+    // of loops, because sometimes programmers write a loop in sitiuations
+    // where they know that there will be at most two iterations.
+    buf[i] = 1; // no-warning
+  }
+}
+
+void loop_suppress_in_third_iteration_cast(int len) {
+  int buf[2] = {0};
+  for (int i = 0; (unsigned)(i < len); i++) {
+    // Check that a (somewhat arbitrary) cast does not hinder the recognition
+    // of the condition expression.
+    buf[i] = 1; // no-warning
+  }
+}
+
+void loop_suppress_in_third_iteration_logical_and(int len, int flag) {
+  int buf[2] = {0};
+  for (int i = 0; i < len && flag; i++) {
+    // FIXME: In this case the checker should suppress the warning the same way
+    // as it's suppressed in loop_suppress_in_third_iteration, but the
+    // suppression is not activated because the terminator statement associated
+    // with the loop is just the expression 'flag', while 'i < len' is a
+    // separate terminator statement that's associated with the
+    // short-circuiting operator '&&'.
+    // I have seen a real-world FP that looks like this, but it is much rarer
+    // than the basic setup.
+    buf[i] = 1; // expected-warning{{Out of bound access to memory}}
+  }
+}
+
+void loop_suppress_in_third_iteration_logical_and_2(int len, int flag) {
+  int buf[2] = {0};
+  for (int i = 0; flag && i < len; i++) {
+    // If the two operands of '&&' are flipped, the suppression works.
+    buf[i] = 1; // no-warning
+  }
+}
+
+int coinflip(void);
+int do_while_report_after_one_iteration(void) {
+  int i = 0;
+  do {
+    i++;
+  } while (coinflip());
+  // Unlike `loop_suppress_after_zero_iterations`, running just one iteration
+  // in a do-while is not a corner case that would produce too many false
+  // positives, so don't suppress bounds errors in these situations.
+  return GlobalArray[i-2]; // expected-warning{{Out of bound access to memory}}
+}
+
+void do_while_report_in_second_iteration(int len) {
+  int buf[1] = {0};
+  int i = 0;
+  do {
+    buf[i] = 1; // expected-warning{{Out of bound access to memory}}
+  } while (i++ < len);
+}
+
+void do_while_suppress_in_third_iteration(int len) {
+  int buf[2] = {0};
+  int i = 0;
+  do {
+    buf[i] = 1; // no-warning
+  } while (i++ < len);
+}

>From 77fb22749cb050365af0c0c0cd2cfb7c8f6a13dc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= <donat.nagy at ericsson.com>
Date: Tue, 24 Sep 2024 16:25:07 +0200
Subject: [PATCH 2/6] Improve comments

---
 .../clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h     | 5 ++---
 clang/lib/StaticAnalyzer/Core/ExprEngine.cpp                 | 5 +++++
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
index 03e4fec2b49015..0361ce5515a868 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
@@ -604,9 +604,8 @@ class ExprEngine {
                                 ExplodedNodeSet &Dst);
 
   /// evalEagerlyAssumeOpBifurcation - Given the nodes in 'Src', eagerly assume
-  /// symbolic
-  ///  expressions of the form 'x != 0' or '!x' and generate new nodes (stored
-  ///  in Dst) with those assumptions.
+  /// symbolic expressions of the form 'x != 0' or '!x' and generate new nodes
+  /// (stored in Dst) with those assumptions.
   void evalEagerlyAssumeOpBifurcation(ExplodedNodeSet &Dst,
                                       ExplodedNodeSet &Src, const Expr *Ex);
 
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index db52b777a02a89..94a772de7f466a 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -224,6 +224,11 @@ bool clang::ento::seenWeakLoopAssumption(ProgramStateRef State) {
   return State->get<SeenWeakLoopAssumption>();
 }
 
+// This trait points to the last expression (logical operator) where an eager
+// assumption introduced a state split (i.e. both cases were feasible). This is
+// used by the WeakLoopAssumption heuristic to find situations where the an
+// eager assumption introduces a state split within the evaluation of a loop
+// condition.
 REGISTER_TRAIT_WITH_PROGRAMSTATE(LastEagerlyAssumeAssumptionAt, const Expr *)
 
 //===----------------------------------------------------------------------===//

>From 8ae4b67a2493ee60d81c459d2b96c4b989a04cb3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= <donat.nagy at ericsson.com>
Date: Mon, 30 Sep 2024 17:11:32 +0200
Subject: [PATCH 3/6] Improve the comments

This commit cleans up some typos thet were reported by the reviewers and
tries to provide better explanations for some parts of the patch that
turned out to be confusing.
---
 .../Core/PathSensitive/ExprEngine.h           | 39 ++++++++++++-------
 clang/lib/StaticAnalyzer/Core/CoreEngine.cpp  |  9 ++---
 clang/lib/StaticAnalyzer/Core/ExprEngine.cpp  | 19 +++++----
 clang/test/Analysis/out-of-bounds.c           | 16 +++++---
 4 files changed, 50 insertions(+), 33 deletions(-)

diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
index 0361ce5515a868..2625e94d07d4a8 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
@@ -121,19 +121,28 @@ struct EvalCallOptions {
   EvalCallOptions() {}
 };
 
-/// Simple control flow statements like `if` only produce a single state split,
-/// so the fact that they are included in the source code implies that both
-/// branches are possible (at least under some conditions) and the analyzer can
-/// freely assume either of them. (This is not entirely true, because there may
-/// be unmarked logical correlations between `if` statements, but is a good
-/// enough heuristic and the analyzer strongly relies on it.)
-/// On the other hand, in a loop the state may be split repeatedly at each
-/// evaluation of the loop condition, and this can lead to following "weak"
-/// assumptions even though the code does not imply that they're valid and the
-/// programmer intended to cover them.
-/// This function is called to mark the `State` when the engine makes an
-/// assumption which is weak. Checkers may use this heuristical mark to discard
-/// result and reduce the amount of false positives.
+/// Simple control flow statements like `if` can only produce a single two-way
+/// state split, so when the analyzer cannot determine the value of the
+/// condition, it can assume either of the two options, because the fact that
+/// they are in the source code implies that the programmer thought that they
+/// are possible (at least under some conditions).
+/// (Note that this heuristic is not entirely correct when there are _several_
+/// `if` statements with unmarked logical connections between them, but it's
+/// still good enough and the analyzer heavily relies on it.)
+/// In contrast with this, a single loop statement can produce multiple state
+/// splits, and we cannot always single out safe assumptions where we can say
+/// that "the programmer included this loop in the source code, so they clearly
+/// thought that this execution path is possible".
+/// However, the analyzer wants to explore the code in and after the loop, so
+/// it makes assumptions about the loop condition (to get a concrete execution
+/// path) even when they are not justified.
+/// This function is called by the engine to mark the `State` when it makes an
+/// assumption which is "weak". Checkers may use this heuristical mark to
+/// discard the result and reduce the amount of false positives.
+/// TODO: Instead of just marking these branches for checker-specific handling,
+/// we could discard them completely. I suspect that this could eliminate some
+/// false positives without suppressing too many true positives, but I didn't
+/// have time to measure its effects.
 ProgramStateRef recordWeakLoopAssumption(ProgramStateRef State);
 
 /// Returns true if `recordWeakLoopAssumption()` was called on the execution
@@ -341,9 +350,9 @@ class ExprEngine {
                                ExplodedNode *Pred);
 
   /// ProcessBranch - Called by CoreEngine.  Used to generate successor
-  ///  nodes by processing the 'effects' of a branch condition.
+  /// nodes by processing the 'effects' of a branch condition.
   /// If the branch condition is a loop condition, IterationsFinishedInLoop is
-  /// the number of already finished iterations (0, 1, 2...); otherwise it's
+  /// the number of already finished iterations (0, 1, 2, ...); otherwise it's
   /// std::nullopt.
   void processBranch(const Stmt *Condition, NodeBuilderContext &BuilderCtx,
                      ExplodedNode *Pred, ExplodedNodeSet &Dst,
diff --git a/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp b/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
index b5e6b3c1bcb471..54a6f87dcc8be5 100644
--- a/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
@@ -448,13 +448,12 @@ void CoreEngine::HandleBranch(const Stmt *Cond, const Stmt *Term,
       Counter.getNumVisited(LC->getStackFrame(), B->getBlockID());
   std::optional<unsigned> IterationsFinishedInLoop = std::nullopt;
   if (isa<ForStmt, WhileStmt, CXXForRangeStmt>(Term)) {
-    // FIXME: This code approximates the number of finished iteration based on
+    // FIXME: This code approximates the number of finished iterations based on
     // the block count, i.e. the number of evaluations of the terminator block
     // on the current execution path (which includes the current evaluation, so
-    // is always at least 1). This is probably acceptable for the
-    // checker-specific false positive suppression that currently uses this
-    // value, but it would be better to calcuate an accurate count of
-    // iterations.
+    // is always >= 1). This is probably acceptable for the checker-specific
+    // false positive suppression that currently uses this value, but it would
+    // be better to calcuate an accurate count of iterations.
     assert(BlockCount >= 1);
     IterationsFinishedInLoop = BlockCount - 1;
   } else if (isa<DoStmt>(Term)) {
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index 94a772de7f466a..1a83894f134095 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -226,9 +226,8 @@ bool clang::ento::seenWeakLoopAssumption(ProgramStateRef State) {
 
 // This trait points to the last expression (logical operator) where an eager
 // assumption introduced a state split (i.e. both cases were feasible). This is
-// used by the WeakLoopAssumption heuristic to find situations where the an
-// eager assumption introduces a state split within the evaluation of a loop
-// condition.
+// used by the WeakLoopAssumption heuristic to find situations where an eager
+// assumption introduces a state split in the evaluation of a loop condition.
 REGISTER_TRAIT_WITH_PROGRAMSTATE(LastEagerlyAssumeAssumptionAt, const Expr *)
 
 //===----------------------------------------------------------------------===//
@@ -2838,8 +2837,12 @@ void ExprEngine::processBranch(
     const Expr *EagerlyAssumeExpr =
         PrevState->get<LastEagerlyAssumeAssumptionAt>();
     const Expr *ConditionExpr = dyn_cast<Expr>(Condition);
-    if (ConditionExpr)
+    if (ConditionExpr) {
+      // Ignore casts to ensure equivalent behavior with and without
+      // eagerly-assume. This is a mostly theoretical question an I don't see a
+      // good reason for putting casts around a conditional expression.
       ConditionExpr = ConditionExpr->IgnoreParenCasts();
+    }
     bool DidEagerlyAssume = EagerlyAssumeExpr == ConditionExpr;
     bool BothFeasible = (DidEagerlyAssume || (StTrue && StFalse)) &&
                         builder.isFeasible(true) && builder.isFeasible(false);
@@ -2852,9 +2855,11 @@ void ExprEngine::processBranch(
           // When programmers write a loop, they imply that at least two
           // iterations are possible (otherwise they would just write an `if`),
           // but the third iteration is not implied: there are situations where
-          // the programmer knows that there won't be a third iteration (e.g.
-          // they iterate over a structure that has <= 2 elements) but this is
-          // not marked in the source code.
+          // the programmer knows that there won't be a third iteration, but
+          // this is not marked in the source code. (For example, the ffmpeg
+          // project has 2-element arrays which are accessed from loops where
+          // the number of steps is opaque and the analyzer cannot deduce that
+          // there are <= 2 iterations.)
           // Checkers may use this heuristic mark to discard results found on
           // branches that contain this "weak" assumption.
           StTrue = recordWeakLoopAssumption(StTrue);
diff --git a/clang/test/Analysis/out-of-bounds.c b/clang/test/Analysis/out-of-bounds.c
index 6380e72543bb0c..1e26e7775719ce 100644
--- a/clang/test/Analysis/out-of-bounds.c
+++ b/clang/test/Analysis/out-of-bounds.c
@@ -210,10 +210,10 @@ int loop_suppress_after_zero_iterations(unsigned len) {
   // Previously this would have produced an overflow warning because splitting
   // the state on the loop condition introduced an execution path where the
   // analyzer thinks that len == 0.
-  // There are very many situations where the programmer knows that an argument
-  // is positive, but this is not indicated in the source code, so we must
-  // avoid reporting errors (especially out of bounds errors) on these
-  // branches, because otherwise we'd get prohibitively many false positives.
+  // There are many situations where the programmer knows that an argument is
+  // positive, but this is not indicated in the source code, so we must avoid
+  // reporting errors (especially out of bounds errors) on these branches,
+  // because otherwise we'd get prohibitively many false positives.
   return GlobalArray[len - 1]; // no-warning
 }
 
@@ -231,7 +231,8 @@ void loop_suppress_in_third_iteration(int len) {
   for (int i = 0; i < len; i++) {
     // We should suppress array bounds errors on the third and later iterations
     // of loops, because sometimes programmers write a loop in sitiuations
-    // where they know that there will be at most two iterations.
+    // where they know that there will be at most two iterations, but the
+    // analyzer cannot deduce this property.
     buf[i] = 1; // no-warning
   }
 }
@@ -263,7 +264,10 @@ void loop_suppress_in_third_iteration_logical_and(int len, int flag) {
 void loop_suppress_in_third_iteration_logical_and_2(int len, int flag) {
   int buf[2] = {0};
   for (int i = 0; flag && i < len; i++) {
-    // If the two operands of '&&' are flipped, the suppression works.
+    // If the two operands of '&&' are flipped, the suppression works, because
+    // then 'flag' is the terminator statement associated with '&&' (which
+    // determines whether the short-circuiting happens or not) and 'i < len' is
+    // the terminator statement of the loop itself.
     buf[i] = 1; // no-warning
   }
 }

>From cdc73ab596981f24895374324600e9ec1cff60a8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= <donat.nagy at ericsson.com>
Date: Mon, 30 Sep 2024 18:02:56 +0200
Subject: [PATCH 4/6] Simplify logic in ExprEngine::processBranch

---
 clang/lib/StaticAnalyzer/Core/ExprEngine.cpp | 80 +++++++++-----------
 1 file changed, 36 insertions(+), 44 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index 1a83894f134095..70625c499414cc 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -2820,18 +2820,20 @@ void ExprEngine::processBranch(
     ProgramStateRef PrevState = PredN->getState();
 
     ProgramStateRef StTrue, StFalse;
-    if (const auto KnownCondValueAssumption = assumeCondition(Condition, PredN))
+    StTrue = StFalse = PrevState;
+
+    if (const auto KnownCondValueAssumption = assumeCondition(Condition, PredN)) {
       std::tie(StTrue, StFalse) = *KnownCondValueAssumption;
-    else {
-      assert(!isa<ObjCForCollectionStmt>(Condition));
-      // TODO: instead of this shortcut perhaps it would be better to "rejoin"
-      // the common execution path with
-      // StTrue = StFalse = PrevState;
-      builder.generateNode(PrevState, true, PredN);
-      builder.generateNode(PrevState, false, PredN);
-      continue;
+
+      if (!StTrue)
+        builder.markInfeasible(true);
+
+      if (!StFalse)
+        builder.markInfeasible(false);
     }
-    if (StTrue && StFalse)
+    bool BothFeasible = builder.isFeasible(true) && builder.isFeasible(false);
+
+    if (BothFeasible)
       assert(!isa<ObjCForCollectionStmt>(Condition));
 
     const Expr *EagerlyAssumeExpr =
@@ -2844,49 +2846,39 @@ void ExprEngine::processBranch(
       ConditionExpr = ConditionExpr->IgnoreParenCasts();
     }
     bool DidEagerlyAssume = EagerlyAssumeExpr == ConditionExpr;
-    bool BothFeasible = (DidEagerlyAssume || (StTrue && StFalse)) &&
-                        builder.isFeasible(true) && builder.isFeasible(false);
 
     // Process the true branch.
     if (builder.isFeasible(true)) {
-      if (StTrue) {
-        if (BothFeasible && IterationsFinishedInLoop &&
-            *IterationsFinishedInLoop >= 2) {
-          // When programmers write a loop, they imply that at least two
-          // iterations are possible (otherwise they would just write an `if`),
-          // but the third iteration is not implied: there are situations where
-          // the programmer knows that there won't be a third iteration, but
-          // this is not marked in the source code. (For example, the ffmpeg
-          // project has 2-element arrays which are accessed from loops where
-          // the number of steps is opaque and the analyzer cannot deduce that
-          // there are <= 2 iterations.)
-          // Checkers may use this heuristic mark to discard results found on
-          // branches that contain this "weak" assumption.
-          StTrue = recordWeakLoopAssumption(StTrue);
-        }
-        builder.generateNode(StTrue, true, PredN);
-      } else {
-        builder.markInfeasible(true);
+      if ((BothFeasible || DidEagerlyAssume) && IterationsFinishedInLoop &&
+          *IterationsFinishedInLoop >= 2) {
+        // When programmers write a loop, they imply that at least two
+        // iterations are possible (otherwise they would just write an `if`),
+        // but the third iteration is not implied: there are situations where
+        // the programmer knows that there won't be a third iteration, but
+        // this is not marked in the source code. (For example, the ffmpeg
+        // project has 2-element arrays which are accessed from loops where
+        // the number of steps is opaque and the analyzer cannot deduce that
+        // there are <= 2 iterations.)
+        // Checkers may use this heuristic mark to discard results found on
+        // branches that contain this "weak" assumption.
+        StTrue = recordWeakLoopAssumption(StTrue);
       }
+      builder.generateNode(StTrue, true, PredN);
     }
 
     // Process the false branch.
     if (builder.isFeasible(false)) {
-      if (StFalse) {
-        if (BothFeasible && IterationsFinishedInLoop &&
-            *IterationsFinishedInLoop == 0) {
-          // There are many situations where the programmers know that there
-          // will be at least one iteration in a loop (e.g. a structure is not
-          // empty) but the analyzer cannot deduce this and reports false
-          // positives after skipping the loop.
-          // Checkers may use this heuristic mark to discard results found on
-          // branches that contain this "weak" assumption.
-          StFalse = recordWeakLoopAssumption(StFalse);
-        }
-        builder.generateNode(StFalse, false, PredN);
-      } else {
-        builder.markInfeasible(false);
+      if ((BothFeasible || DidEagerlyAssume) && IterationsFinishedInLoop &&
+          *IterationsFinishedInLoop == 0) {
+        // There are many situations where the programmers know that there
+        // will be at least one iteration in a loop (e.g. a structure is not
+        // empty) but the analyzer cannot deduce this and reports false
+        // positives after skipping the loop.
+        // Checkers may use this heuristic mark to discard results found on
+        // branches that contain this "weak" assumption.
+        StFalse = recordWeakLoopAssumption(StFalse);
       }
+      builder.generateNode(StFalse, false, PredN);
     }
   }
   currBldrCtx = nullptr;

>From e7e6ded9e4d5116426e79659037d68bf20e70f1a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= <donat.nagy at ericsson.com>
Date: Tue, 1 Oct 2024 10:31:14 +0200
Subject: [PATCH 5/6] Satisfy git-clang-format

---
 clang/lib/StaticAnalyzer/Core/ExprEngine.cpp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index 70625c499414cc..fd0a7dc46f8807 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -2822,7 +2822,8 @@ void ExprEngine::processBranch(
     ProgramStateRef StTrue, StFalse;
     StTrue = StFalse = PrevState;
 
-    if (const auto KnownCondValueAssumption = assumeCondition(Condition, PredN)) {
+    if (const auto KnownCondValueAssumption =
+            assumeCondition(Condition, PredN)) {
       std::tie(StTrue, StFalse) = *KnownCondValueAssumption;
 
       if (!StTrue)

>From cbb46e58e819c864e4e24f5764d8464528d0b806 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= <donat.nagy at ericsson.com>
Date: Tue, 1 Oct 2024 15:13:56 +0200
Subject: [PATCH 6/6] Add 'no suppression when no assumption' testcases

---
 clang/test/Analysis/out-of-bounds.c | 31 +++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/clang/test/Analysis/out-of-bounds.c b/clang/test/Analysis/out-of-bounds.c
index 1e26e7775719ce..a539107456628d 100644
--- a/clang/test/Analysis/out-of-bounds.c
+++ b/clang/test/Analysis/out-of-bounds.c
@@ -237,6 +237,37 @@ void loop_suppress_in_third_iteration(int len) {
   }
 }
 
+int no_suppression_when_no_assumption_zero_iterations(unsigned len) {
+  if (len != 0) {
+    // This 'if' introduces a state split between len == 0 and len != 0.
+  }
+
+  // On the branch where we assumed that len is zero, this loop will be
+  // skipped. We (intentionally) don't suppress this execution path becasue
+  // here the analyzer doesn't assume anything new when it evaluates the loop
+  // condition.
+  for (unsigned i = 0; i < len; i++)
+    if (GlobalArray[i] > 0)
+      return GlobalArray[i];
+
+  return GlobalArray[len - 1]; // expected-warning{{Out of bound access to memory}}
+}
+
+void no_suppression_when_no_assumption_third_iteration(int len) {
+  if (len < 20) {
+    // This 'if' introduces a state split with len >= 20 on one branch.
+  }
+
+  int buf[2] = {0};
+  for (int i = 0; i < len; i++) {
+    // As in no_suppression_when_no_assumption_zero_iterations, the suppression
+    // only activates when the analyzer assumes something new in the loop
+    // condition. On the branch where `len >= 20` entering the third iteration
+    // doesn't involve a new assumption, so this warning is not suppressed:
+    buf[i] = 1; // expected-warning{{Out of bound access to memory}}
+  }
+}
+
 void loop_suppress_in_third_iteration_cast(int len) {
   int buf[2] = {0};
   for (int i = 0; (unsigned)(i < len); i++) {



More information about the cfe-commits mailing list