[clang] [analyzer] Suppress out of bounds reports after weak loop assumptions (PR #109804)

Thu Oct 10 06:26:26 PDT 2024

Donát Nagy <donat.nagy at ericsson.com>
In-Reply-To: <llvm.org/llvm/llvm-project/pull/109804 at github.com>


================
@@ -212,6 +212,25 @@ typedef llvm::ImmutableMap<const LocationContext *, unsigned>
 REGISTER_TRAIT_WITH_PROGRAMSTATE(PendingArrayDestruction,
                                  PendingArrayDestructionMap)
 
+// This trait is used to heuristically filter out results produced from
+// execution paths that took "weak" assumptions within a loop.
+REGISTER_TRAIT_WITH_PROGRAMSTATE(SeenWeakLoopAssumption, bool)
+
+ProgramStateRef clang::ento::recordWeakLoopAssumption(ProgramStateRef State) {
+  return State->set<SeenWeakLoopAssumption>(true);
+}
+
+bool clang::ento::seenWeakLoopAssumption(ProgramStateRef State) {
+  return State->get<SeenWeakLoopAssumption>();
+}
----------------
isuckatcs wrote:

Yes, but with this left turned on, the coverage drop is huge even in cases that are not affected by the assumption.

```c++
void foo(int x, int y) {
  for (unsigned i = 0; i < x; i++) ; // split the state and set SeenWeakLoopAssumption to 'true'
  if (x != 0) return;                // drop the 'true' branch

  // no warnings are reported from this point on

  int buf[1] = {0};
  for (int i = 0; i < y; i++)
    buf[i] = 1;                      // SeenWeakLoopAssumption is 'true', so the warning is suppressed
}
```
The suppression also carries through nested function calls.
```c++
void a() {}

void b() { a(); }

void c() { b(); }

void d() { c(); }

void top(int x) {
  for (unsigned i = 0; i < x; i++) ;
  if (x != 0) return;

  // no warnings are reported from this point on

  d();
}
```
If a warning is found inside any of `a()`, `b()`, `c()` or `d()`, it is suppressed because the trait was set at the top of the execution path and is never cleared.

Since we generate a sink, this costs just one false negative, whereas relying on an unfounded assumption might trigger a false positive in one of the nested functions, so I guess we can live with this.
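
To make the trade-off concrete, here is a rough toy model of the behaviour described above. It does not use the real analyzer API: `PathState`, `Report`, `onOutOfBounds` and `analyzeTop` are hypothetical names, and the model simply assumes that the flag is never reset and that the checker sinks the path whether or not the warning is shown.

```cpp
#include <string>
#include <vector>

// Toy stand-in for the analyzer's per-path state; not the real ProgramState.
struct PathState {
  bool SeenWeakLoopAssumption = false; // sticky: nothing ever resets it
  bool Sunk = false;                   // set once the path has been terminated
};

struct Report {
  std::string Function; // where the out-of-bounds access was found
  bool Suppressed;      // true if the flag hid the warning
};

// Hypothetical checker hook: record the finding, hide it if the flag is set,
// and sink the path either way so nothing after it is explored.
void onOutOfBounds(PathState &S, const std::string &Fn,
                   std::vector<Report> &Out) {
  if (S.Sunk)
    return;
  Out.push_back(Report{Fn, S.SeenWeakLoopAssumption});
  S.Sunk = true;
}

// Each function calls its callee and then contains one out-of-bounds access.
void a(PathState &S, std::vector<Report> &Out) { onOutOfBounds(S, "a", Out); }
void b(PathState &S, std::vector<Report> &Out) { a(S, Out); onOutOfBounds(S, "b", Out); }
void c(PathState &S, std::vector<Report> &Out) { b(S, Out); onOutOfBounds(S, "c", Out); }
void d(PathState &S, std::vector<Report> &Out) { c(S, Out); onOutOfBounds(S, "d", Out); }

// Models the top-level function: the flag is decided before d() is called
// and is inherited by every nested call on the same path.
std::vector<Report> analyzeTop(bool WeakAssumptionTaken) {
  PathState S;
  S.SeenWeakLoopAssumption = WeakAssumptionTaken;
  std::vector<Report> Out;
  d(S, Out);
  return Out;
}
```

In this model `analyzeTop(true)` yields exactly one finding, in `a()`, and it is suppressed; the accesses in `b()`, `c()` and `d()` are never reached because the path was already sunk. `analyzeTop(false)` yields the same single finding, but visible.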

https://github.com/llvm/llvm-project/pull/109804