[PATCH] D44347: [analyzer] symbol_iterator must iterate through the symbolic base.

Artem Dergachev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Mar 9 19:05:48 PST 2018

NoQ created this revision.
NoQ added reviewers: dcoughlin, xazax.hun, a.sidorin, george.karpenkov, szepet.
Herald added subscribers: cfe-commits, rnkovacs, baloghadamsoftware.

I've been investigating a false positive that had a pointer-type symbol with a non-zero range which had its range forgotten and later assumed to be zero. The big test that i added (`test_region_referenced_only_through_field_in_store_value()`) roughly re-creates the infeasible path that was taken. A bit simplified, it looks like this:

   1 int *p;
   3 struct S {
   4   int field;
   5 };
   7 struct S *conjure();
   9 void foo() {
  10   struct S *s = conjure();
  11   p = &s->field;
  12   if (!s) {
  13     exit(0);
  14   }
  15   if (!p) {
  16     clang_analyzer_warnIfReached(); // no-warning
  17   }
  18 }

It is easy to see that the path with `clang_analyzer_warnIfReached()` is infeasible. The bug that causes us to explore that infeasible path is in the `RegionStore`'s dead symbol removal algorithm that didn't realize that the symbol is live. Recall that `RegionStore` scans itself with the help of a worklist algorithm - it enforces the invariant of //"if a binding key is live, then the respective binding value is also live"// (and a few other similar invariants) and marks symbols and regions as live until the live set reaches a fixed point (i.e. the worklist runs dry).

In our example the symbol in question is the return value of `conjure_S1()`. We'd still refer to it as `$x`.

1. `$x` is present in the Store as a binding value of form `&SymRegion{$x}->field` (which is equivalent to `$x + offsetof(field)`, i.e. it's possible to easily recover `$x` from it).
2. Symbol `$x` is not mentioned anywhere else in the program state.
3. Additionally, there are no store bindings anywhere within region `SymRegion{$x}`.
4. When `&SymRegion{$x}->field` is found as a binding value in `removeDeadBindingsWorker::VisitBinding()`, symbolic base `SymRegion{$x}` of region `SymRegion{$x}->field` is put into the worklist (`removeDeadBindingsWorker::AddToWorkList()` descends to the base region automatically).
5. However, `$x` is not added to the live set yet, because the loop through sub-symbols (from `symbol_begin()` to `symbol_end()`) doesn't descend to from `SymRegion{$x}->field` to its symbolic base `SymRegion{$x}` in order to find `$x`.
6. In most cases `$x` will be found anyway when later processing the worklist. Indeed, `removeDeadBindingsWorker::VisitCluster()` would take `SymRegion{$x}` from the worklist and immediately find `$x` within it. This is actually one of the "other similar invariants": //"if a binding key is live and is a symbolic region, its parent symbol is live"//.
7. In our case, however, this will not happen, because `SymRegion{$x}` has no bindings and therefore its cluster is empty and `removeDeadBindingsWorker::VisitCluster()` will bail out early.

Essentially, it worked most of the time because most regions have bindings. But normally it is not a responsibility of `removeDeadBindingsWorker::VisitCluster()` to find symbol `$x` within binding value `SymRegion{$x}->field` - it should definitely not try to care about the liveness of stuff in empty clusters, those should only be live if there are other reasons to believe they're alive. Being a binding value is one such "other reason", which means that on step 5 we should have added `$x` to the live set.

I believe that this is exactly how step 5 was intended to work. However, the undocumented behavior of `symbol_begin()` that consists in not descending to the base region was unexpected.


In order to confirm my understanding, i tried to see what other users of `symbol_begin()` expect. It turns out that three other users made the same mistake, and in two of these three cases it led to keeping the symbol around longer (instead shorter) than necessary (i.e. zombie symbols), while the third case is similar to the original example albeit more exotic. In all other cases users of `symbol_begin()` did not care if it descends to the base region or not.


- `CStringChecker::checkLiveSymbols()` iterates over sub-symbols of string length to mark string lengths as live. Because string length is for now only composed of `CStringChecker`-tagged metadata symbols and constants, and therefore it is a `NonLoc` that isn't a `nonloc::LocAsInteger`, it is irrelevant whether `symbol_begin` descends to the symbolic base.
- `Environment::removeDeadBindings()` marks symbols within a value of a dead expression as `maybeDead()`. I added a test case, `test_region_referenced_only_through_field_in_environment_value()`, to demonstrate that we indeed produce a zombie symbol due to `symbol_begin()` here not descending to the symbolic base.
- `ScanReachableSymbols::scan(Sym)` scans sub-symbols of a raw symbol, not of an `SVal`. Therefore it is not a region and there is no need to descend to the symbolic base.
- `ProgramState::isTainted(Sym, Kind)` scans a raw symbol as well.
- `removeDeadBindingsWorker::VisitBinding()` the test case we've just discussed.
- `RegionStoreManager::removeDeadBindings` also has a bug similar to the one in `Environment::removeDeadBindings()`, which is demonstrated by test case `test_zombie_referenced_only_through_field_in_store_value()`.
- `SymExpr::computeComplexity()` scans a raw symbol.
- `SymbolReaper::markElementIndicesLive()` scans an array symbol, which is always a `NonLoc`. However, it may still be a `nonloc::LocAsInteger`, and in this case descend to the symbolic base would be necessary, as demonstrated by `test_loc_as_integer_element_index_lifetime()`. This test, however, is not fixed by the current patch, due to `getAsLocSymbol()` incorrectly failing to pass its `IncludeBaseRegions` argument into the recursive call. But i decided not to investigate it further today, because, well, //enough is enough//.

  rC Clang



Index: test/Analysis/symbol-reaper.c
--- test/Analysis/symbol-reaper.c
+++ test/Analysis/symbol-reaper.c
@@ -3,6 +3,9 @@
 void clang_analyzer_eval(int);
 void clang_analyzer_warnOnDeadSymbol(int);
 void clang_analyzer_numTimesReached();
+void clang_analyzer_warnIfReached();
+void exit(int);
 int conjure_index();
@@ -58,6 +61,12 @@
 struct S2 {
   struct S1 array[5];
 } s2;
+struct S3 {
+  void *field;
+struct S1 *conjure_S1();
+struct S3 *conjure_S3();
 void test_element_index_lifetime_with_complicated_hierarchy_of_regions() {
   do {
@@ -68,6 +77,18 @@
   } while (0); // no-warning
+void test_loc_as_integer_element_index_lifetime() {
+  do {
+    int x;
+    struct S3 *s = conjure_S3();
+    clang_analyzer_warnOnDeadSymbol((int)s);
+    x = (int)&(s->field);
+    ptr = &arr[x];
+    if (!s) {}
+  // FIXME: Should not warn. The symbol is still alive within the ptr's index.
+  } while (0); // expected-warning{{SYMBOL DEAD}}
 // Test below checks lifetime of SymbolRegionValue in certain conditions.
 int **ptrptr;
@@ -78,3 +99,38 @@
   (void)0; // No-op; make sure the environment forgets things and the GC runs.
   clang_analyzer_eval(**ptrptr); // expected-warning{{TRUE}}
 } // no-warning
+int *produce_region_referenced_only_through_field_in_environment_value() {
+  struct S1 *s = conjure_S1();
+  clang_analyzer_warnOnDeadSymbol((int) s);
+  int *x = &s->field;
+  return x;
+void test_region_referenced_only_through_field_in_environment_value() {
+  produce_region_referenced_only_through_field_in_environment_value();
+} // expected-warning{{SYMBOL DEAD}}
+void test_region_referenced_only_through_field_in_store_value() {
+  struct S1 *s = conjure_S1();
+  clang_analyzer_warnOnDeadSymbol((int) s);
+  ptr = &s->field; // Write the symbol into a global. It should live forever.
+  if (!s) {
+    exit(0); // no-warning (symbol should not die here)
+    // exit() is noreturn.
+    clang_analyzer_warnIfReached(); // no-warning
+  }
+  if (!ptr) { // no-warning (symbol should not die here)
+    // We exit()ed under these constraints earlier.
+    clang_analyzer_warnIfReached(); // no-warning
+  }
+  // The exit() call invalidates globals. The symbol will die here because
+  // the exit() statement itself is already over and there's no better statement
+  // to put the diagnostic on.
+} // expected-warning{{SYMBOL DEAD}}
+void test_zombie_referenced_only_through_field_in_store_value() {
+  struct S1 *s = conjure_S1();
+  clang_analyzer_warnOnDeadSymbol((int) s);
+  int *x = &s->field;
+} // expected-warning{{SYMBOL DEAD}}
Index: include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h
--- include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h
+++ include/clang/StaticAnalyzer/Core/PathSensitive/SVals.h
@@ -183,7 +183,7 @@
   void dump() const;
   SymExpr::symbol_iterator symbol_begin() const {
-    const SymExpr *SE = getAsSymbolicExpression();
+    const SymExpr *SE = getAsSymbol(/*IncludeBaseRegions=*/true);
     if (SE)
       return SE->symbol_begin();

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D44347.137891.patch
Type: text/x-patch
Size: 3181 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20180310/48840cb2/attachment.bin>

More information about the cfe-commits mailing list