[PATCH] D32592: [Analyzer] Iterator Checker - Part 1: Minimal Checker for a Simple Test Case

Fri May 5 08:19:13 PDT 2017

NoQ added a comment.

Noticed a few more things.

It sounds as if once this first patch lands, the rest should be easy :)

Regarding the comments in the code. I materialized my wishes to something like:

- At the top of the file:

  // In the code, iterator can be represented as a:
  // * type-I: typedef-ed pointer. Operations over such iterator, such as comparisons or increments, are modeled straightforwardly by the analyzer.
  // * type-II: structure with its method bodies available.  Operations over such iterator are inlined by the analyzer, and results of modeling these operations are exposing implementation details of the iterators, which is not necessarily helping.
  // * type-III: completely opaque structure. Operations over such iterator are modeled conservatively, producing conjured symbols everywhere.
  //
  // Additionally, depending on the circumstances, operators of types II and III can be represented as:
  // * type-IIa, type-IIIa: conjured structure symbols - when returned by value from conservatively evaluated methods such as `.begin()`.
  // * type-IIb, type-IIIb: memory regions of iterator-typed objects, such as variables or temporaries, when the iterator object is currently treated as an lvalue.
  // * type-IIc, type-IIIc: compound values of iterator-typed objects, when the iterator object is treated as an rvalue taken of a particular lvalue, eg. a copy of "type-a" iterator object, or an iterator that existed before the analysis has started.

Not sure if type-IIa iterators actually make sense. It's ok if you come up with your own classification :)

Then, in methods that deal with iterator `SVal`s directly, i wish we had hints explaining what's going on in these ~7 cases. In my opinion, that'd greatly help people understand the code later, and it'd help us understand how to avoid this variety and provide checker authors with a better API as soon as we get to this, so it's the biggest concern for me about this checker.

================
Comment at: lib/StaticAnalyzer/Checkers/IteratorChecker.cpp:16
+#include "ClangSACheckers.h"
+#include "clang/AST/DeclTemplate.h"
+#include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"
----------------
This header seems unused for now.

================
Comment at: lib/StaticAnalyzer/Checkers/IteratorChecker.cpp:22
+
+#include <utility>
+
----------------
This header seems unused for now.

================
Comment at: lib/StaticAnalyzer/Checkers/IteratorChecker.cpp:153-155
+REGISTER_MAP_WITH_PROGRAMSTATE(IteratorSymbolMap, SymbolRef, IteratorPosition)
+REGISTER_MAP_WITH_PROGRAMSTATE(IteratorRegionMap, const MemRegion *,
+                               IteratorPosition)
----------------
Carryover from the other review: did you try using `RegionOrSymbol` as a key and keep only one map?

================
Comment at: lib/StaticAnalyzer/Checkers/IteratorChecker.cpp:166-167
+
+static llvm::APSInt Zero = llvm::APSInt::get(0);
+static llvm::APSInt One = llvm::APSInt::get(1);
+
----------------
I've a bit of doubt about those. Would they call their constructors every time clang starts, regardless of whether the analyzer or the checker is enabled? Maybe having them as private variables inside the checker class would be better?

As in http://llvm.org/docs/CodingStandards.html#do-not-use-static-constructors

================
Comment at: lib/StaticAnalyzer/Checkers/IteratorChecker.cpp:294-297
+    // Assumption: if return value is an iterator which is not yet bound to a
+    //             container, then look for the first iterator argument, and
+    //             bind the return value to the same container. This approach
+    //             works for STL algorithms.
----------------
baloghadamsoftware wrote:
> NoQ wrote:
> > I guess this deserves a test case (we could split this out as a separate feature as well).
> > 
> > I'm also afraid that we can encounter false positives on functions that are not STL algorithms. I suggest doing this by default only for STL functions (or maybe for other specific classes of functions for which we know it works this way) and do this for other functions under a checker option (i.e. something like `-analyzer-config IteratorChecker:AggressiveAssumptions=true`, similar to `MallocChecker`'s "Optimistic" option).
> I will check whether this piece of code could be moved in a later part of the checker. However, I suggest to first wait for the first false positives before we introduce such an option. This far the false positives in my initial tests had different reasons, not this one.
Unfortunately, we've had a poor experience with this approach in other checkers. You never know, and it seems that it's always better to have a safe fallback mode available under a flag, because if a few classes of false positives are found, and we are forced to reduce the checker to a safer behavior, it'd be hard to remember all the places where unsafe heuristics were used.

================
Comment at: lib/StaticAnalyzer/Checkers/IteratorChecker.cpp:478-479
+    auto &SymMgr = C.getSymbolManager();
+    EndSym = SymMgr.conjureSymbol(CE, C.getLocationContext(),
+                                  C.getASTContext().LongTy, C.blockCount());
+    State = createContainerEnd(State, ContReg, EndSym);
----------------
NoQ wrote:
> baloghadamsoftware wrote:
> > NoQ wrote:
> > > I see what you did here! And i suggest considering `SymbolMetadata` here instead of `SymbolConjured`, because it was designed for this purpose: the checker for some reason knows there's a special property of an object he wants to track, the checker doesn't have a ready-made symbolic value to represent this property (because the engine didn't know this property even exists, so it didn't denote its value), so the checker comes up with its own notation. `SymbolMetadata` is used, for example, in `CStringChecker` to denote an unknown string length for a given null-terminated string region.
> > > 
> > > This symbol carries the connection to the region (you may use it to reverse-lookup the region if necessary), and in particular it is considered to be live as long as the base region is live (so you don't have to deal with liveness manually; see also comments around `SymbolReaper::MarkInUse`). It's also easier to debug, because we can track the symbol back to the checker. Generally, symbol class hierarchy is cool because we know a lot about the symbol by just looking at it, and `SymbolConjured` is a sad fallback we use when we didn't care or manage to come up with a better symbol.
> > > 
> > > I'm not sure if `LongTy` is superior to `IntTy` here, since we don't know what to expect from the container anyway.
> > > 
> > > Also, please de-duplicate the symbol creation code. Birth of a symbol is something so rare it deserves a drum roll :)
> > > 
> > > I'd take a pause to figure out if the same logic should be applied to the map from containers to end-iterators.
> > SymbolMetaData is bound to a MemRegion. Iterators are sometimes symbols and sometimes memory regions, this was one of the first lessons I learned from my first iterator checker.
> Oh. Hmm. Ok. Right.
> 
> To be sure: in what cases do you need to create a new symbol when the iterator is already a symbol? How broken do we become if we try to say that the symbolic iterator and its own offset are the same thing?
All right, i guess it wasn't a great idea. Even if iterators are plain pointers, the offset is a pointer difference. I still have this feeling that the analyzer is not good enough for this checker yet, so at least it's great that we have it moving.

I once wished we had "metadata regions" for symbols and regions, and then automatically gaining symbols for these region values (http://lists.llvm.org/pipermail/cfe-dev/2016-May/049000.html), that would have made things a lot cooler maybe. Never mind.

================
Comment at: test/Analysis/iterator-range.cpp:1-2
+// RUN: %clang_analyze_cc1 -std=c++11 -analyzer-checker=core,cplusplus,alpha.cplusplus.IteratorRange -analyzer-eagerly-assume -analyzer-config c++-container-inlining=false %s -verify
+// RUN: %clang_analyze_cc1 -std=c++11 -analyzer-checker=core,cplusplus,alpha.cplusplus.IteratorRange -analyzer-eagerly-assume -analyzer-config c++-container-inlining=true -DINLINE=1 %s -verify
+
----------------
Could we add run-lines without `-analyzer-eagerly-assume`? Currently all variants are passing, but if new tests will fail, we could `#ifndef` them out.

================
Comment at: test/Analysis/iterator-range.cpp:9
+  if (i != v.end())
+    *i; // no-warning
+}
----------------
I'd suggest sticking `clang_analyzer_warnIfReached()` here to demonstrate that there is no warning because the branch is dead. That'd make the test stronger.

https://reviews.llvm.org/D32592