[PATCH] D25660: [Analyzer] Checker for iterators dereferenced beyond their range.

Thu Nov 10 11:38:04 PST 2016

zaks.anna added inline comments.

================
Comment at: lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp:219
+
+  assert(LCtx->getKind() == LocationContext::StackFrame &&
+         "Function does not begin with stack frame context");
----------------
a.sidorin wrote:
> `isa<StackFrameContext>(LCtx)`?
> And `cast<>` below already does the same check with an assertion.
At least one advantage of the assert is that it provides an error message. I'd not try to minimize the number of asserts.

================
Comment at: lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp:423
+
+void IteratorPastEndChecker::handleComparison(CheckerContext &C,
+                                              const SVal &LVal,
----------------
baloghadamsoftware wrote:
> baloghadamsoftware wrote:
> > NoQ wrote:
> > > baloghadamsoftware wrote:
> > > > NoQ wrote:
> > > > > a.sidorin wrote:
> > > > > > What will happen if we write something like this:
> > > > > > ```
> > > > > > bool Eq1 = it1 == it2;
> > > > > > bool Eq2 = it3 == it4;
> > > > > > if (Eq1) {...}?
> > > > > > ```
> > > > > > 
> > > > > > As I understand, we'll assume the second condition instead of first.
> > > > > Had a look. So the problem is, we obtain the result of the comparison as a symbol, from which it is too hard to recover the operands in order to move iterator position data from one value to another.
> > > > > 
> > > > > Normally we obtain a simple SymbolConjured for the return value of the `operator==()` call (or, similarly, `operator!=()`). For plain-value iterators (eg. `typedef T *iterator`) we might be obtaining an actual binary symbolic expression, but even then it's not quite clear how to obtain operands (the structure of the expression might have changed due to algebraic simplifications). Additionally, LHS and RHS aren't necessarily symbols (they might be semi-concrete), so composing symbolic expressions from them in general is problematic with our symbol hierarchy, which is rarely a problem for numbers but for structural symbols it'd be a mess.
> > > > > 
> > > > > For now i suggest, instead of storing only the last LHS and RHS, to save a map from symbols (which are results of comparisons) to (LHS value, RHS value) pairs. This map should, apart from the obvious, be cleaned up whenever one of the iterators in the pair gets mutated (incremented or decremented). This should take care of the problem Alexey points out, and should work with semi-concrete stuff.
> > > > > 
> > > > > For the future i suggest to let users construct their own symbols and symbolic expressions more easily. In fact, if only we had all iterators as regions, we should have probably used SymbolMetadata for this purpose: it's easy to both recover the parent region from it and use it in symbolic expressions. We could also deprecate the confusing structural symbols (provide default-bound lazy compound values for conjured structures instead), and then it'd be possible to transition to SymbolMetadata entirely.
> > > > Thank you for the suggestion. I am not sure if I fully understand you. If I create a map where the key is the resulting symbol of the comparison, it will not work because evalAssume is called for the innermost comparison. So if the body of operator== (or operator!=) is inlined, then I get a binary symbolic expression in evalAssume, not the SymbolConjured. This binary Symbolic expression is a comparison of the internals of the iterators, e.g. the internal pointer. So the key will not match any LHS and RHS value pair in the map. I also thought on such solution earlier but I dismissed it because of this.
> > > Well, even if the body of the comparison operator is inlined, PreStmt/PostStmt callbacks should still work, and it doesn't really matter if there's a `SymbolConjured` or not, we can still add the symbolic expression to our map as a key.
> > > 
> > > Essentially, you ignore whatever happens in the iterator's operator==() when it's inlined (including any evalAssume events), then in PostStmt of operator==() you map the return-value symbol of the operator to the operator's arguments (operands), then whenever an assumption is being made against the return-value symbol, you carry over this assumption to the operands. I think it shouldn't really matter if the operator call was inlined.
> > > 
> > > The only unexpected thing that may happen due to inlining is if the inlined operator returns a concrete value (plain true or plain false) instead of the symbol, but in this case what we need to do is to just carry over the assumption to the operands instantly.
> > Maybe if I evaluate the operator==() call for iterators using evalCall()?
> Sorry, maybe my phrasing was not accurate enough. The problem is that the assumption is not being made against the return-value symbol of the operator==(), but if inlined, against the internal == operator. So I do not have the same key in evalAssume() thus I cannot access the operands from the map I stored in checkPostCall(). The only solution I can imagine is that I evalCall() the operator==() but then we lose the opportunity to check anything inside the body of the operator.
Thanks for working on this!!!

We've discussed this with Artem and Devin in more detail and here are the notes from the conversation.

Just to summarize, Artem's proposal is to replace the two trates for RHS and LHS with a map from a symbol that represents the result of the iterator comparison to LHS SVal, RHS SVal, and the relation between them (== | !=). 

Are you concerned about this case:

bool operator==(const it&RHS) {
  return x == RHS.x; // If evalAssume is called here, we are just going to ignore it.
} // We get a post call and can fill in the map from binary symbolic expression to LHS and RHS.

You are right, we will get a binary symbolic expression and not SymbolConjured. And we will not fill in the map until the return from the inlined operator. However, even if the operator is inlined, we will be calling PostCall on it after the return. So at that point, the (binary symbolic expression) -> (LHS, RHS, ==) entry will be added to the map, where the LHS and RHS will be the arguments to the call. The evalAssume will be called on the caller side.

Another example:
bool operator==(const it&RHS) {
  if (x == RHS.x)
    return y = RHS.y;
  return false; // <- Constant is returned.
}
In this case, a concrete value is returned on one of the branches. The suggestion is not to rely on evalAssume, but record the relation of the iterators based on the value of the constant being returned. When the expression is evaluated on the caller side, one of the branches will be unreachable anyway, so we will not loose precision here even if we do nothing on evalAssume.

Also, could you please add examples that use the inlined and non-inlined operators in the following way to make sure everything still works:
  if ( ! (i==e) )

Very Important: You should test your patch with `eagerly assume` option turned on since this mode the analyzer is on by default and running without eagerly assume is outdated. An option to run without eagerly assume should be removed altogether.

https://reviews.llvm.org/D25660