[PATCH] D54149: [Analyzer] [WIP] Standard C++ library functions checker for the std::find() family

Fri Nov 30 15:32:22 PST 2018

NoQ added a comment.
Herald added a subscriber: gamesh411.

Such checker is useful. It'd be great to add a lot of "pure" functions into it, simply for the sake of pointing out that they have no side effects, i.e. avoid unnecessary invalidation.

The bound-to-argument constraint is useful. It avoids the alpha-renaming problem by assigning the correct value right away instead of stuffing an aliasing condition into the solver.

I still believe that the idea to split the state on `std::find` is questionable, no matter how exactly it's implemented. `std::find` does more than tells whether the element is in the container, so calling it just for its other effect (to obtain the iterator to the element) when you already know that the element is in the container is not an indication of a dead code, even if it is a bad code smell in most cases. I think it is very likely that people will come to us with false positives of that kind and we'll have to revert this behavior. Would you be OK to keep support for `std::find()` under an option, off by default but available when the user opts into lint checks? This could be implemented by making a new check kind (eg., `CK_OptimisticStdCXXLibraryFunctionsChecker`).

Random thought: i still think that implementing the "object value id" thing (a symbol-like id that is associated with the memory region of the object and gets invalidated when the object is logically mutated but gets copied as-is to the new object upon copy/move-constructors or assignments) is a good way to refactor all these things. But this work seems orthogonal to what you did here (i.e., we'll have to perform the copy of the "id" here, but that's just a separate task). So this patch shouldn't be blocked on that.

Another random thought: it'd be great to make a bug reporter visitor that highlights state splits produced by this checker. Eg., in your case it would produce an event note that says something like "Assuming object is not present in iterator range". Otherwise the user would be unable to understand why do you think that the iterator is bad. Or for, say, `ispunct()` it would say "Assuming character is punctuation symbol".

================
Comment at: lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp:159-161
+    ProgramStateRef
+    applyAsBoundToArgument(ProgramStateRef State, const CallEvent &Call,
+                           const FunctionSummaryTy &Summary) const;
----------------
I think it's important to document that bound-to-argument should always be used instead of compares-to-argument when the comparison operator is == and the constraint is applied to return values of fully evaluated functions.

================
Comment at: lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp:375-376
+  OtherV = SVB.evalCast(OtherV, T, OtherT);
+  State = State->BindExpr(Call.getOriginExpr(), Call.getLocationContext(),
+                          OtherV);
+  return State;
----------------
Hmmmmmmmm.

`std::find` is a function that returns a C++ object by value.

For such functions it is not enough to bind the returned prvalue of the object. It is important to realize that while it is an equal iterator, it's not the same object. Which is why it is necessary to foresee the storage for the newly constructed iterator, so that destruction/materialization/copy elision worked correctly. See the respective logic in `ExprEngine::bindReturnValue()`:

```
  lang=c++
   611   if (auto RTC = getCurrentCFGElement().getAs<CFGCXXRecordTypedCall>()) {
   612     // Conjure a temporary if the function returns an object by value.
   613     SVal Target;
   614     assert(RTC->getStmt() == Call.getOriginExpr());
   615     EvalCallOptions CallOpts; // FIXME: We won't really need those.
   616     std::tie(State, Target) =
   617         prepareForObjectConstruction(Call.getOriginExpr(), State, LCtx,
   618                                      RTC->getConstructionContext(), CallOpts);
   // ...
   627   } // ...
```

It's bad that checkers currently need to do that manually when they implement `evalCall()` for C++ functions that return objects. We need to come up with a better API, eg. `State->BindReturnValue(same arguments)` that prepares for object construction (i.e., move the part of the functionality of `ExprEngine::bindReturnValue()` that's responsible for binding into a `ProgramState` method, and only keep the functionality that's responsible for conjuring the value).

================
Comment at: test/Analysis/std-cxx-library-functions.cpp:7-11
+void test_find(std::vector<int> V, int n) {
+  const std::vector<int>::iterator b = V.begin(), e = V.end();
+  clang_analyzer_eval(std::find(b, e, n) == e); // expected-warning{{TRUE}}
+  // expected-warning at -1{{FALSE}}
+}
----------------
I think that's a great standalone test.

Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D54149/new/

https://reviews.llvm.org/D54149