[PATCH] D121797: [clang][dataflow] Add modeling of Chromium's CHECK functionality

Gábor Horváth via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Mar 18 09:46:37 PDT 2022


xazax.hun added inline comments.


================
Comment at: clang/unittests/Analysis/FlowSensitive/ChromiumCheckModelTest.cpp:122
+  void transfer(const Stmt *S, NoopLattice &, Environment &Env) {
+    M.transfer(S, Env);
+  }
----------------
ymandel wrote:
> xazax.hun wrote:
> > ymandel wrote:
> > > xazax.hun wrote:
> > > > I wonder whether the models should actually be called by the framework at some point. 
> > > > E.g. imagine the following scenario:
> > > > ```
> > > > void f()
> > > > {
> > > >     std::optional<int> o(5);
> > > >     if (o)
> > > >     {
> > > >         // dead code here;
> > > >     }
> > > > }
> > > > ```
> > > > 
> > > > In an ideal case, an analysis could use the `std::optional` modeling to realize that the code in the `if` statement is dead and use this fact to improve its precision. Explicitly requesting the modeling in the transfer function works OK when we only have a couple of things to model, but it might not scale in the future. When we model dozens of standard types and functions, we would not want all the analysis clients to invoke all the transfers for all the models individually. 
> > > Agreed. It seems similar to the problems that motivated DLLs back in the day. There's clearly a lot to be worked out here in terms of how best to support composition. It's probably worth an RFC or some such to discuss in more depth.
> > Having an RFC and some deeper discussions would be great. I also wonder whether modeling arbitrary `Stmt`s is the right approach. The peculiarities of the language should probably be modelled by the framework itself without any extensions. Maybe we only want the modeling of certain function calls to be customizable?
> Good question. It would be really nice if we could draw this line, but I have a bad feeling that it won't be so simple. :) Still, worth looking at our existing models, and new ones that we're developing, to see if we can find a clear "bounding box".
> 
> What does CSA do in this regard?
In the CSA, there is no clear distinction between modeling and diagnostics; each check can do both. Historically, this turned out to be a bad idea. When a check does both modeling and diagnostics, users may turn it off because of a few false positives and then get worse results from other checks, because a critical piece of modeling was turned off along with it. (E.g., even if a std::optional check produces a couple of false positives, it might still be beneficial to keep its modeling running in the background without the diagnostics, because that modeling can help discover infeasible paths and thereby improve the precision of other checks.)

Nowadays, we try to make the distinction clear: some checks are modeling-only and others are diagnostic-only (the latter might still have their own modeling, but they do not affect the global analysis state, i.e., diagnostic-only checks should not be able to affect each other). This distinction is currently a best-effort convention and is not enforced by any of the APIs.

In the CSA, the checker APIs are very powerful. E.g., if there is a pointer with an unknown value and we see a dereference, we can continue the analysis with the assumption that the pointer is non-null. Such assumptions can be added by the framework itself or by a regular check. Over time we have been trying to move as much modeling as possible into (non-diagnostic) checks to keep the framework lightweight, but most of the meat is still in the framework.
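For illustration, here is a minimal sketch of such a modeling-only check written against the CSA checker API (the checker name is made up, and in practice the engine itself already records this particular assumption):

```
#include "clang/StaticAnalyzer/Core/Checker.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"

using namespace clang;
using namespace ento;

namespace {
// Hypothetical modeling-only check: emits no diagnostics, only refines state.
class AssumeNonNullOnDerefChecker : public Checker<check::Location> {
public:
  void checkLocation(SVal Loc, bool IsLoad, const Stmt *S,
                     CheckerContext &C) const {
    // If execution continues past a dereference of a pointer with an unknown
    // value, keep analyzing on a state where that pointer is non-null.
    if (auto L = Loc.getAs<DefinedOrUnknownSVal>())
      if (ProgramStateRef NonNull = C.getState()->assume(*L, true))
        C.addTransition(NonNull);
  }
};
} // namespace
```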

To model libraries, we are using the evalCall callback: https://github.com/llvm/llvm-project/blob/main/clang/lib/StaticAnalyzer/Checkers/CheckerDocumentation.cpp#L229

Roughly speaking, the model looks like this (a code sketch follows the list):
* The analyzer encounters a function call, so it asks the checks in sequence whether any of them wants to model it.
* A check gets the evalCall callback and can do whatever it wants. Most checks will return false most of the time, as each is only expected to handle a small subset of functions.
* The first check that returns true short-circuits this process.
* If none of the checks returns true, the framework falls back to a default modeling that is a conservative approximation of the call, i.e., it invalidates the bindings that could be changed by the function (globals, output arguments, return values, etc.).
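A minimal sketch of that protocol, assuming a made-up function `my_alloc` that the check wants to model fully (the checker name, the function name, and the chosen return-value modeling are purely illustrative):

```
#include "clang/StaticAnalyzer/Core/Checker.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"

using namespace clang;
using namespace ento;

namespace {
class MyAllocModeling : public Checker<eval::Call> {
public:
  bool evalCall(const CallEvent &Call, CheckerContext &C) const {
    const auto *FD = dyn_cast_or_null<FunctionDecl>(Call.getDecl());
    // Claim only the one function we fully model; returning false lets other
    // checks, or the default conservative invalidation, handle the call.
    if (!FD || !FD->getIdentifier() || FD->getName() != "my_alloc")
      return false;

    // Fully model the call: bind a fresh symbolic value as the return value.
    ProgramStateRef State = C.getState();
    SValBuilder &SVB = C.getSValBuilder();
    SVal Ret = SVB.conjureSymbolVal(Call.getOriginExpr(),
                                    C.getLocationContext(),
                                    Call.getResultType(), C.blockCount());
    State = State->BindExpr(Call.getOriginExpr(), C.getLocationContext(), Ret);
    C.addTransition(State);
    return true; // Short-circuits the remaining checks and the default model.
  }
};
} // namespace
```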

The model above assumes that when a check returns true it models all aspects of the function (like invalidation). A downside is that it does not solve composition: when multiple checks want to model the same function, the framework will just pick one of them. Also, modeling types is really challenging. E.g., if a modeling check models the constructor of a type but not the destructor, the framework will fall back to the default modeling for the dtor. The problem is that the ctor was modeled by the checker rather than the framework, so some invariants the default modeling relies on were never established. This can result in false positives or even crashes. Overall, modeling types is almost an all-or-nothing endeavor: if a check models at least one method of a type, it is very likely that it will need to model a bunch more to ensure a good experience.

At Microsoft, we have a framework similar to the CSA called EspXEngine. In EspXEngine, we have different APIs for modeling and diagnostics, so the API enforces that diagnostic checks cannot influence each other. Our model for modeling checks is very similar to the CSA's, but the idea is that the modeling checks are maintained by the authors of EspXEngine (while diagnostic checks can be written by anyone), so conflicting models are less of a problem.

Both EspXEngine and the CSA have problems chaining modeling checks. E.g., even if the CSA has modeling for both `unique_ptr` and `optional`, the modeling of `optional<unique_ptr<int>>` or `unique_ptr<optional<int>>` will not cover every aspect of the inner type out of the box.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D121797/new/

https://reviews.llvm.org/D121797


