[cfe-dev] [analyzer] Why use `checkPostCall()` to model the function semantics in `StdLibraryFunctionsChecker.cpp` ?

Wed Apr 25 12:09:21 PDT 2018

StdLibraryFunctionsChecker uses evalCall for most calls it models. It 
uses checkPostCall only for few functions for which it models so little 
about the function that it's unlikely to ever be a problem for other 
checkers that may eventually want to model the function exactly. The 
checker's original intent was to cut away infeasible paths in the 
program, eg. preventing analysis of paths on which getline() is assumed 
to return -2. It is indeed a problem that other checkers are not able to 
reliably access this information immediately in their own checkPostCall, 
but currently there are no checkers that are actively relying on that. 
There are also discussions about introducing a system of dependencies 
between checkers so that dependent checkers automatically turned on 
checkers on which they depend and have their callbacks fire in a 
specific order, which could probably be already hacked up by writing 
weird registerChecker() functions that register dependencies first.

There are currently at least 4 different ways the analyzer can model a 
function:

1. Conservative evaluation (normal analyzer behavior when body of the 
function is unavailable).
2. Inlining (model the function precisely by jumping into it and 
proceeding with normal analysis inside it).
(1. and 2. will be collectively referred to as "default evaluation".)
3. Body farm (provide a simplified synthetic AST for the function body 
and then inline it).
4. evalCall() in checker (let a checker manipulate the program state 
manually to model arbitrary effects of the function).

Additionally, any checker may influence the analysis at almost any 
point, which should be used carefully. For instance, splitting the path 
or cutting away a path that seems infeasible is fine (as long as it is 
the desired behavior), replacing a value of an expression with a 
different value is bad.

When body farms were introduced, they seemed to be a great way of 
modeling library functions, and they are fairly effective for the few 
functions they were used for. But later a lot of functions turned out to 
be problematic to model that way - either because their simplified AST 
is too complicated to synthesize correctly (eg. std::call_once turned 
out to be extremely painful because we had to write down AST for 
template instantiations manually node-by-node without being able to rely 
on the compiler to help us with that) or because a good synthetic AST 
will not be understood by the analyzer anyway. 
StdLibraryFunctionsChecker is modeling some functions that fall to the 
latter category. You should be able to find further explanation of why 
they are hard to body-farm in the checker's comments.

The difference between evalCall and checkPostCall is that evalCall 
overrides the default evaluation. If a checker does evalCall(), the 
function will never be inlined or invalidate potentially accessible 
memory. The checker will also need to come up with a good representation 
of the return value and will have a chance to specify it. If two 
different checkers try to evalCall() the same call, the analyzer will 
defensively crash.

StdLibraryFunctionsChecker uses evalCall for modeling calls that it can 
model *exactly*.

It also uses checkPostCall for stuff that it can't model exactly, but 
for the lack of better modeling it can still model a few things that are 
safe to model in post-call, in addition to the effects of default 
evaluation. For example, it doesn't model the string (or even the length 
of the string) produced by getline() but it does know that this function 
never returns -2, so it cuts away the respective paths. If getline() is 
inlined or a different checker models it in evalCall or even in 
checkPostCall, StdLibraryFunctionsChecker will still work correctly, 
because, well, whatever the other modeling does, it shouldn't make 
getline() return -2. It might happen that another checker substitutes 
the return value in PostCall leading to a race, but that's the exact 
reason why substituting expression values after the expression is 
evaluated is a bad idea anywhere in the analyzer.

StdLibraryFunctionsChecker uses a custom system of function summaries 
which is relatively extensible but not super flexible. It should 
probably not used for modeling everything. In fact, i doubt it'd be easy 
to extend it to reliably model anything but range constraints. Side 
effects like "this function writes its 1st argument to memory pointed to 
by its 2nd argument" are already pretty unpleasant to summarize 
declaratively; add a couple of levels of pointer indirection and it'd be 
a nightmare.

So, adding more functions and side effect variants similar to what's 
already there is welcome. I'm moderately curious about how far this 
summary system can be pushed, but reliability comes first. Trying to 
model every function this way is not a great idea.

On 4/24/18 8:46 PM, Henry Wong via cfe-dev wrote:
> Hi all,
>
> `StdLibraryFunctionsChecker.cpp` is a very useful and great tool to 
> improve the modeling of library function. But I can't figure out why 
> use `checkPostCall()` to model the function samantic.
>
> What puzzles me is the order of API calls. For example, if we want to 
> make some checks on `getline()` in `CheckerA`, and use 
> `checkPostCall()` to collect information or set `ProgramState`, the 
> `checkPostCall()` of `CheckerA` is likely to be behind the 
> `checkPostCall()` of `StdLibraryFunctionsChecker.cpp`. At this point, 
> `CheckerA` does not use the model information of `getline()` in 
> `StdLibraryFunctionsChecker.cpp`. So what is the original intention of 
> using `checkPostCall()` to play the key role in modeling?
>
> And I want to know what plans community have for 
> `StdLibraryFunctionsChecker.cpp` in the future, for example, extend it 
> to handle more complex library functions?
>
> Thanks in advance!
>
> Henry Wong
> Qihoo 360 Codesafe Team
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180425/c3f16e1d/attachment.html>