[cfe-dev] [Analyzer][RFC] Function Summaries

Mon Feb 3 08:22:46 PST 2020

Hi,

This year we plan to experiment with function summaries in CSA. We have a
budge roughly for 2 people to work on this.

Short-term plans:
^^^^^^^^^^^^^^^^^
We'd like to be able to read and use hand-written function summaries. A
good example of such hand-written summaries are in Cppcheck
<https://github.com/danmar/cppcheck/blob/master/cfg/std.cfg>. The existing
StdLibraryFunctionsChecker is the entry point for our upcoming changes. I
have created a patch which eliminates the macros in that checker, this way
the used classes like `ValueRange` become extensible [
https://reviews.llvm.org/D73897]. E.g. we can add an argument constraint
that must be true on all branches, if the constraint fails then we warn [
https://reviews.llvm.org/D73898]. (Note that these patches are still WIP.)

What could be in a summary?
- Branches
  * E.g. for f(x), if x is in ['0', '9'] then ret is not 0.
  * This is already handled in StdLibraryFunctionsChecker.
- Argument constraints (a contsraint that must be true on all branches)
  * Range constraint: E.g. x must be in [0, 255] on all branches. This is
the
    case for instance with isalpha(x) where if x is out of the range then
the
    program is ill-formed (undefined behaviour) according to the C standard.
  * Size requirements
    E.g.: asctime_s(char *buf, rsize_t bufsz, const struct tm *time_ptr);
    `buf` size must be at least `bufsz`.
  * Not-null
  * Not-uninitalized
  * Not-tainted (?)
- Return value properties (?)
  * E.g.: the return value must be checked always.
    void foo() {
      void *P = aligned_alloc(4, 8);
      if (P == NULL) { // OK
      }
    }
    void bar() {
      void *P = aligned_alloc(4, 8);
      void *X = P + 1; // expected-warning{{Use of return value that was
not checked}}
    }
    Storing this information in summaries could provide configuration for
checkers like this https://reviews.llvm.org/D72705.
    Also statistics based checkers
<https://llvm.org/devmtg/2019-04/slides/Lightning-Balogh-Statistics_based_checkers_in_the_Clang_Static_Analyzer.pdf>
could also benefit from this information.

Perhaps we should come up with a summary description file format (e.g. json
or
whatever) that could be parsed by the future variant of the existing
StdLibraryFunctionsChecker.

After we have a strong foundation of handling existing summaries then we can
move on to generate summaries automatically. We do believe that
summary-based
analysis could have a strong added value only if we are capable of
automatically synthesizing the summaries.

Mid-term plans:
^^^^^^^^^^^^^^^
Automatically create summaries based on the methods described in SMART and
Microsoft PEX [Godefroid2007, Godefroid2008].
The basic idea is that we create summaries on-demand during the analysis
and then later we try to reuse them if the calling-context allows that.
This is something like an advanced caching of the effects of function calls.
A summary is in the form [pre]f(x)[post], where `pre` is the preconditions
over function inputs and `post` is postconditions over function outputs.
There are many open questions and concerns here. E.g. about performance:
probably we should use an SMT solver to decide whether we can apply a
summary.
Also, the mentioned publications do runtime execution to drive the symbolic
execution towards certain paths.
Naturally we cannot do that, but we hope that the worklist based driving
could have a similar role.

Long-term plans:
^^^^^^^^^^^^^^^^
Create summaries based on the ideas in Facebook Infer's publications
[FBInfer2005, FBInfer2020].
The basic idea here is to create summaries the are free from the
calling-context.
This can be done only with a special data-flow computation and with a
proofing system.
This solution would render the analysis to a two-phase analysis.

References:
^^^^^^^^^^^
[Godefroid2007] Godefroid, Patrice. "Compositional dynamic test
generation." Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on
Principles of programming languages. 2007.
[Godefroid2008] Anand, Saswat, Patrice Godefroid, and Nikolai Tillmann.
"Demand-driven compositional symbolic execution." International Conference
on Tools and Algorithms for the Construction and Analysis of Systems.
Springer, Berlin, Heidelberg, 2008.
[FBInfer2005] Berdine, Josh, Cristiano Calcagno, and Peter W. O’hearn.
"Symbolic execution with separation logic." Asian Symposium on Programming
Languages and Systems. Springer, Berlin, Heidelberg, 2005.
[FBInfer2020] O'Hearn, Peter W. "Incorrectness logic." Proceedings of the
ACM on Programming Languages 4.POPL (2019): 1-32.

Thanks,
Gabor
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200203/1ab3e3f0/attachment.html>