<div dir="ltr">Hi,<br><br>This year we plan to experiment with function summaries in CSA. We have a budge roughly for 2 people to work on this.<br><br>Short-term plans:<br>^^^^^^^^^^^^^^^^^<br>We'd like to be able to read and use hand-written function summaries. A good example of such hand-written summaries are in <a href="https://github.com/danmar/cppcheck/blob/master/cfg/std.cfg">Cppcheck</a>. The existing StdLibraryFunctionsChecker is the entry point for our upcoming changes. I have created a patch which eliminates the macros in that checker, this way the used classes like `ValueRange` become extensible [<a href="https://reviews.llvm.org/D73897">https://reviews.llvm.org/D73897</a>]. E.g. we can add an argument constraint that must be true on all branches, if the constraint fails then we warn [<a href="https://reviews.llvm.org/D73898">https://reviews.llvm.org/D73898</a>]. (Note that these patches are still WIP.)<br><br>What could be in a summary?<br>- Branches<br>  * E.g. for f(x), if x is in ['0', '9'] then ret is not 0.<br>  * This is already handled in StdLibraryFunctionsChecker.<br>- Argument constraints (a contsraint that must be true on all branches)<br>  * Range constraint: E.g. x must be in [0, 255] on all branches. This is the<br>    case for instance with isalpha(x) where if x is out of the range then the<br>    program is ill-formed (undefined behaviour) according to the C standard.<br>  * Size requirements<br>    E.g.: asctime_s(char *buf, rsize_t bufsz, const struct tm *time_ptr);<br>    `buf` size must be at least `bufsz`.<br>  * Not-null<br>  * Not-uninitalized<br>  * Not-tainted (?)<br>- Return value properties (?)<br>  * E.g.: the return value must be checked always.<br>    void foo() {<br>      void *P = aligned_alloc(4, 8);<br>      if (P == NULL) { // OK<br>      }<br>    }<br>    void bar() {<br>      void *P = aligned_alloc(4, 8);<br>      void *X = P + 1; // expected-warning{{Use of return value that was not checked}}<br>    }<br>    Storing this information in summaries could provide configuration for checkers like this <a href="https://reviews.llvm.org/D72705">https://reviews.llvm.org/D72705</a>.<br>    Also <a href="https://llvm.org/devmtg/2019-04/slides/Lightning-Balogh-Statistics_based_checkers_in_the_Clang_Static_Analyzer.pdf">statistics based checkers</a> could also benefit from this information.<br><br>Perhaps we should come up with a summary description file format (e.g. json or<br>whatever) that could be parsed by the future variant of the existing<br>StdLibraryFunctionsChecker.<br><br>After we have a strong foundation of handling existing summaries then we can<br>move on to generate summaries automatically. We do believe that summary-based<br>analysis could have a strong added value only if we are capable of<br>automatically synthesizing the summaries.<br><br>Mid-term plans:<br>^^^^^^^^^^^^^^^<br>Automatically create summaries based on the methods described in SMART and Microsoft PEX [Godefroid2007, Godefroid2008].<br>The basic idea is that we create summaries on-demand during the analysis and then later we try to reuse them if the calling-context allows that.<br>This is something like an advanced caching of the effects of function calls.<br>A summary is in the form [pre]f(x)[post], where `pre` is the preconditions over function inputs and `post` is postconditions over function outputs.<br>There are many open questions and concerns here. E.g. about performance: probably we should use an SMT solver to decide whether we can apply a summary.<br>Also, the mentioned publications do runtime execution to drive the symbolic execution towards certain paths.<br>Naturally we cannot do that, but we hope that the worklist based driving could have a similar role.<br><br><br>Long-term plans:<br>^^^^^^^^^^^^^^^^<br>Create summaries based on the ideas in Facebook Infer's publications [FBInfer2005, FBInfer2020].<br>The basic idea here is to create summaries the are free from the calling-context.<br>This can be done only with a special data-flow computation and with a proofing system.<br>This solution would render the analysis to a two-phase analysis.<br><br><br><div>References:<div>^^^^^^^^^^^<br>[Godefroid2007] Godefroid, Patrice. "Compositional dynamic test generation." Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. 2007.<br>[Godefroid2008] Anand, Saswat, Patrice Godefroid, and Nikolai Tillmann. "Demand-driven compositional symbolic execution." International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, Berlin, Heidelberg, 2008.<br>[FBInfer2005] Berdine, Josh, Cristiano Calcagno, and Peter W. O’hearn. "Symbolic execution with separation logic." Asian Symposium on Programming Languages and Systems. Springer, Berlin, Heidelberg, 2005.<br>[FBInfer2020] O'Hearn, Peter W. "Incorrectness logic." Proceedings of the ACM on Programming Languages 4.POPL (2019): 1-32.</div><div><br></div><div>Thanks,</div><div>Gabor<br><br></div></div></div>