[cfe-dev] [Analyzer][RFC] Function Summaries

Wed Feb 5 08:22:31 PST 2020

Hi Gábor,

Thanks for your feedback.

How does the content of the summaries you proposed compare to
> CppCheck's summaries? Is it the same or did you add/remove something?
>
I consider the annotations in CppCheck valuable, so most of the proposed
argument constraints are coming from there. The exception is the
not-tainted argument constraint, that's new compared to CppCheck. Also,
return value properties are not present in CppCheck summaries.

> Being able to write summaries in a file that is not hardcoded into the
> compiler is nice. Would APINotes work for this purpose?
>
My biggest concern with APINotes is that it modifies the AST with Sema
<https://github.com/apple/llvm-project/blob/apple/master/clang/lib/Sema/SemaAPINotes.cpp>
and then the whole compilation chain - including CodeGen - works on the
modified AST. With summaries we'd like to describe complex information
(e.g. relations between parameters and the return value) that is important
only for static analysis and that may be totally irrelevant for CodeGen or
other parts of the compiler. Another concern is that APINotes does not
support C++.

What would be the distribution model for those summaries? Would clang ship
> them? Would the library authors ship them along with their libraries? If
> the latter how would clang pick it up?
>
In short term, summaries for stable libs like libc, libc++ and posix can be
hard-coded in the source of StdLibraryFunctionsChecker. In long term, I
think we should have a specified file format (.e.g YAML) in which we can
describe these summaries. Summaries for stable libs (libc, libc++, posix)
should be shipped with Clang. For extensiblity, (for library authors)
probably we will need an option that specifies a directory from which we
could read additional summary files. Similarly to what we already have with
ModelInjector's `-model-path` config option.

What if multiple files have summaries for the same function? (In C it is
> not that hard to have a situation like this due to the lack of namespaces.)
>
Option1: Give a run-time warning, so when we discover the situation we skip
the summary and the analysis continues.
Option2: Give a run-time error and then stop the analysis, the user must
provide unambiguous summaries.
I prefer Option1.

> How widespread is CppCheck's format? Should we support that?
>
I think it is a CppCheck internal format. So, I'd rather not support a file
format that is not controlled in LLVM/Clang. I think we should support only
one file format which is easily extensible.

> While I'd really like to have summaries I think we should lower our
> expectations a bit here. The current constraint solver of the analyzer
> might not be powerful enough to handle non-trivial summaries of this form.
> I do believe it can provide some value but often those papers describing
> these summaries are using a much more powerful SMT solver as a backend.
>
I agree that the range based solver is not powerful enough. But we have the
more powerful Z3 there, unfortunately that is slow. If we have a summary
that has preconditions then we must check in the caller whether that pre is
satisfied before applying the summary, and this would require the use of
Z3. There are techniques to cache and/or decrease the problem space for the
solver, but probably this might not scale, I agree. Another option could be
to apply the summary without checking the pre and once we found a bug only
then would we check the path feasibility with Z3 (similarly to what we do
today with inlining).

> The other concern is how to model checker specific pre- and
> post-conditions. Our checker API is not written in an SMT solver friendly
> way, so it would be fairly hard to synthesize those summaries for our
> checkers.
>
I'd like to avoid checker specific pre/post conditions. I hope that the
summaries could consist of local path constraints of the function and once
the summary is applied then the checker could populate its own GDM based on
that.

> Also, I do believe the analyzer could profit a lot from some classical
> dataflow analysis like mod/ref analysis. If we would annotate the arguments
> with the results of such an analysis it could help the analyzer to
> invalidate the right amount of state when evaluating a function call
> without a body.
>
Yes, as Artem mentioned, we could get some of these annotations from the
IR, e.g. by utilizing the Attributor framework.

I think the previous plans with automatically generated summaries would
> already render the analyzer into a two-phase analysis.
>
Not necessarily (I hope). The methods described in Godefroid's work
generates the summaries on-demand when the symbolic execution first sees
the functions.

> In case we are interested in infer style summaries I think we should first
> make our memory modeling based on separation logic. I have no idea how far
> we are from that at this point and how much work would it be. Also, I
> believe reasoning about such summaries requires specialized solvers that
> are much more powerful than the current range based one.
>
Yes, indeed. That would mean developing a whole new/different kind of
reasoning for summary generation and that might be too much to deal with.
On the other hand, I've seen some very valuable thoughts in the paper about
incorrectness logic, perhaps we could reuse some of those even in the
current form of the CSA symbolic execution.

Gabor
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200205/73bd2448/attachment.html>