[cfe-dev] [Analyzer][RFC] Function Summaries

Wed Feb 5 09:48:45 PST 2020

Hi!

On Wed, Feb 5, 2020 at 8:22 AM Gábor Márton <martongabesz at gmail.com> wrote:

>
> My biggest concern with APINotes is that it modifies the AST with Sema
> <https://github.com/apple/llvm-project/blob/apple/master/clang/lib/Sema/SemaAPINotes.cpp>
> and then the whole compilation chain - including CodeGen - works on the
> modified AST. With summaries we'd like to describe complex information
> (e.g. relations between parameters and the return value) that is important
> only for static analysis and that may be totally irrelevant for CodeGen or
> other parts of the compiler. Another concern is that APINotes does not
> support C++.
>

During analysis we don't do CodeGen so I do not really see the first part
as a concern. We do not load those notes during compilation and we do not
generate code when we load the notes. The lack of C++ support is definitely
a problem. The question is, is it better to improve APINotes to fit our
needs or to do something similar from scratch?

> In short term, summaries for stable libs like libc, libc++ and posix can
> be hard-coded in the source of StdLibraryFunctionsChecker. In long term, I
> think we should have a specified file format (.e.g YAML) in which we can
> describe these summaries. Summaries for stable libs (libc, libc++, posix)
> should be shipped with Clang. For extensiblity, (for library authors)
> probably we will need an option that specifies a directory from which we
> could read additional summary files. Similarly to what we already have with
> ModelInjector's `-model-path` config option.
>

Note that the current `-model-path` option is very limited. But it is a
good questions what is the best semantic for such an option. My initial
thought was to have something that behaves very similar to '-I' flags with
headers.

> I agree that the range based solver is not powerful enough. But we have
> the more powerful Z3 there, unfortunately that is slow. If we have a
> summary that has preconditions then we must check in the caller whether
> that pre is satisfied before applying the summary, and this would require
> the use of Z3. There are techniques to cache and/or decrease the problem
> space for the solver, but probably this might not scale, I agree. Another
> option could be to apply the summary without checking the pre and once we
> found a bug only then would we check the path feasibility with Z3
> (similarly to what we do today with inlining).
>

If we want to pursue Z3 for summaries we first will need to fix some of the
bugs we currently have. Unfortunately, those are not trivial. So I think if
this is our goal the first step is to make Z3 support solid.

> I'd like to avoid checker specific pre/post conditions. I hope that the
> summaries could consist of local path constraints of the function and once
> the summary is applied then the checker could populate its own GDM based on
> that.
>

I do see some problems with this. For example, we do have modeling checks
that can guide the analyzer by adding/removing constraints. Leaving those
checks out from summary generation can have its costs in terms of
precision. Also, applying constraints, the checkers will miss crucial
events for populating gdm. Such as checkers that subscribed for certain
events like checkPostStatement, checkRegionChanges and so on. If I
understand your proposal correctly, from the checker's point of view we
would evaluate the call conservatively, and we would only use the pre/post
states to help pruning more infeasible paths. In case we can avoid checkers
being overly confident (see the inline defensive checks problem) this might
work but we will see.

Cheers,
Gabor
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200205/94973c56/attachment.html>