[cfe-dev] [Analyzer][RFC] Function Summaries

Thu Feb 6 07:59:18 PST 2020

>
> During analysis we don't do CodeGen so I do not really see the first part
> as a concern. We do not load those notes during compilation and we do not
> generate code when we load the notes. The lack of C++ support is definitely
> a problem. The question is, is it better to improve APINotes to fit our
> needs or to do something similar from scratch?
>
APINotes is a generic framework for attributing any functions. And the
metadata read from the .apinotes files is stored in the AST and can be
queried later as `Decl->getAttr<...>()`. This mechanism could be very
useful for the whole community, i.e. for optimization, and for other tools.
However, with analyzer specific function summaries we don't want to have
the burden of creating such a generic framework that is suitable for the
whole community. The summary based experiments are way too immature now,
and we don't want to turn our focus on creating such a generic framework.
Integrating APINotes to mainstream Clang would require coordination and
consensus within the **whole** Clang community. That would require huge
efforts from all of us and we would quickly loose the focus from the
original goal which is to experiment with summaries **within** the
analyzer. Thus, we'd like to keep the summaries describing format inside
the premises of the analyzer.
Instead of APINotes, we propose to factor out the YAML parsing part from
GenericTaintChecker into a modeling checker that would populate a GDM with
the summaries metadata. That data could be used then in any checker, e.g.
by the Taint or the StdLibraryFunctions checker.


>
>
>> In short term, summaries for stable libs like libc, libc++ and posix can
>> be hard-coded in the source of StdLibraryFunctionsChecker. In long term, I
>> think we should have a specified file format (.e.g YAML) in which we can
>> describe these summaries. Summaries for stable libs (libc, libc++, posix)
>> should be shipped with Clang. For extensiblity, (for library authors)
>> probably we will need an option that specifies a directory from which we
>> could read additional summary files. Similarly to what we already have with
>> ModelInjector's `-model-path` config option.
>>
>
> Note that the current `-model-path` option is very limited. But it is a
> good questions what is the best semantic for such an option. My initial
> thought was to have something that behaves very similar to '-I' flags with
> headers.
>
>
>> I agree that the range based solver is not powerful enough. But we have
>> the more powerful Z3 there, unfortunately that is slow. If we have a
>> summary that has preconditions then we must check in the caller whether
>> that pre is satisfied before applying the summary, and this would require
>> the use of Z3. There are techniques to cache and/or decrease the problem
>> space for the solver, but probably this might not scale, I agree. Another
>> option could be to apply the summary without checking the pre and once we
>> found a bug only then would we check the path feasibility with Z3
>> (similarly to what we do today with inlining).
>>
>
> If we want to pursue Z3 for summaries we first will need to fix some of
> the bugs we currently have. Unfortunately, those are not trivial. So I
> think if this is our goal the first step is to make Z3 support solid.
>
>
>> I'd like to avoid checker specific pre/post conditions. I hope that the
>> summaries could consist of local path constraints of the function and once
>> the summary is applied then the checker could populate its own GDM based on
>> that.
>>
>
> I do see some problems with this. For example, we do have modeling checks
> that can guide the analyzer by adding/removing constraints. Leaving those
> checks out from summary generation can have its costs in terms of
> precision. Also, applying constraints, the checkers will miss crucial
> events for populating gdm. Such as checkers that subscribed for certain
> events like checkPostStatement, checkRegionChanges and so on. If I
> understand your proposal correctly, from the checker's point of view we
> would evaluate the call conservatively, and we would only use the pre/post
> states to help pruning more infeasible paths. In case we can avoid checkers
> being overly confident (see the inline defensive checks problem) this might
> work but we will see.
>
> Cheers,
> Gabor
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200206/8e4f7348/attachment.html>