[cfe-dev] [Analyzer][RFC] Function Summaries

Wed Feb 5 09:11:51 PST 2020

Hi Artem,

Thanks for your feedback.

When doing a file format for summaries, please try to somehow coordinate
> with the current work on taint checker yaml files. I don't want multiple
> incompatible formats to be circulating around :)
>
Sure, I am in sync with Balazs :)

> One of the fresher thoughts that i've had recently on automatically
> generated summaries was re-using The Attributor
> (https://llvm.org/devmtg/2019-10/talk-abstracts.html#tech24) to produce
> function summaries that the static analyzer can import and rely upon.
>
Yes, absolutely. I consider this approach very valuable. I think we should
continue the work towards this direction after we are done with the
extension of StdLibraryFunctionsChecker and before starting to experiment
with Godefroid's methods.

> If you go for automatically generated summaries, please keep in mind the
> following potential problems:
>
> (a) We do not want checker API to become significantly more complicated
> than it currently is. It is very likely that generating checker-specific
> summaries of a function would require very non-trivial checker-side
> support that'll be too hard for a typical checker developer to handle.
> Please talk to me once you get there; i'll try to throw in a few
> examples to see if your approach is lightweight and general enough.
>
I'd like to avoid checker specific pre/post conditions. I hope that the
summaries could consist of local path constraints of the function and once
the summary is applied then the checker could populate its own GDM based on
that.

> (b) Please understand the problem of "inlined defensive checks" that we
> have with inlining-based IPA. Basically, when modeling a function call,
> you need to have a good intuition about whether multiple branches within
> the function imply multiple branches in the caller context. Like, it's
> fine to warn that passing a null pointer into foo() causes a null
> dereference, but it's not fine to emit null dereference warnings upon
> dereferencing a pointer simply because that pointer was previously
> passed into function bar() that checked it for null but has long exited
> since then. Most of the time the ideal solution is to have exactly one
> branch in the summary. Functions like `isalpha` are a rare exception.
> Most of the time the state split is not justified because there's no
> indication on the caller side that a certain callee branch is
> necessarily taken, so we should not keep these assumptions in the state
> after the function has exited, so they shouldn't be part of the summary.
>
Ok, this seems to be an interesting problem. I think if we have a summary
for bar() then after we applied that summary then that becomes part of the
path constraint and that constraint should be checked before we apply the
summary for foo(). This implies that we should have a proper precondition
for foo() as well, which is not clear for me how could we deduce that pre
automatically. Also, perhaps we could apply the summary without checking
the pre and once we found a bug only then would we check the path
feasibility with Z3 (similarly to what we do today with inlining).

> In other news - did you ever consider using ASTImporter for generating
> function body ASTs from hand-written code? Like BodyFarms but without
> constructing the AST by hand. I imagine that the advantage here is that
> the solution is more reliable than ASTImporter-based CTU because you can
> easily make sure that a certain hand-written code under your control is
> always imported correctly (as opposed to arbitrary code written by the
> users). This is probably not that hard to implement and that'd speed up
> our progress on supporting functions that generally benefit from body
> farms, such as atomic intrinsics. I.e.:
>
>    const Stmt *CompareAndSwapBody = ASTImporter.import("""
>      bool compare_and_swap(int *p, int old, int new) {
>      if (*p != old)
>          return false;
>      *p = new;
>      return true;
>    }
>    """);
>
> ?
>
Ok, this could be another approach, but very similar to the work that have
been done by Gabor(xazax) in ModelInjector and by this patch
<https://reviews.llvm.org/D13731>which builds on that. However, I see a few
concerns: (1) This requires that summaries are syntactically valid C/C++
code, thus we must use __attributes__ or [[attributes]]. Meaning we must
extend the generic Attr.td very often with analyzer exclusive data, and I
think we'd have a hard time to get those changes accepted there. Also I am
not sure if we could express easily everything we want to with these
attributes, e.g. what about relations between two arguments? (2) In the
past some developers were strictly against this approach
<http://lists.llvm.org/pipermail/cfe-dev/2015-December/046378.html> . (3) A
DeclContext can be set up to have an external source and then later this
source could be queried to extend the list of the contained Decls. This
mechanism is used in LLDB when we parse an expression. There we import
everything into the context of the expression. But in the approach you
mention, the direction of the import would be reversed, and I see some
technical difficulties in that. I am not sure if ASTImporter and
DeclContext is prepared at the moment to realize this unless the summary
code is a self contained code (meaning it can be parsed in it's own).
So, I'd rather strive for the Attributor/IR based approach after we pimped
up StdLibraryFunctionsChecker (and after we came up with a nice YAML format
or something).

Thanks,
Gabor
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200205/c1722ee3/attachment-0001.html>