[cfe-dev] [RFC][analyzer][StdLibraryFunctionsChecker] Parsing signatures?

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Tue Sep 22 13:46:21 PDT 2020


This sounds like a facility that can get fairly complicated and will 
never be completely reliable or do exactly what we want. I guess it 
could be taught to handle simple cases but the old approach will never 
really be going away.

Say, if we are to write such prototypes for C++ collection methods we'll 
probably want to completely drop template arguments because we can't 
list all possible arguments. This basically means that using the actual 
compiler to parse such prototypes will inevitably fail in this case. 
Another recurring problem with C++ is inline namespaces, say the inline 
namespace __1 that shows up in the libc++ method prototypes and should 
be actively ignored by any such system.
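
To make the name-matching half of this concrete, here is a rough sketch of 
the kind of normalization I mean. Nothing like this exists in the analyzer; 
normalizeQualifiedName is a hypothetical helper, and the default inline 
namespace name "__1" is just the libc++ case mentioned above. It drops 
template argument lists and the inline namespace component from a 
qualified name:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not an existing analyzer API): strips template
// argument lists and a given inline namespace from a qualified name.
std::string normalizeQualifiedName(const std::string &Name,
                                   const std::string &InlineNS = "__1") {
  // Pass 1: drop everything between matching '<' and '>'.
  // Assumption: the name contains no operator< / operator> spellings.
  std::string Out;
  int Depth = 0;
  for (char C : Name) {
    if (C == '<') {
      ++Depth;
    } else if (C == '>') {
      if (Depth > 0)
        --Depth;
    } else if (Depth == 0) {
      Out += C;
    }
  }

  // Pass 2: erase every "InlineNS::" that forms a whole name component,
  // i.e. one that starts the string or directly follows "::".
  const std::string Needle = InlineNS + "::";
  std::size_t Pos = Out.find(Needle);
  while (Pos != std::string::npos) {
    const bool WholeComponent =
        Pos == 0 || (Pos >= 2 && Out.compare(Pos - 2, 2, "::") == 0);
    if (WholeComponent)
      Out.erase(Pos, Needle.size());
    else
      Pos += Needle.size();
    Pos = Out.find(Needle, Pos);
  }
  return Out;
}
```

With this, "std::__1::vector<int, std::__1::allocator<int>>::push_back" 
becomes "std::vector::push_back". Note that this is plain string munging, 
which is exactly why the real compiler's parser is of no help here.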

In C, some standard functions are implemented as macros expanding to 
builtins, and such builtins can potentially have more arguments than the 
function they implement (the extra arguments are filled in automatically 
by the macro), so the call we see may not match the written prototype.

I think it's better to target only plain C functions with this and write 
a completely dumb custom parser for the prototypes. Probably also drop 
support for hard-to-parse types like function pointers. Anything beyond 
that sounds questionable to me.
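
Roughly what I have in mind by "dumb", as a sketch only: ParsedPrototype 
and parsePrototype are made-up names, not existing checker code. It 
assumes every parameter is named (as in the recv example below), splits 
on commas, and rejects anything with a second '(' so function pointers 
never get in. Turning the resulting type strings into QualTypes is left 
out entirely.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: plain strings, no AST involved.
struct ParsedPrototype {
  std::string RetType;
  std::string Name;
  std::vector<std::string> ParamTypes;
};

static bool isIdentChar(char C) {
  return std::isalnum(static_cast<unsigned char>(C)) || C == '_';
}

static std::string trim(const std::string &S) {
  std::size_t B = 0, E = S.size();
  while (B < E && std::isspace(static_cast<unsigned char>(S[B])))
    ++B;
  while (E > B && std::isspace(static_cast<unsigned char>(S[E - 1])))
    --E;
  return S.substr(B, E - B);
}

// Splits "const char *mode" into Type "const char *" and Name "mode" by
// peeling the trailing identifier off the declaration.
static void splitTrailingIdent(const std::string &Decl, std::string &Type,
                               std::string &Name) {
  const std::string D = trim(Decl);
  std::size_t Begin = D.size();
  while (Begin > 0 && isIdentChar(D[Begin - 1]))
    --Begin;
  Type = trim(D.substr(0, Begin));
  Name = D.substr(Begin);
}

// Returns false for anything that is not a simple C prototype.
bool parsePrototype(const std::string &Proto, ParsedPrototype &Out) {
  const std::size_t Open = Proto.find('(');
  const std::size_t Close = Proto.rfind(')');
  if (Open == std::string::npos || Close == std::string::npos || Close < Open)
    return false;
  // A second '(' means a function pointer or similar: give up on purpose.
  if (Proto.find('(', Open + 1) != std::string::npos)
    return false;
  assert(Open < Close && "parentheses were checked above");

  ParsedPrototype Result;
  splitTrailingIdent(Proto.substr(0, Open), Result.RetType, Result.Name);
  if (Result.RetType.empty() || Result.Name.empty())
    return false;

  const std::string Params = trim(Proto.substr(Open + 1, Close - Open - 1));
  if (!Params.empty() && Params != "void") {
    std::size_t Start = 0;
    for (;;) {
      const std::size_t Comma = Params.find(',', Start);
      const std::string Piece =
          Comma == std::string::npos ? Params.substr(Start)
                                     : Params.substr(Start, Comma - Start);
      std::string Type, Name;
      splitTrailingIdent(Piece, Type, Name);
      if (Type.empty())
        Type = Name; // lone token such as "int": treat it as the type
      if (Type.empty())
        return false; // empty parameter, e.g. "f(int a,)"
      Result.ParamTypes.push_back(Type);
      if (Comma == std::string::npos)
        break;
      Start = Comma + 1;
    }
  }
  Out = Result;
  return true;
}
```

For "ssize_t recv(int sockfd, void *buf, size_t len, int flags);" this 
yields the return type "ssize_t", the name "recv" and the parameter types 
"int", "void *", "size_t", "int". A qsort-style prototype with a 
comparator parameter is refused, which is the intended behavior. An 
unnamed multi-token parameter such as "unsigned int" would be split 
incorrectly, which is one more reason to keep the supported subset small.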

On 9/22/20 7:30 AM, Gábor Márton via cfe-dev wrote:
> > Why? This could simplify the type matching code in the Checker
> > extremely. Besides, whenever we reach up to a point where we can read
> > up summaries from e.g. YAML files (maybe when we merge with the
> > TaintChecker) then the user could specify the signatures as they would
> > write that in C/C++, which seems to be an ultimate convenience.
>
> Another use case could be to boost up the CallDescriptionMap by using 
> the same infrastructure. Currently we match by function names and by 
> argument numbers and this has caused bugs already.
> Imagine this:
> CallDescriptionMap<FnDescription> FnDescriptions = {
>       // parse and match by the full signature
>       {{"FILE *fopen(const char *pathname, const char *mode)"},
>        {nullptr, &StreamChecker::evalFopen, ArgNone}},
>
> Cheers,
> Gabor
>
>
>
> On Tue, Sep 22, 2020 at 3:30 PM Gábor Márton <martongabesz at gmail.com>
> wrote:
>
>     Hi,
>
>     Here is an example of adding a function summary in
>     the StdLibraryFunctionsChecker:
>         // ssize_t recv(int sockfd, void *buf, size_t len, int flags);
>         addToFunctionSummaryMap(
>             "recv",
>             Signature(ArgTypes{IntTy, VoidPtrTy, SizeTy, IntTy},
>                       RetType{Ssize_tTy}),
>             Summary(NoEvalCall)
>                 .ArgConstraint(ArgumentCondition(0, WithinRange,
>                                                  Range(0, IntMax)))
>                 .ArgConstraint(BufferSize(/*Buffer=*/ArgNo(1),
>                                           /*BufSize=*/ArgNo(2))));
>
>     Instead, I'd like to have the following in the future:
>         addToFunctionSummaryMap(
>             "recv",
>             Signature("ssize_t recv(int sockfd, void *buf, size_t len, "
>                       "int flags);"),
>             Summary(NoEvalCall)
>                 .ArgConstraint(ArgumentCondition(0, WithinRange,
>                                                  Range(0, IntMax)))
>                 .ArgConstraint(BufferSize(/*Buffer=*/ArgNo(1),
>                                           /*BufSize=*/ArgNo(2))));
>
>     Why? This could simplify the type matching code in the Checker
>     extremely. Besides, whenever we reach up to a point where we can
>     read up summaries from e.g. YAML files (maybe when we merge with
>     the TaintChecker) then the user could specify the signatures as
>     they would write that in C/C++, which seems to be an ultimate
>     convenience.
>
>     To achieve this I have to parse the string given to the Signature
>     in the ASTContext of the TU that is being analyzed. I
>     am considering two options to develop this:
>     1) Seems like BodyFarm/ModelInjector does something similar (it
>     reads function bodies from model files). However, I am not sure if
>     that solution is flexible enough. Gabor, what do you think, would
>     it make sense to extend into this direction, could we handle C++
>     declarations as well? What other weak points or difficulties do
>     you see?
>     2) Maybe we could use the parser with a custom ExternalASTSource
>     implementation that could do the job. Actually, this is how LLDB
>     does it, the implementation of the ExternalASTSource interface
>     uses the ASTImporter under the hood. I am not sure if ASTImporter
>     could be used for this, but maybe some parts of it, we could reuse.
>
>     Thanks,
>     Gabor
