[cfe-dev] [analyzer][RFC] The future of StdLibraryFunctionsChecker and other checkers responsibility
Kristóf Umann via cfe-dev
cfe-dev at lists.llvm.org
Fri Jun 26 06:53:39 PDT 2020
On Fri, 26 Jun 2020 at 09:53, Balázs Kéri <1.int32 at gmail.com> wrote:
> What I think now about this:
> It may be better but more work to make a kind of generic interface that
> supports modeling of these summaries. This interface should make
> specification and apply of summaries easy enough to use at any
> checker, extensible with checker-specific behavior. Then we can have
> specific checkers for specific groups of functions. These can model every
> aspect of the functions so that responsibility for one function is not
> spread between multiple checkers.
That sounds like a brilliant idea! It would be great if at the very least
the signature would be visible from StdLibraryFunctions. Let's take a
function that would greatly benefit from this: getline.
ssize_t getline(char **lineptr, size_t *n, FILE *stream);
"getline() reads an entire line from stream, storing the address of the
buffer containing the text into *lineptr. The buffer is null terminated
and includes the newline character, if one was found. If *lineptr is set to
NULL and *n is set 0 before the call, then getline() will allocate a buffer
for storing the line. This buffer should be freed by the user program even
if getline() failed."
This is a function that should be handled by MallocChecker (memory
allocation), StreamChecker (validity of the stream), TaintChecker (to mark
the buffer as tainted) and possibly CStringChecker (is the stream pointing
to a well known buffer?). Checking whether the stream is null or not could
be done anywhere, but it would be convenient to keep the signature/summary
combo together. Having an interface to retrieve each of the possible
post-condition states would make this even better, since if any of them
would be impossible (say, the path where feof() returns true for a
known-to-be-valid stream) the individual stream could sink that path.
Artem Dergachev via cfe-dev <cfe-dev at lists.llvm.org> ezt írta (időpont:
> 2020. jún. 25., Cs, 18:45):
>> No matter how far do we want to push the flexibility of our summaries,
>> this checker will always have limitations. I suggest we treat it as yet
>> another method of interprocedural analysis, i.e. simply an improvement over
>> conservative evaluation of the call.
>> I don't see a need for a specific one-size-fits-all policy here. We can
>> decide every time on per-function basis. Doing evalCall in two different
>> checkers is obviously bad (and we have an assertion that protects against
>> that, so it won't be "anarchy" but merely a crash, at least in non-release
>> builds but i'd expect such crash to be fairly apparent) but any other
>> separation of concerns sounds reasonable to me.
If we actually produce a crash in that case, then I guess we should use
evalCall a lot more often. Sure we miss a lot of aspects of the function
(pipe creation etc), but it's not like inlining instead would help on that
anyways, and until then, lets enjoy some less conservative invalidation and
> I suspect that if you end up setting up dependencies, it's almost always
>> better to move the functionality to one checker. But there's no need to
>> merge different aspects of modeling that are completely orthogonal. Note
>> that state splits are idempotent which means that it's often ok to
>> duplicate the post-call state-splitting code in multiple checkers without
>> making them aware of each other.
I might be overly cautious, but that I fear that still leaves us vulnerable
to hard-to-detect programming errors. I have a general gut feeling about
the analyzer that we have a lot of very similar looking functions that do
almost the same thing with subtle differences (
and I'm not immediately sure that the same exact state split would happen
in those occasions. But then again, this is just a gut feeling, nothing
> On 6/25/20 7:19 AM, Kristóf Umann via cfe-dev wrote:
>> +Balázs Kéri
>> On Thu, 25 Jun 2020 at 10:39, Gábor Márton <martongabesz at gmail.com>
>>> One of my goals with StdLibraryFunctionsChecker is to add argument
>>> constraints to functions in the C, C++ and POSIX standard. It is very easy
>>> to gather these argument constraints and apply them massively. By doing
>>> this, the analysis can be more precise, so, I see a lot of gain here.
>>> However, some of these argument constraints could be handled in more
>>> specific checkers too, e.g. in CStringModeling, MallocChecker or in
>> For those that didn't follow development closely, argument constraints in
>> this context mean the following: For select functions, we define a
>> signature we can use to match a call to this function (for C functions,
>> that is return value, function name and argument types), and a so called
>> "summary", which is a list of constraints on the arguments or the return
>> value. For a function like fread(), the signature is size_t fread(void
>> *, size_t, size_t, FILE *), and the summary is that the stream argument
>> and the buffer must be non-null, and that the buffer must be at least the
>> size of (2nd arg * 3rd arg). We can also constrain the return value, that
>> is is at least (2nd arg * 3rd arg).
>> If on the function call any of the preconditions isn't met, we can emit a
>> warning, otherwise we can increase the precision of the analysis by
>> constraining the arguments.
>> This is a form of summary based analysis, and its pretty easy to pull off
>> on standard functions because they have, well, standardized signatures and
>> (this isn't pulled from the checker, just a theoretical example)
>>> Let's take for instance `popen`. In the mentioned patch I am adding a
>>> simple summary with argument constraints. However, sooner or later popen
>>> could be modeled in StreamChecker once that is mature enough. (1) Should we
>>> remove the summary from StdLibraryFunctionsChecker when that happens? (2)
>>> Or should we have a declaration that ranges and nullability are checked
>>> with summaries while more specific things are checked in their respective
>>> checkers? (3) Or the specialized checkers could be dependencies to the
>>> generic StdLibraryFunctionsChecker, so any bug related to e.g. popen is
>>> prioritized to StreamChecker?
>> Yes, this this the big one. What role should batch modeling checkers play
>> in the analyzer alongside specific modeling checkers? Lets take an example
>> that has already presents a problem, and the most glaring of them, to me at
>> least, is stream modeling functions.
>> Take for instance the above mentioned fread. The *numerical* pre- and
>> postconditions can be enforced just as I detailed. However, the most
>> important effect of this function is of course the stream operation, what
>> happens depending on the validity of the stream object (which is not a
>> numerical property), and what happens to it after the call has finished,
>> successful or not. If the stream hits EOF during the read, the return value
>> is less then (2nd arg * 3rd arg), and is equal to it if the read was
>> successful. What this implies to me is that StdLibraryFunctionChecker
>> cannot really take on the responsibility of establishing a post condition,
>> **unless** the function is question can be described by numerical
>> properties only (like isalpha()).
>> This is something that this checker already does well I think. Its clear
>> that we don't want to expand it to cover non-numeric cases. So, onto the
>> topic that brought us here.
>> (2.) This sounds great. I think precondition checking is something that
>> introduces a lot of code duplication in checkers, and I don't think that
>> fread()ing a nullpointer is a stream mismanagement issue necessarily. A
>> notable challenge to overcome, however, is how can we ask StreamChecker to
>> add NoteTags from other checkers' reports (see:
>> (1.) I think the only thing we could do with this idea is to implement an
>> incomplete post condition modeling until a checker comes along and does it
>> properly. However, even this would only make sense if we have a single
>> post-condition (https://reviews.llvm.org/D79432#inline-757474).
>> (3.) If we go with option (2), we must introduce strong dependencies so
>> that numerical preconditions are checked beforehand. However, if we
>> *duplicate* the same checking in multiple checkers and want to use
>> dependencies to make the correct code runs first, we would introduce
>> deadcode, or possibly running the same check twice, and the problem of
>> inconsistently distributing the responsibility of analysis gets even worse.
>>> My other goal would be to make it possible to add argument constraints
>>> (or branches/cases) to any library functions. These libraries may never be
>>> modeled in the analyzer. In this case the above problem is non-existent.
>>> Perhaps, we should divide the checker to two different checkers once we
>>> reach that point, shouldn't we? We are very close to
>>> rename StdLibraryFunctionsChecker to LibraryFunctionsChecker anyway by now.
>> If we want to implement summary based analysis, it would only be fitting
>> to bear the name SpeculativeSummaryBasedFunctionModeling and
>> StdSummaryBasedFunctionModeling or something, right? ;) Anyway, this is the
>> least of the worries.
>> I agree, pure functions (like the above mentioned isalpha()) is the real
>> learning opportunities for generalized summary based analysis for the time
>>> There is more. In `evalCall` if a checker returns true then that
>>> function is no longer processed by any subsequent checkers. That is the
>>> case with pure functions in StdLibraryFunctionsChecker. In this case,
>>> however, it would be anarchy if another checker also returns true from its
>>> own evalCall callback. Only one can rule. It would be great if we could
>>> enforce that no two checkers evaluate the same function this way. But I
>>> don't see any clear solution to achieve that right now, this is really
>> Yup, the problem is that checker callbacks from CheckerManager's
>> perspective (I might be wrong here!) takes a set of ingoing nodes and
>> returns a set of outgoing ones. So the solution might not be terribly
>> complicated, just ugly (large #ifs).
>> cfe-dev mailing listcfe-dev at lists.llvm.orghttps://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the cfe-dev