[PATCH] D72705: [analyzer] Added new checker 'alpha.unix.ErrorReturn'.

Thu May 14 01:33:56 PDT 2020

martong added a comment.

I think this is a useful concept and it can nicely complement the ongoing summary based work in `StdLibraryFunctionsChecker`. Still, in my opinion, there are more discussions and work to be done here before we get to the final implementation.

Our analyzer does not reason about all paths, we try to find only ONE path where an error occurs. This leads to a non-trivial question: Is it an error if we have some paths that do not check for the return value and at least one path where we do?
(1) One answer could be that there is bug if we find out that we use a rotten file descriptor or a nullptr. But, if we don't "use" them then that is not bug. How do we define "use"? We could say that any function that takes the return value is using that, but that may lead to false positives if we cannot see into the function body (that function may check the value). I think we should define "use" in the terms of preconditions of other functions. In the previous example (`use(*P)`) we should be able to specify that the parameter of `use` must not be null and if that condition is violated only then should we diagnose the bug. In the CERT description we see that the different functions are related. For instance, we should use `fgetc` only with a valid `FILE*` that must not be NULL. This approach is being implemented in `StdLibraryFunctionsChecker`.

(2) On the other hand, there are cases when there is no such visible data dependency between the functions. E.g. the example from ERR33-C:

  size_t read_at(FILE *file, long offset,
                 void *buf, size_t nbytes) {
    fseek(file, offset, SEEK_SET);
    return fread(buf, 1, nbytes, file);
  }

Here, there are no argument constraints to fread that may be violated by calling fseek! But, let's just remove the call of fread:

  size_t read_at(FILE *file, long offset,
                 void *buf, size_t nbytes) {
    return fseek(file, offset, SEEK_SET);
  }

The bug has gone! I suggest that we should create/describe function groups. In a group we would define those functions that are related. E.g. fseek, fread, fopen, etc. If they operate on the same stream and we can find a path where there is an fseek and then an fread without the return value checked then that is a real bug. What do you think? Maybe we should have a generic infrastructure that could be reused in StreamChecker?

> A problem with this approach is that it works only if the value (the return value of the function) is not constrained already after the function call (at least not for the error value), otherwise it will interpret (wrongly) this constrainment as result of an error check branch in the code. So it is not possible to make a branch with fixed error return value (if other checker do this this is probably bad too). And it is not possible (?) to make a new branch without constraint.
>  ...
>  For example a postCall put constraints on it if it is known that the returned value is always non-negative. Or more worse, an error path is generated when the return value is exactly the error return code. In the mentioned case if the f function is modeled and a path is added where it has return value zero the checker will "think" that there was an if (X == 0) condition in the code.

In my understanding, if the return value is already constrained (to not have the error value) on a path that means we are "ready" on that path, i.e. we don't have to check again for the return value on that path. However, there may be other paths where we have to seek for the check.

> Generally, is it a problem if there are multiple checkers that may detect same defects but with different approach? Or should the code of this checker (for example) made more complicated only to filter out cases that may be detected by other checkers too?

If we can avoid it then do it, but, in my opinion, that is not a problem. There will be one checker which finds the error first then emits an error node and then the analysis stops, the next checker will not find the same bug, because execution had already stopped. We need to define the order of the colliding checkers though. Actually, there is an example for such a case: D79420 <https://reviews.llvm.org/D79420> (Note, I am not sure about non-fatal errors, does the analysis continue in that case?)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72705/new/

https://reviews.llvm.org/D72705