[PATCH] D75045: [analyzer] Improved check of `fgetc` in StreamChecker.

Kristóf Umann via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Feb 25 04:42:28 PST 2020


Szelethus added a comment.

In D75045#1891064 <https://reviews.llvm.org/D75045#1891064>, @balazske wrote:

> In D75045#1891056 <https://reviews.llvm.org/D75045#1891056>, @NoQ wrote:
>
> > In D75045#1891055 <https://reviews.llvm.org/D75045#1891055>, @NoQ wrote:
> >
> > > How many such platforms can you name?
> >
> >
> > I mean, does Clang even support such targets?
>
>
> The problem can occur with the "wide" variants of the getc functions when `wchar_t` is same width as `wint_t`, this can be more often (but I am not sure). Even then the character set should contain a character similar to `WEOF`, that is not true for UTF-16 and 32. So there may be low probability for the problem to occur. Is this reason to not have this check at all?


If we can detect that the code is non-portable, it doesn't really matter in my opinion if we don't support the target on which the problem would occur. Also, I can't come up with a specific platform that would have an issue like this (maybe in a dishwasher? (: ), but CERT is known to create rules based on real vulnerabilities. A quick glance lead me to this page, though these vulnerabilities may not be directly related to this rule:

https://wiki.sei.cmu.edu/confluence/display/c/AA.+Bibliography#AA.Bibliography-ISO/IECTS17961-2013

In D75045#1891032 <https://reviews.llvm.org/D75045#1891032>, @balazske wrote:

> An improvement can be made if the check runs only when size of `char` and `int` are equal.


Since the rule specifically states that the problem here is portability related, I think we shouldn't do that.

> The code can be improved somewhat but there are now too many special cases and there is already a default eval function. Probably it is better to have every "check" function take and return `ExplodedNode` instead of state.

This really ties into my main concern, that this **isn't** a stream management function specific problem, but rather an unchecked stream problem **after** we know that it returns `EOF`. I think we should first emit warnings on operations on `EOF`-returning streams, and implement this rule after that. WDYT?



================
Comment at: clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp:255-261
+  if (EOFv == 0) {
+    if (const llvm::Optional<int> OptInt =
+            tryExpandAsInteger("EOF", C.getPreprocessor()))
+      EOFv = *OptInt;
+    else
+      EOFv = -1;
+  }
----------------
balazske wrote:
> Szelethus wrote:
> > I'm 99% sure that the `Preprocessor` is retrievable in the checker registry function (`register*Checker`), and this code should be moved there, and `EOFv ` be made non-mutable. 
> I am not sure if this would be good. The `Preprocessor` is not available easily and the "register" function is probably there to register a checker and do nothing else. Probably the source code is not even known when it is called. And how to pass the EOF value to the created checker in the register function (if not a static variable is used)?
`CheckerManager::registerChecker` in fact returns the non-const checker object. The registry function is responsible for the registration //and// initialization of the checker object.

Though you have a point in the sense that this isn't terribly well defined (I should really finish D58065...).

I had a look, and my usual motto of //"Any manager can be retrieved in clang at arbitrary locations if you try hard enough"// seems to have been wrong. I don't see how I could get my hands on a `Preprocessor` in the checker registry function. So, feel free to disregard this comment.


================
Comment at: clang/test/Analysis/stream.c:188
+    C = fgetc(fp);
+    fread(0, 0, 0, fp); // expected-warning {{Should call}}
+    fread(0, 0, 0, fp);
----------------
balazske wrote:
> Szelethus wrote:
> > Are we sure that this is the error message we want to emit here? Sure, not checking whether the stream is in an error state is a recipe for disaster, but this seems to clash with
> > > Should call 'feof' and 'ferror' after failed character read.
> > as error here is not even checking the return value. Shouldn't we say
> > > Should check the return value of `fgetc` before ...
> > instead?
> No, it is not enough to check the return value of `fgetc`, exactly this is the reason for the checker. "Should call 'feof' and 'ferror' after failed character read." is more exact: The read was suspected to fail because EOF was returned but should call feof (...) to verify it. A better message can be:
> 
> If 'fgetc' returns EOF call 'feof' and 'ferror' to check for error.
The compliant solution in the rule description has the following example:

```lang=c++

#include <stdio.h>
 
void func(void) {
  int c;
 
  do {
    c = getchar();
  } while (c != EOF);
  if (feof(stdin)) {
    /* Handle end of file */
  } else if (ferror(stdin)) {
    /* Handle file error */
  } else {
    /* Received a character that resembles EOF; handle error */
  }
}
```
because if the character resembles `EOF` that is an error too, and also

> Calling both functions on each iteration of a loop adds significant overhead, so a good strategy is to temporarily trust EOF and WEOF within the loop but verify them with feof() and ferror() following the loop.

Which makes me thing that the error here is not checking against `EOF`, first and foremost.

The warning message

>If 'fgetc' returns EOF call 'feof' and 'ferror' to check for error.

sounds great, I would just prefer to see it //after// we **know** that the stream returned `EOF`.


================
Comment at: clang/test/Analysis/stream.c:246-247
+    if (C == EOF) {
+      feof(fp);
+      ferror(fp);
+    }
----------------
balazske wrote:
> Szelethus wrote:
> > I bet this isn't how we envision how these function to be used. Maybe `ReturnValueChecker` could be deployed here? In any case, that would be a topic of a different revision.
> If `fgetc` returns non-EOF we can be sure that there is no error, so no need to call `ferror` and `feof`. If it returns EOF we must "double-check" that it was a real error or an EOF-looking character that was read.
Yea, but shouldn't we check the return values of `ferror` and `feof`? Otherwise whats the point?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75045/new/

https://reviews.llvm.org/D75045





More information about the cfe-commits mailing list