[PATCH] D78374: [Analyzer][StreamChecker] Added evaluation of fread and fwrite.

Kristóf Umann via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed May 13 05:21:45 PDT 2020


Szelethus requested changes to this revision.
Szelethus added a comment.
This revision now requires changes to proceed.

I'm sorry for the late review -- please know that this isn't the first time me taking a look, this is a complex issue. I find navigating your phabricator comments a bit difficult, it would help a lot if you would accompany them with a lot of code examples, and a lot more tests in the actual patch.

Error handling is super annoying with streams. Take a look at this:

  #include "stdio.h"
  
  int main() {
    FILE *F = fopen("test.cpp", "r");
    putc('c', F);
    printf("error: %i\n", ferror(F));
    char Buf[1024];
    const int READ_COUNT = 5;
    if (READ_COUNT != fread(Buf, sizeof(char), READ_COUNT, F))
      printf("eof: %i\n", ferror(F));
    else
      printf("%s\n", Buf);
    fclose(F);
  }

  $ clang++ test.cpp && ./a.out 
  error: 1
  #incl

Should we allow this? It seems like the error flag was set, but the actual problem is unrelated to the read, so the read still works fine.

I feel like we're constantly changing what we're shooting for, and that implies immediate code changes. Let's take a step back, draw a state machine (no error, feof, ferror, indeterminate state), and just define what is an error, what is a code smell, what is perfectly fine, because we don't seem to be on the clear on this. We need to follow and agree on the C standard in a very literal sense, and we're not there just yet. The amount and placement of the state splits seem to cause a lot of confusion and we're not ready for that discussion just yet.

D70470 <https://reviews.llvm.org/D70470> is a great example for a patch with a state machine.

In D78374#2023195 <https://reviews.llvm.org/D78374#2023195>, @balazske wrote:

> Pre-statement checks for `fread` and `fwrite` filter out the cases when the position is "indeterminate", because in those states it is fatal error to call the function. Otherwise if not only the **EOF** flag is set initially (at start of `fread`) in `ErrorState` the `fread` should not generate state when other errors are possible (or none) (this contains an execution where initially EOF is set, after the function no **EOF** is set or **FERROR** is set). So the **EOF** initial state must be handled separately, probably split from the rest of the error state.


I read through this many times but I just don't understand what you mean, could you rephrase this please?

In D78374#2031694 <https://reviews.llvm.org/D78374#2031694>, @balazske wrote:

> Added state split before fread to make warning for error state possible.


Is there a significant gain behind doing that instead of doing a state split immediately? It feels unnatural to delay the split. It doesn't accurately represent how the code would run. Are we supposed to do this at every precall? What about other checkers that may rely on this one?



================
Comment at: clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp:37
+/// Note: A stream has either FEOF or FERROR set but not both at the same time.
+struct StreamErrorState {
+  /// There is an execution path with none of the error flags set.
----------------
balazske wrote:
> Szelethus wrote:
> > We're not describing the error state of a stream here, but rather //possible// error states, so the name should reflect that.
> I do not think that `StreamPossibleErrorState` is better, it is anyway a //state// that can contain any information. It is an error related state even if it defines not one exact error.
Alright, you convinced me.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp:107
+  /// This value applies to all execution paths in ErrorState except the FEOF
+  /// case. In more detail, an EOF+indeterminate state is the same as EOF state.
+  bool FilePositionIndeterminate = false;
----------------
The standard suggests that this happens on FERROR, not FEOF.

> If an error occurs, the resulting value of the file position indicator for the stream is indeterminate. If a partial element is read, its value is indeterminate.

It would be wild if the stream wouldn't set the FEOF flag after reaching the end of the file. Also, I would imagine FEOF being usually implemented with the file position indicator.

```lang=bash
$ cat test.cpp`
```
```lang=cpp
#include "stdio.h"

int main() {
  FILE *F = fopen("test.cpp", "r");
  char Buf[1024];
  const int READ_COUNT = 99999;
  if (READ_COUNT != fread(Buf, sizeof(char), READ_COUNT, F)) {
    printf("%i\n", feof(F));
  }
}
```
```lang=bash
$ build/bin/clang++ test.cpp && ./a.out 
1

```

Operations on an EOF and indeterminate stream are very different.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78374/new/

https://reviews.llvm.org/D78374





More information about the cfe-commits mailing list