[PATCH] D106262: [clang][analyzer] Use generic note tag in alpha.unix.Stream .

Thu Jul 29 03:47:41 PDT 2021

Szelethus requested changes to this revision.
Szelethus added a comment.
This revision now requires changes to proceed.

I like the generalization still, but I don't agree with how you retrieve the `NoteTag` message. Its the wrong way around. This is how you invoke your function:

  void StreamChecker::evalFclose(/* ... */, CheckerContext &C) const {
    ProgramStateRef State = C.getState();
    SymbolRef Sym = /* get stream object */;
    const StreamState *SS = State->get<StreamMap>(Sym);

    // early returns, asserts, etc...

    // Close the File Descriptor.
    // Regardless if the close fails or not, stream becomes "closed"
    // and can not be used any more.
    State = State->set<StreamMap>(Sym, StreamState::getClosed(Desc));

    C.addTransition(State, constructNoteTag(C, Sym, {&BT_UseAfterClose})); // <--- (#)
  }

What are you telling in `(#)`? Its confusing, no bug occurred here, what is a `BugType` doing here? Are you guessing that if a bug will occur, the bug will have that specific `BugType`? You need to go out of your way to find the mapping to `BugMessages`, read a bunch of documentation and comments to realize what the intent was. Even if its algorithmically correct, its a bit upside down.

If you passed the note message:

  C.addTransition(State, constructNoteTag(C, Sym, "Stream closed here"));

  // or from some NoteMessageKind or whatever enum:

  C.addTransition(State, constructNoteTag(C, Sym, NMK_StreamClosedHere));

you could move the logic of displaying this note only for specific kinds of `BugType`s, or a single `BugType` into `constructNoteTag`. That would be a big improvement already. The call site will be very telling of whats happening: you are constructing a `NoteTag` with the string that is passed as an argument. I realize this feels like over the top nitpicking, but easily readable code is very valuable, not to mention this is the experience I had when I read this patch for the first time.

Of course, the person who will have to debug this checker later on as to why the `NoteTag` isn't displaying for a different `BugType` to that will again have to go through that function and the comments to understand that these messages are only displayed for specific `BugTypes`, this is why my four-argument suggestion would be, in my world, the ideal solution. I don't think that adds any complexity, but would make the code a lot more readable.

================
Comment at: clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp:236-242
+  const BugMessageMap BugMessages = {
+      {&BT_FileNull, "Assuming opening the stream fails here"},
+      {&BT_UseAfterClose, "Stream closed here"},
+      {&BT_UseAfterOpenFailed, "Assuming opening the stream fails here"},
+      {&BT_IndeterminatePosition, "Assuming this stream operation fails"},
+      {&BT_StreamEof, "Assuming stream reaches end-of-file here"},
+      {&BT_ResourceLeak, "Stream opened here"}};
----------------
balazske wrote:
> balazske wrote:
> > Szelethus wrote:
> > > Well this looks odd: for both `BT_StreamEof` AND `BT_IndeterminatePosition` we want to display a `NoteTag` for a failed `fseek`, for one you print "Assuming stream reaches end-of-file here", for the other, "Assuming this stream operation fails". This seems to contradict your comment in the fseek evaluating function:
> > > 
> > > ```
> > > If fseek failed, assume that the file position becomes indeterminate in any case.
> > > ```
> > > 
> > > Also, these `BugTypes` should be responsible for the *error message*, not the `NoteTag` message. I'd prefer if we mapped an enum to these strings (`NoteTagMsgKind`?), pass that as well to `constructNoteTag`, and allow the caller to decide that for which `BugTypes` the `NoteTag` is worth displaying for.
> > > 
> > > I think such a 4-argument `constructNoteTag` would capture best what we want here.
> > It is still unclear how to model the `fseek` (and other functions). (But this is not a problem for this patch.) We can do it according the POSIX or C standard, or just by the experience that a `fseek` may fail with EOF or `ferror` or none of them. The standards are not exact but do not mention that `fseek` should cause the indeterminate flag to be set at all, or that `fseek` can cause `feof` state.
> The message for a NoteTag depends on the bug type and at this state the bug type is sufficient to determine the note message. Because it is possible to add multiple bug types to a NoteTag, passing a custom message to it can be done only by passing a BugType->Message map to the note tag. This may be unnecessary complexity for the current use case.
>It is still unclear how to model the fseek (and other functions). (But this is not a problem for this patch.) 
I agree, we can discuss that another time.

> The message for a NoteTag depends on the bug type and at this state the bug type is sufficient to determine the note message.

Aha, so the claim is that a `BugType` can unambiguously determine the `NoteTag` message. I get it: among all the `NoteTag`s this checker is capable of emitting, you only want to display the one that directly caused a specific kind of a bug. If a stream operation is done on a stream object that is already at EOF, the only noteworthy `NoteTag` could be the one where the stream reached the end of file. If the stream leaked, we only need to mention where the stream was opened.

You (even if implicitly) claim that, for example, where the stream is opened wouldn't ever be interesting to any any programming error, but stream leaking. 

What I meant in my previous comments is that this is not necessarily true, or will not stay true for long. But, lets stick with your idea now, and expand later if needed.

>Because it is possible to add multiple bug types to a NoteTag, passing a custom message to it can be done only by passing a BugType->Message map to the note tag. This may be unnecessary complexity for the current use case.

`BugType` has no `Message` field. Even if it did, it should *only* describe the error node. Its a single point in the analysis, it should not, and does not know anything about what happened in past. That's the job of bug report generation facilities to figure out.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106262/new/

https://reviews.llvm.org/D106262