[PATCH] D67706: [clang][analyzer] Using CallDescription in StreamChecker.

Wed Oct 30 16:04:47 PDT 2019

NoQ accepted this revision.
NoQ added a comment.

Let's land this patch, given that it grew way too big for a refactoring pass, and let @balazske further clean things up later in his follow-up patches. This is already a large improvement and i'm very grateful!

For the `Optional` vs. `bool` debate: let's see. For now we have roughly the following:

  bool checkNullStream(..., ProgramStateRef *State) {
    if (!NonBuggyState && BuggyState) {
      C.emitReport(..., C.generateErrorNode(BuggyState));
      return false;
    }

    if (NonBuggyState) {
      *State = NonBuggyState;
      return true;
    }

    return false;
  }

  void checkFseekWhence(..., ProgramStateRef *State) const {
    if (!isBuggy(State))
      return;

    C.emitReport(..., C.generateNonFatalErrorNode(State));
  }

  void evalFseek(...) {
    ProgramStateRef State = C.getState();
    bool StateChanged = checkNullStream(..., &State);
    if (C.isDifferent())
      return;

    checkFseekWhence(..., &State);
    if (!C.isDifferent() && StateChanged)
      C.addTransition(State);
  }

In this code functions `checkNullStream` and `checkFseekWhence` are very similar. Both of them do some checking and potentially emit a report. In high-level terms, all you're trying to do is conduct two checks at the same program point.

How did these two functions end up being so completely different? What will you do if you have not 2 but, say, 10 different checks that you need to conduct? That's not a random question; checkers do tend to grow this way, so i'm pretty sure you'll eventually need to answer this question.

In `ExprEngine`, the class that's responsible for modeling all statements that have effect on the program state, we have a very common idiom for chaining pieces of work together. It looks roughly like this:

  void modelSomethingPart1(ExplodedNode *Pred, set<ExplodedNode> *Succs);
  void modelSomethingPart2(ExplodedNode *Pred, set<ExplodedNode> *Succs);
  void modelSomethingPart3(ExplodedNode *Pred, set<ExplodedNode> *Succs);
  ...
  void modelSomethingPartK(ExplodedNode *Pred, set<ExplodedNode> *Succs);

  void modelSomething(ExplodedNode *Pred) {
    set<ExplodedNode> Set0 = { Pred };

    set<ExplodedNode> Set1;
    for (P : Set0)
      modelSomethingPart1(P, &Set1);

    set<ExplodedNode> Set2;
    for (P : Set1)
      modelSomethingPart2(P, &Set2);

    set<ExplodedNode> Set3;
    for (P : Set2)
      modelSomethingPart3(P, &Set3);

    ...

    set<ExplodedNode> SetK;
    for (P : SetKMinus1)
      modelSomethingPartK(P, &SetK);

    for (N : SetK)
      addTransition(N);
  }

As you see, it scales nicely for any number of sub-tasks and you don't need to juggle boolean flags or optionals when you add another sub-task. Moreover, you can further split each of your sub-tasks into even smaller sub-sub-tasks in a similar manner.

In `ExprEngine` this idiom is completely essential due to how arbitrary most of the effects-of-statements tend to be. In checkers, however, this level of complexity has never been necessary so far because most sub-methods either produce exactly one new state, or sink completely. For checkers like this one, which do multiple checks, i recommend passing a single node around:

  ExplodedNode *checkNullStream(..., ExplodedNode *Pred) {
    if (!NonBuggyState && BuggyState)
      C.emitReport(..., C.generateErrorNode(BuggyState, Pred));

    return C.addTransition(NonBuggyState, Pred);
  }

  ExplodedNode *checkFseekWhence(..., ExplodedNode *Pred) const {
    State = Pred->getState();
    if (!isBuggy(State))
      return Pred;
    ExplodedNode *N = C.generateNonFatalErrorNode(State);
    if (N)
      C.emitReport(..., N);
    return N;
  }

  void evalFseek(...) {
    ExplodedNode *N = C.getPredecessor();

    N = checkNullStream(..., N);
    if (!N)
      return;

    N = checkFseekWhence(..., N);
    if (!N)
      return;

    N = checkSomethingElse1(..., N);
    if (!N)
      return;

    N = checkSomethingElse2(..., N);
    if (!N)
      return;

    ...

    checkSomethingElseK(..., N);
  }

This wastes exploded nodes a little bit but that's fine. It's much less scary than accidentally adding unnecessary state splits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67706/new/

https://reviews.llvm.org/D67706