[PATCH] D52984: [analyzer] Checker reviewer's checklist

Artem Dergachev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Oct 19 17:20:08 PDT 2018


NoQ added a comment.

My take on the out-of-alpha checklist:

- The checker should be evaluated on a large codebase in order to discover crashes and false positives specific to the checker. It should demonstrate the expected reasonably good results on it, with the stress on having as little false positives as possible.
  - The codebase should contain patterns that the checker checks. It should ideally also contain actual bugs, but it is fine if the checker was inspired by a one-of-a-kind devastating bug while there aren't many other instances of this bug around.

- Warning and note messages should be clear and easy to understand, even if a bit long.
  - Messages should start with a capital letter (unlike Clang warnings!) and should not end with `.`.
  - Articles are usually omitted, eg. `Dereference of a null pointer` -> `Dereference of null pointer`.

- Static Analyzer's checker API saves you from a lot of work, but it still forces you into a certain amount of boilerplate code when it comes to managing entities specific to your checker:
  - Almost all path-sensitive checkers need their own `BugReporterVisitor` in order to highlight interesting events along the path - or at least a call to `trackNullOrUndefValue`. //(need to wait for https://reviews.llvm.org/D52758)//
    - For example, SimpleStreamChecker should highlight the event of opening the file when reporting a file descriptor leak.
  - If the checker tracks anything in the program state, it needs to implement the `checkDeadSymbols` callback to clean the state up.

- Unless there's a strong indication that the user is willing to opt-in to a stricter checking, obvious false positives that could be caused by the checker itself should be fixed. For example:
  - If a checker tracks state of mutable objects enumerated by symbols, escaping symbols into unfamiliar functions must invalidate the state - for example, through `checkPointerEscape`.
    - For example, SimpleStreamChecker should drop its knowledge about an open file descriptor when it is passed into a function that may potentially close the descriptor. At the same time, it is fine to not escape closed descriptors.
  - If the checker is intentionally leaving a room for false positives, i.e., it checks for a coding standard that isn't critical to follow when it comes to formal correctness of the program, then there should be a cheap explicit way to suppress the warnings when the violation of the coding standard is intentional.
  - Fuzzy matching of API function or type names is generally dangerous when it may lead to false positives, fine otherwise.
    - For example, a checker for LLVM `isa<>` API shouldn't match it as `isa*`, so that it didn't accidentally make assumptions on how `isalpha` works.
    - It is perfectly fine to have some fuzziness for initial experiments in alpha - that's a great way to figure out which APIs you want to cover.

My take on the everyday checklist:

- Checkers are encouraged to actively participate in the analysis by sharing its knowledge about the program state with the rest of the analyzer, but they should not be disrupting the analysis unnecessarily:
  - If a checker splits program state, this must be based on knowledge that the newly appearing branches are definitely possible and worth exploring from the user's perspective. Otherwise the state split should be delayed until there's an indication that one of the paths is taken, or one of the paths needs to be dropped entirely.
    - For example, it is fine to eagerly split paths while modeling `isalpha(x)` as long as `x` is constrained accordingly on each path. At the same time, it is not a good idea to split paths over the return value of `printf()` while modeling the call because nobody ever checks for errors in `printf`; at best, it'd just double the remaining analysis time.
    - Caution is advised when using `CheckerContext::generateNonFatalErrorNode` because it generates an independent transition, much like `addTransition`. It is easy to accidentally split paths while using it.
      - Ideally, try to structure the code so that it was obvious that every `addTransition` or `generateNonFatalErrorNode` (or sequence of such if the split is intended) is immediately followed by return from the checker callback.
  - Multiple implementations of `evalCall` in different checkers should not conflict.
  - When implementing `evalAssume`, the checker should always return a non-null state for either the true assumption or the false assumption (or both).
  - Checkers shall not mutate values of expressions, i.e. use the `ProgramState::BindExpr` API, unless they are fully responsible for computing the value. Under no circumstances should they change non-`Unknown` values of expressions.
    - Currently the only valid use case for this API in checkers is to model the return value in the `evalCall` callback.
    - If expression values are incorrect, `ExprEngine` needs to be fixed instead.



- The following checker code patterns are not wrong but very suspicious:
  - In `State->getSVal(Region)`, `Region` is not necessarily a `TypedValueRegion` and the optional type argument is not specified.
    - It is likely that the checker may accidentally try to dereference a void pointer.
  - Checker logic depends on whether a certain value is a `Loc` or `NonLoc`.
    - It should be immediately obvious whether the `SVal` is a `Loc` or a `NonLoc` depending on the AST that is being checked.
    - Checking whether a value is `Loc` or `Unknown`/`Undefined` or whether the value is `NonLoc` or `Unknown`/`Undefined` is totally fine.
  - New symbols are constructed in the checker via direct calls to `SymbolManager`, unless they are of `SymbolMetadata` class tagged by the checker, or they represent newly created values such as the return value in `evalCall`.
    - For modeling arithmetic/bitwise/comparison operations, `SValBuilder` should be used.
  - Custom `ProgramPointTag`s are created within the checker.
    - There is usually no good reason for a checker to chain multiple nodes together, because checkers aren't worklists.
  - A `BugReporterVisitor` does something different from `if (N->getState()->get<Trait>(Key) != N->getFirstPred()->getState()->get<Trait>(Key)) emitSomeNote();`.
    - That's the easiest and safest idiom to highlight the changes in the program state.
    - It is not necessary to re-hardcode all sorts of statements on which the checker reacts into the visitor.

- Use safe and convenient APIs!
  - Always use `CheckerContext::generateErrorNode` and `CheckerContext::generateNonFatalErrorNode` for emitting bug reports. Most importantly, never emit report against `CheckerContext::getPredecessor`.
  - Prefer `checkPreCall` and `checkPostCall` to `checkPreStmt<CallExpr>` and `checkPostStmt<CallExpr>`.
  - Use `CallDescription` to detect hardcoded API calls in the program.
  - Simplify `C.getState()->getSVal(E, C.getLocationContext())` to `C.getSVal(E)`.

- Common sources of crashes:
  - `CallEvent::getOriginExpr` is nullable - for example, it returns null for an automatic destructor of a variable.
    - The same applies to all values generated while the call was modeled, eg. `SymbolConjured::getStmt` is nullable.
  - `CallEvent::getDecl` is nullable - for example, it returns null for a call of symbolic function pointer.
  - `addTransition`, `generateSink`, `generateNonFatalErrorNode`, `generateErrorNode` are nullable because you can transition to a node that you have already visited.
  - Methods of `CallExpr`/`FunctionDecl`/`CallEvent` that return arguments crash when the argument is out-of-bounds.
    - If you checked the function name, it doesn't mean that the function has the expected number of arguments!
      - Which is why you should use `CallDescription`.
  - Nullability of different entities within different kinds of symbols and regions is usually documented via assertions in their constructors.

In https://reviews.llvm.org/D52984#1258724, @MTC wrote:

> - More general coding standards.


We have https://llvm.org/docs/CodingStandards.html but yeah, having Analyzer-specific coding standards is cool. I guess i listed a few items above.



================
Comment at: www/analyzer/checker_dev_manual.html:710
+<ul>
+<li>Is there unintended branching in the exploded graph?</li>
+<li>Are there unintended sinks?</li>
----------------
xazax.hun wrote:
> NoQ wrote:
> > Most importantly, "If addTransition() with unspecified predecessor is ever executed after generateNonFatalErrorNode(), is the state split intended?"
> Is generateNonFatalErrorNode special? Two addTransition calls could have similar unintended effect, though I can imagine that being more rare. 
`generateNonFatalErrorNode()` is special because it's too counter-intuitive and hard to remember that it is in fact equivalent to `addTransition()`, i.e. it isn't "the same thing as `generateErrorNode()` just sounds less scary" :)

This becomes really weird when there are multiple checks done in callees of the checker callback, only some of which are non-fatal, followed by `addTransition()` at the end of the callback. In this case this problem makes the code a lot more complex: you need to pass the current node around through all callees.


https://reviews.llvm.org/D52984





More information about the cfe-commits mailing list