[PATCH] D154838: [analyzer] Add check for null pointer passed to %p of printf family

Tue Jul 25 14:56:33 PDT 2023

NoQ added a comment.

> Should I cover non-standard (i.e., non ISO C standard) functions from the `printf_s` family? Should I cover non-standard functions like `dprintf`?

The static analyzer, unlike the compiler proper, isn't required to treat all code fairly. It's ok to have different behavior depending solely on the function's name. So if you want to support non-standard functions, and you know that they have the same portability issues, totally go for it!

> This check is part of the UnixAPIPortability checker, which is already registered. Do you suggest creating a separate checker for this check?

Probably a separate check would be better. The consequences of `malloc(0)` are likely to be much more dire than consequences of `printf("%p", 0)`, so people may want to enable/disable them separately.

> What is the correct way to update expected plists for test inputs?

The run-lines are mostly self-explanatory. Just run it through the `grep` command in the other run-line. It filters out all the non-transferable stuff. (You clearly don't want to have paths like `/Users/georgiy.lebedev/Work/...` in test files.)

Or, put your test in a separate file with a simpler run-line that doesn't additionally verify plist output. We already have enough tests for plist output.

In D154838#4532929 <https://reviews.llvm.org/D154838#4532929>, @MitalAshok wrote:

> Instead of checking for hard-coded names, you can check functions with the format(printf, x, y) <https://clang.llvm.org/docs/AttributeReference.html#format> attribute:
>
>   if (auto *Format = FD->getAttr<FormatAttr>())
>     CheckPrintfPointerConversionSpecifierNULL(C, CE, Format->getFormatIdx());

Is the behavior necessarily implementation-defined for all such functions? I'm pretty sure you can find this attribute on a project-local function that will never have more than one implementation.

================
Comment at: clang/lib/StaticAnalyzer/Checkers/UnixAPIChecker.cpp:479-480
+    const clang::Expr *arg) const {
+  ExplodedNode *N =
+      C.generateNonFatalErrorNode(nullState ? nullState : C.getState());
+  if (!N)
----------------
> What is the right way to emit a bug report? I have studied the code base and found several patterns: in some cases, an error node is simply generated, in others a new transition is also added based on the analyzed state. In some cases an expression value is also tracked.

You always need to track the value when reporting things about values, it adds notes to the report explaining why do we think the variable has that value (eg., it was assigned null or compared to null).

Most checkers need this, though for some checkers it's more important than for other checkers.

----

Whether to generate a non-fatal error node and what state to put into it, this is indeed a very case-by-case decision. In your case the behavior is not undefined, the program doesn't crash when the bug happens, it continues to run normally, so analysis should probably also continue to find more defects. There are a few unknowns about the program's behavior (in particular, we don't know what it actually printed), but we never promised to figure this out in the first place. And even if there are unknowns that matter, well, unknowns are part of life and the rest of the static analyzer is already built under the assumption that not everything is known.

If it was an actual null pointer dereference, then you'd have to terminate the analysis by generating a fatal error node. The user won't be happy to learn about a "bug" that only occurs under the assumption that the program has already crashed with null dereference. And if it doesn't *only* occur under that assumption, then we'll probably find the same bug anyway while exploring a different execution path that doesn't crash.

---

A separate question is, do we want to assume that the pointer `x` is non-null after its first use in `printf("%p", x)`? We may or may not need to assume this regardless of whether we see a path on which it's //definitely// null. Eg., consider:
```lang=c
void foo(int *x) {
  // (1) We probably can't want to warn here, because there's no indication that 'x' can be null.
  printf("%p\n", x);

  // This is an indication that 'x' can be null.
  //
  // Technically it applies retroactively to the first line, but retroactive warnings are hard
  // because multiple different paths could reach this check, and it's not necessarily there
  // specifically for our current path.
  if (x == NULL) {
    printf("WARNING: null passed to foo()!");
  }

  // (2) Now, do we want to warn here? We probably do.
  //
  // But note that if we reached (2), we've already exhibited the same non-portable behavior
  // at (1) where no warning was emitted.
  printf("%p\n", x);

  // (3) Do we want to warn again here?
  printf("%p\n", x);

  // (4) What if it was a plain null dereference?
  printf("%d\n", *x);
}
```

If at (1), (2), (3) instead of `printf("%p, x)` we had a plain null dereference `printf("%d\n", *x);`, the null dereference checker will not emit any warnings. Indeed, the true-branch of the if-statement is dead code because the only way to reach it would be to go through undefined behavior at (1), but we also can't prove that there's any undefined behavior at (1). Yes we could separately warn about dead code, but that's a job for a different checker.

The null dereference checker accomplishes this by invoking `addTransition(nonNullState)` regardless of whether the warning is emitted (i.e. whenever `notNullState` is an actual state), and making the `generateErrorNode(nullState)` fatal. In particular, after (1) the pointer will continue to be unconditionally non-null.

In the example as written, you probably want a warning at (2). You also really want a null dereference warning at (4). So it's really counter-productive to transition to the `notNullState` at (1).

You probably don't care all that much about (3). Indeed, if you reach (3) in this analysis, you'll probably also reach it once (2) is addressed. The value of emitting all warnings at once is usually secondary to the nuisance of having too many warnings at once, so we'd probably still prefer it to warn only at (2).

But in order to have a warning at (4), you will also need to allow a warning at (3). This is because //what has been assumed, cannot be unassumed//. You can suppress the warning at (3) artificially by tracking a set of values or variables about which your checker has already warned, but that'd require a bit of work and I don't think it's necessary. It's probably ok to just warn at (3).

So I think your solution is correct.

================
Comment at: clang/lib/StaticAnalyzer/Checkers/UnixAPIChecker.cpp:508-512
+    if (arg->isNullPointerConstant(C.getASTContext(),
+                                   Expr::NPC_ValueDependentIsNull)) {
+      ReportPrintfPointerConversionSpecifierNULL(C, nullptr, arg);
+      return;
+    }
----------------
This really doesn't accomplish anything, `assume()` already does a lot more than that. Just rely on `assume()`, or make it a completely path-insensitive check if you want to stick to literal constants.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154838/new/

https://reviews.llvm.org/D154838