[PATCH] D70411: [analyzer] CERT: StrChecker: 31.c

Wed Dec 4 15:36:07 PST 2019

NoQ added a comment.

A while back I had a look into implementing some of the CERT checkers and my vague recollection from these attempts was that it's a bad idea to take CERT examples literally. Most of the CERT rules are explained in an informal natural language and it makes CERT great for demonstrating common mistakes that programmers make when they write in C/C++. Humans can easily understand the problem by reading CERT pages. But it doesn't necessarily mean that static analysis tools can be built by generalizing over the examples from the CERT wiki.

So i suggest that we make one more step back and agree upon what exactly are we checking here. I.e., basically, agree on the tests. Because right now it looks like you're trying to blindly generalize over the examples: //"The CERT example has a `strcpy()` into a fixed-size buffer, let's warn on all `strcpy()`s into fixed-size buffers!"//. This way your check does indeed pass the tests, but it doesn't make sense both from the rule's perspective (the rule never said that it's wrong to `strcpy()` into a fixed-size buffer, it only said that you should anyhow guarantee that the storage is sufficient) and from the user's perspective (you won't be able to reduce the gap between false positives and false negatives enough for the check to be useful, even with the subsequent whack-a-mole of adding more heuristics, because what the check is checking is relatively orthogonal to the actual problem you're trying to solve).

For example, in the example titled "Noncompliant Code Example (argv)", it is explicitly stated that the problem is not only that the buffer is fixed-size and `strcpy()` is made, but also that the original string //is controlled by the attacker//. The right analysis to catch such issues is //taint analysis//. It's a typical taint problem. I honestly believe you should try to solve it by combining the taint checker with the string checker: "if string length metadata is tainted (in particular, if the string itself is tainted) and is potentially larger than the destination buffer size (eg., unconstrained), warn". With the recent advancements in the development of the taint checker, i think you can get pretty far this way without constantly struggling with false positives.

================
Comment at: clang/include/clang/StaticAnalyzer/Checkers/Checkers.td:801
+  Dependencies<[StrCheckerBase]>,
+  Documentation<HasDocumentation>;
+
----------------
Charusso wrote:
> In my mind the documentation is the CERT rule's page. Is it okay?
I'd prefer our own documentation that may link to the CERT page but should also explain, say, which parts of the rule are covered by the checker (even if it's "all of them", they may add more rules in the future), and probably some specifics of the checker's behavior if they aren't obvious from the rule.

================
Comment at: clang/test/Analysis/cert/str31-c-fp-suppression.cpp:33-34
+  // string, so that we could see whether the string is null-terminated on
+  // every path or not. Because the 'src' could be not null-terminated this
+  // report is fine, but if 'src' is null-terminated this report is a false
+  // positive because the 'src' fits into 'dest' with the null terminator
----------------
Mmm, no, it's not fine. The warning is saying something that's not correct. Even if `src` is not null-terminated, under the assumption that no out-of-bounds read UB occurs in the `strcpy()` call, `dest` is going to always be null-terminated and have a `strlen` of at most 127. And by continuing our analysis after `strcpy()`, we inform the user that we assume that no UB has happened on the current execution path yet.

Traditionally checkers either warn immediately when they can detect an error, or assume that the error has not happened. For example, when we dereference a pointer, if the pointer is not known to be null, we actively assume that it's non-null. Similarly, if the `src` string is not null-terminated, we should warn at the invocation of `strcpy`; if it's not known whether it's null-terminated or not, we should actively assume that it's null-terminated at `strcpy`.

Also, I usually don't agree with the statements like "This report helped us find a real bug, therefore it's a true positive". You can find a bug in literally any code if you stare at it long enough! I'm glad you found the bug anyway, but if the report is not pointing at the problem directly, we're not doing a great job.

================
Comment at: clang/test/Analysis/cert/str31-c-fp-suppression.cpp:51-53
+  strcpy(dest, src);
+  do_something(dest);
+  // expected-warning at -1 {{'dest' is not null-terminated}}
----------------
Looks like a false positive. Consider the following perfectly correct (even if a bit inefficient) code that only differs from the current code only in names and context:
```lang=c++
void copy_and_test_if_short(const char *src) {
  char dest[128];
  strcpy(dest, src);
  warn_if_starts_with_foo(dest);
}

void copy_and_test(const char *src) {
  if (strlen(src) < 64)
    copy_and_test_if_short(src);
  else
    copy_and_test_if_long(src);
}
```

================
Comment at: clang/test/Analysis/cert/str31-c-notes.cpp:22
+  if (gets(buf)) {}
+  // expected-note at -1 {{'gets' could write outside of 'buf'}}
+  // expected-note at -2 {{Assuming the condition is false}}
----------------
I still believe that this should be *the* warning in this code. This is already broken code.

================
Comment at: clang/test/Analysis/cert/str31-c-notes.cpp:26-27
+
+  free(buf);
+  // expected-warning at -1 {{'buf' is not null-terminated}}
+  // expected-note at -2 {{'buf' is not null-terminated}}
----------------
It is not wrong to free a buffer that isn't null-terminated. We shouldn't warn here.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70411/new/

https://reviews.llvm.org/D70411