[PATCH] D70411: [analyzer] CERT: StrChecker: 31.c

Artem Dergachev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Dec 9 18:04:03 PST 2019


NoQ added a comment.

Ok, so, what should the checker warn on? What should be the intended state machine for this checker?

We basically have two possible categories of answers to this question:

- "The checker warns every time it finds an execution path on which unintended behavior occurs" - describe unintended behavior (say, "out-of-bounds buffer access") and define the checker in such a way that all warnings describe actual bugs of this kind (in our example, out-of-bound buffer accesses) as long as the rest of the static analyzer works correctly.
- Or, you could say "The checker enforces a certain coding guideline on the user" - and then you should describe the guideline in a very short and easy-to-understand manner (say, "don't pass the string by pointer anywhere before you null-terminate it"), and then make checker check exactly this guideline. You may introduce some suppressions for intentional violations of the guideline, but the guideline itself should be simple, it should be always possible to follow it, and it should sound nice enough for developers to feel good about themselves when they follow it.

> - The problem is not the size, but the missing `'\0'` (which you can have multiple of at any point).

The caption of the CERT rule suggests that both of these are problematic.

In D70411#1773703 <https://reviews.llvm.org/D70411#1773703>, @Charusso wrote:

> In D70411#1769786 <https://reviews.llvm.org/D70411#1769786>, @NoQ wrote:
>
> > So i suggest that we make one more step back and agree upon what exactly are we checking here. I.e., basically, agree on the tests. Because right now it looks like you're trying to blindly generalize over the examples: //"The CERT example has a `strcpy()` into a fixed-size buffer, let's warn on all `strcpy()`s into fixed-size buffers!"//.
>
>
> Just for clarification, I made the entire `MemoryBlockRegion` stuff because I do not care how do you allocate a memory block (even a `string` is counted as that).


Yup, i mean, you can go as far as you want with generalizing over "fixed size buffer" in this approach, but what i'm saying is that you're still not addressing the whole train of thought about the length of the source string.

In D70411#1773703 <https://reviews.llvm.org/D70411#1773703>, @Charusso wrote:

> > F10967277: ftp.c_8cb07866de61ef56be82135a4d3a5b7e.plist.html <https://reviews.llvm.org/F10967277>
> >  The source string is an IP address and port, which has a known limit on the number of digits it can have.
>
> The known size does not mean that the string is going to be null-terminated.


It does, because `strcpy` is guaranteed to null-terminate as long as it has enough storage capacity.

>> For example, in the example titled "Noncompliant Code Example (argv)", it is explicitly stated that the problem is not only that the buffer is fixed-size and `strcpy()` is made, but also that the original string //is controlled by the attacker//. The right analysis to catch such issues is //taint analysis//. It's a typical taint problem. I honestly believe you should try to solve it by combining the taint checker with the string checker: "if string length metadata is tainted (in particular, if the string itself is tainted) and is potentially larger than the destination buffer size (eg., unconstrained), warn". With the recent advancements in the development of the taint checker, i think you can get pretty far this way without constantly struggling with false positives.
> 
> This checker indeed would require an additional taint analysis, but as we do not have a non-alpha variant, I am fine to call every non-null-terminated string reading an issue (as it is), whatever is the source. That is a generalization, yes, in the best possible manner what we have got today. Also I am not a big fan of relying on non-alpha stuff and I am sure I have defined my checker enough well to catch real issues.

Checks that are part of the generic taint checker are currently in a pretty bad shape, but the taint analysis itself is pretty good, and checks that rely on taint but aren't part of the generic taint checker are also pretty good. I actually believe taint analysis could be enabled by default as soon as somebody goes through the broken checks and disables/removes them. If you rely on the existing taint analysis infrastructure and make a good check, that'll be wonderful and would further encourage us to move taint analysis out of alpha.

> - We heavily need to swap the value-tracking with `NoteTags` to make this useful.

Before i forget, i'm opposed to explaining problems as notes. We should emit a warning as soon as we've reached far enough on the execution path to conclude that there's a problem.



================
Comment at: clang/include/clang/StaticAnalyzer/Checkers/Checkers.td:798
+
+def Str31cChecker : Checker<"31.c">,
+  HelpText<"SEI CERT checker of rules defined in STR31-C">,
----------------
Maybe `31c`? I'm afraid a dot in package name would be confusing and people will actually try something like to enable/disable `cert.str.31`.


================
Comment at: clang/include/clang/StaticAnalyzer/Checkers/Checkers.td:803
+
+} // end "cert.str"
+
----------------
`alpha.cert.str`.


================
Comment at: clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp:23
+const char *const CXXObjectLifecycle = "C++ object lifecycle";
+const char *const SecurityError = "SecurityError";
+
----------------
These strings are visible to humans, please write some proper English :)


================
Comment at: clang/test/Analysis/cert/str31-c-fp-suppression.cpp:51-53
+  strcpy(dest, src);
+  do_something(dest);
+  // expected-warning at -1 {{'dest' is not null-terminated}}
----------------
Charusso wrote:
> NoQ wrote:
> > Looks like a false positive. Consider the following perfectly correct (even if a bit inefficient) code that only differs from the current code only in names and context:
> > ```lang=c++
> > void copy_and_test_if_short(const char *src) {
> >   char dest[128];
> >   strcpy(dest, src);
> >   warn_if_starts_with_foo(dest);
> > }
> > 
> > void copy_and_test(const char *src) {
> >   if (strlen(src) < 64)
> >     copy_and_test_if_short(src);
> >   else
> >     copy_and_test_if_long(src);
> > }
> > ```
> That is why I have asked whether we have a way to say "when the string is read". I like that heuristic for now, because any kind of not null-terminated string is the root of the evil for [[ https://wiki.sei.cmu.edu/confluence/display/c/STR31-C.+Guarantee+that+storage+for+strings+has+sufficient+space+for+character+data+and+the+null+terminator | STR31-C ]].
> 
> In this case the attacker is the programmer who sends a possibly not null-terminated string around and the function which receives an arbitrary input would carry the issue.
> 
> Thanks for the example, I wanted to cover ranges, but I have forgotten it, because in the wild peoples do not check for bounds. Copy-pasted.
Note that function `copy_and_test` in my example would not be necessarily visible to you. It may be in another translation unit.

So in order to "cover ranges" you'll have to discard all cases where the length is completely unknown.


================
Comment at: clang/test/Analysis/cert/str31-c-notes.cpp:22
+  if (gets(buf)) {}
+  // expected-note at -1 {{'gets' could write outside of 'buf'}}
+  // expected-note at -2 {{Assuming the condition is false}}
----------------
Charusso wrote:
> NoQ wrote:
> > I still believe that this should be *the* warning in this code. This is already broken code.
> No, it is not broken, sadly. I would say it is broken, but this checker tries to be smarter than `grep`. Truncation is fine for [[ https://wiki.sei.cmu.edu/confluence/display/c/STR31-C.+Guarantee+that+storage+for+strings+has+sufficient+space+for+character+data+and+the+null+terminator | STR31-C ]].
> 
> That change in my mindset made this checker possible when I have dropped the idea of warn on function calls.
> `Truncation is fine for STR31-C.`
This isn't just truncation, this is buffer overflow, and it's definitely not fine if we want to "Guarantee that storage for strings has sufficient space for character data".

This is absolutely the right place to simply warn on every invocation of `gets()`.


================
Comment at: clang/test/Analysis/cert/str31-c-notes.cpp:26-27
+
+  free(buf);
+  // expected-warning at -1 {{'buf' is not null-terminated}}
+  // expected-note at -2 {{'buf' is not null-terminated}}
----------------
Charusso wrote:
> NoQ wrote:
> > It is not wrong to free a buffer that isn't null-terminated. We shouldn't warn here.
> That is when the string is read. We could have a list of function calls which are non-problematic. For now I am fine with that note, because you do not allocate something without reading it before freeing it. It is just an arbitrary test for now.
> That is when the string is read.

That strikes me as an example of [[ https://en.wikipedia.org/wiki/Law_of_the_instrument | the Maslow's hammer ]]. The static analyzer isn't great at guessing whether the function reads from the string or not. For a dynamic analysis it wouldn't have been a problem.

That said, maybe you could get away with discriminating between `char *` and `const char *`. If the string is passed in by const pointer, there's nothing you can do except read from it. If it's passed in by non-const pointer, you can conservatively assume that it isn't read.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70411/new/

https://reviews.llvm.org/D70411





More information about the cfe-commits mailing list