[PATCH] D70411: [analyzer] CERT: StrChecker: Implementing most of the STR31-C

Csaba Dabis via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Dec 10 19:49:21 PST 2019


Charusso added a comment.

In D70411#1776460 <https://reviews.llvm.org/D70411#1776460>, @NoQ wrote:

> Ok, so, what should the checker warn on? What should be the intended state machine for this checker?
>
> We basically have two possible categories of answers to this question:
>
> - "The checker warns every time it finds an execution path on which unintended behavior occurs" - describe unintended behavior (say, "out-of-bounds buffer access") and define the checker in such a way that all warnings describe actual bugs of this kind (in our example, out-of-bound buffer accesses) as long as the rest of the static analyzer works correctly.
> - Or, you could say "The checker enforces a certain coding guideline on the user" - and then you should describe the guideline in a very short and easy-to-understand manner (say, "don't pass the string by pointer anywhere before you null-terminate it"), and then make checker check exactly this guideline. You may introduce some suppressions for intentional violations of the guideline, but the guideline itself should be simple, it should be always possible to follow it, and it should sound nice enough for developers to feel good about themselves when they follow it.


If I am right you are thinking about warn on the unsafe function calls which is the first case. I did that on by default. The idea is that, in a security manner we cannot allow hand-waving fixes. To measure stuff we could introduce options which serves the second case. I have added an option to suppress that common null-termination-by-hand idiom, so that I do not recommend anything. The CERT rules are not recommendations so I think now the warning is fine.

> Yup, i mean, you can go as far as you want with generalizing over "fixed size buffer" in this approach, but what i'm saying is that you're still not addressing the whole train of thought about the length of the source string.

I see, and you are right, but we are getting closer and closer to catch the most of.

> In D70411#1773703 <https://reviews.llvm.org/D70411#1773703>, @Charusso wrote:
> 
>> > F10967277: ftp.c_8cb07866de61ef56be82135a4d3a5b7e.plist.html <https://reviews.llvm.org/F10967277>
>> >  The source string is an IP address and port, which has a known limit on the number of digits it can have.
>>
>> The known size does not mean that the string is going to be null-terminated.
> 
> 
> It does, because `strcpy` is guaranteed to null-terminate as long as it has enough storage capacity.

It does not, because the storage capacity is arbitrary. If we would be sure the **copied stuff's length** cannot be larger than the destination's arbitrary capacity, we would not warn. There are tests for that case.

> Checks that are part of the generic taint checker are currently in a pretty bad shape, but the taint analysis itself is pretty good, and checks that rely on taint but aren't part of the generic taint checker are also pretty good. I actually believe taint analysis could be enabled by default as soon as somebody goes through the broken checks and disables/removes them. If you rely on the existing taint analysis infrastructure and make a good check, that'll be wonderful and would further encourage us to move taint analysis out of alpha.

I think it is too far away, sadly. I like the idea because it would be the root of security checkers, but I am mostly interested in data-flow analysis.



================
Comment at: clang/test/Analysis/cert/str31-c-fp-suppression.cpp:51-53
+  strcpy(dest, src);
+  do_something(dest);
+  // expected-warning at -1 {{'dest' is not null-terminated}}
----------------
NoQ wrote:
> Charusso wrote:
> > NoQ wrote:
> > > Looks like a false positive. Consider the following perfectly correct (even if a bit inefficient) code that only differs from the current code only in names and context:
> > > ```lang=c++
> > > void copy_and_test_if_short(const char *src) {
> > >   char dest[128];
> > >   strcpy(dest, src);
> > >   warn_if_starts_with_foo(dest);
> > > }
> > > 
> > > void copy_and_test(const char *src) {
> > >   if (strlen(src) < 64)
> > >     copy_and_test_if_short(src);
> > >   else
> > >     copy_and_test_if_long(src);
> > > }
> > > ```
> > That is why I have asked whether we have a way to say "when the string is read". I like that heuristic for now, because any kind of not null-terminated string is the root of the evil for [[ https://wiki.sei.cmu.edu/confluence/display/c/STR31-C.+Guarantee+that+storage+for+strings+has+sufficient+space+for+character+data+and+the+null+terminator | STR31-C ]].
> > 
> > In this case the attacker is the programmer who sends a possibly not null-terminated string around and the function which receives an arbitrary input would carry the issue.
> > 
> > Thanks for the example, I wanted to cover ranges, but I have forgotten it, because in the wild peoples do not check for bounds. Copy-pasted.
> Note that function `copy_and_test` in my example would not be necessarily visible to you. It may be in another translation unit.
> 
> So in order to "cover ranges" you'll have to discard all cases where the length is completely unknown.
That unknowness idea is cool, I will leave that comment as not `Done`.


================
Comment at: clang/test/Analysis/cert/str31-c-notes.cpp:22
+  if (gets(buf)) {}
+  // expected-note at -1 {{'gets' could write outside of 'buf'}}
+  // expected-note at -2 {{Assuming the condition is false}}
----------------
NoQ wrote:
> Charusso wrote:
> > NoQ wrote:
> > > I still believe that this should be *the* warning in this code. This is already broken code.
> > No, it is not broken, sadly. I would say it is broken, but this checker tries to be smarter than `grep`. Truncation is fine for [[ https://wiki.sei.cmu.edu/confluence/display/c/STR31-C.+Guarantee+that+storage+for+strings+has+sufficient+space+for+character+data+and+the+null+terminator | STR31-C ]].
> > 
> > That change in my mindset made this checker possible when I have dropped the idea of warn on function calls.
> > `Truncation is fine for STR31-C.`
> This isn't just truncation, this is buffer overflow, and it's definitely not fine if we want to "Guarantee that storage for strings has sufficient space for character data".
> 
> This is absolutely the right place to simply warn on every invocation of `gets()`.
I have found out we have another recommendation for that type of issues:
https://wiki.sei.cmu.edu/confluence/display/c/STR03-C.+Do+not+inadvertently+truncate+a+string

Here the truncation is a truncation, but in a security manner we cannot rely on feelings, so we warn from now.


================
Comment at: clang/test/Analysis/cert/str31-c-notes.cpp:26-27
+
+  free(buf);
+  // expected-warning at -1 {{'buf' is not null-terminated}}
+  // expected-note at -2 {{'buf' is not null-terminated}}
----------------
NoQ wrote:
> Charusso wrote:
> > NoQ wrote:
> > > It is not wrong to free a buffer that isn't null-terminated. We shouldn't warn here.
> > That is when the string is read. We could have a list of function calls which are non-problematic. For now I am fine with that note, because you do not allocate something without reading it before freeing it. It is just an arbitrary test for now.
> > That is when the string is read.
> 
> That strikes me as an example of [[ https://en.wikipedia.org/wiki/Law_of_the_instrument | the Maslow's hammer ]]. The static analyzer isn't great at guessing whether the function reads from the string or not. For a dynamic analysis it wouldn't have been a problem.
> 
> That said, maybe you could get away with discriminating between `char *` and `const char *`. If the string is passed in by const pointer, there's nothing you can do except read from it. If it's passed in by non-const pointer, you can conservatively assume that it isn't read.
Cool idea, thanks!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70411/new/

https://reviews.llvm.org/D70411





More information about the cfe-commits mailing list