[PATCH] D120369: [analyzer] Add more propagations to Taint analysis

Balázs Benics via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Feb 23 08:04:40 PST 2022


steakhal added a comment.

Huh, this was a long one. 😅 🚀



================
Comment at: clang/docs/analyzer/checkers.rst:2361
 Default propagations defined by ``GenericTaintChecker``:
-``atoi``, ``atol``, ``atoll``, ``fgetc``, ``fgetln``, ``fgets``, ``fscanf``, ``sscanf``, ``getc``, ``getc_unlocked``, ``getdelim``, ``getline``, ``getw``, ``pread``, ``read``, ``strchr``, ``strrchr``, ``tolower``, ``toupper``
+``atoi``, ``atol``, ``atoll``, ``basename``, ``dirname``, ``fgetc``, ``fgetln``, ``fgets``, ``fnmatch``, ``fread``, ``fscanf``, ``fscanf_s``, ``index``, ``inflate``, ``isalnum``, ``isalpha``, ``isascii``, ``isblank``, ``iscntrl``, ``isdigit``, ``isgraph``, ``islower``, ``isprint``, ``ispunct``, ``isspace``, ``isupper``, ``isxdigit``, ``memchr``, ``memrchr``, ``sscanf``, ``getc``, ``getc_unlocked``, ``getdelim``, ``getline``, ``getw``, ``memcmp``, ``memcpy``, ``memmem``, ``memmove``, ``mbtowc``, ``pread``, ``qsort``, ``qsort_r``, ``rawmemchr``, ``read``, ``readv``, ``recv``, ``recvfrom``, ``rindex``, ``strcasestr``, ``strchr``, ``strchrnul``,  ``strcasecmp``, ``strcmp``, ``strcspn``, ``strlen``, ``strncasecmp``, ``strncmp``, ``strndup``, ``strndupa``, ``strpbrk``, ``strrchr``, ``strsep``, ``strspn``, ``strstr``, ``strtol``, ``strtoll``, ``strtoul``, ``strtoull``, ``tolower``, ``toupper``, ``ttyname``, ``ttyname_r``, ``vfscanf``, ``vfscanf``, ``wctomb``, ``wcwidth``
 
----------------
I cannot see the corresponding propagation rule.
That being said, it would be handy to mention that this is for `zlib` decompression and this should be probably a taint source anyway.

`vfscanf` occurs two times.

`vscanf` is not mentioned here; and there are probably a couple others like this.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:561
+      {{"vscanf"}, TR::Prop({{0}}, {{}, 1})},
+      {{"vfscanf"}, TR::Prop({{0}}, {{}, 2})},
+
----------------
This function has exactly 3 arguments.
I'm also puzzled how tainting `va_list` will work out. That should be modeled separately IMO.
This comment applies to all of the similar `va_list` accepting functions, such as `vscanf`, `vfscanf`, and possibly others.

That being said, I think `vscanf` is more like `scanf`, so it should be modeled as a taint source instead of a propagator.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:575
+      {{"fread"}, TR::Prop({{3}}, {{0, ReturnValueIndex}})},
+      {{"readv"}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
+      {{"recv"}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
----------------
I'm on board with marking read operations as props/sources.
Let's look at the declaration: `ssize_t readv(int fd, const struct iovec *iov, int iovcnt);`
I'm not sure how could the pointee of `iov` be modified by this call, as its `const`.
Additionally, I doubt the effectiveness of the rule, since I don't think it would be too likely to have a path leading to a taint sink with an `iov` pointer. That being said, let it be, but don't expect much from this rule :D


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:577
+      {{"recv"}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
+      {{"recvfrom"}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
+
----------------
It's sad that we don't model `recvmsg`, but it would be quite difficult to model. I can see why you left that out.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:579-580
+
+      {{"ttyname"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
+      {{"ttyname_r"}, TR::Prop({{0}}, {{1, ReturnValueIndex}})},
+
----------------
I'm not sure how could these be used as gadgets, but let it be.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:582-583
+
+      {{"dirname"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
+      {{"basename"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
+      {{"fnmatch"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
----------------
These should be sorted, isn't it?


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:584
+      {{"basename"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
+      {{"fnmatch"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
+      {{"memchr"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
----------------
> `int fnmatch(const char *pattern, const char *string, int flags);`
>The `fnmatch()` function checks whether the string argument matches the pattern argument, which is a shell wildcard pattern (see `glob(7)`).
>From this //man// entry I would think that we should propagate from the **second** argument.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:596
+      {{"memmove"}, TR::Prop({{1}}, {{0, ReturnValueIndex}})},
+      {{"memmem"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
+
----------------
One could say that if `memmem` was called with a tainted //needle// and actually finds something in the //haystack//, that would mean essentially that the value pointed by the return value has the same content as the //needle//.
However, I still agree with the current propagation rule, depending only on `arg 0`.
This might reasoning might deserve a comment though.
`strstr` also shares this property.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:606
+
+      // FIXME: in case of arrays ,only the first element of the array gets
+      // tainted.
----------------



================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:615-616
+      {{"strncasecmp"}, TR::Prop({{0, 1, 2}}, {{ReturnValueIndex}})},
+      {{"strspn"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
+      {{"strcspn"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
+      {{"strpbrk"}, TR::Prop({{0}}, {{ReturnValueIndex}})},
----------------
IMO these should propagate from `{0,1}`, similarly how the `strncmp` does propagate from its last argument.


================
Comment at: clang/test/Analysis/taint-generic.c:372-378
+int testVscanf(int *d) {
+  char format[10];
+  scanf("%9s", format); // fake a tainted a file descriptor
+
+  vscanf(format, &d);
+  return 1 / *d; // expected-warning {{Division by a tainted value, possibly zero}}
+}
----------------
This test case definitely needs to be reworked.

1) We don't have FDs in this context, unlike the comment suggests.
2) `vscanf` should be an unconditional taint source, instead of being a propagator.


================
Comment at: clang/test/Analysis/taint-generic.c:392-395
+  if (some_global_flag_to_branch_on) // just to have 2 branches, and assert 2 division by zero messages
+    return 1 / *buffer;              // expected-warning {{Division by a tainted value, possibly zero}}
+
+  return 1 / read; // expected-warning {{Division by a tainted value, possibly zero}}
----------------
For this type of things I can recommend using the `clang_analyzer_isTainted(T)` `debug.ExprInspection` introspection function.
It will do the right thing, without sinking the execution path.

I've seen in some other test cases that you used an extra top-level fn parameter for the same.
That's slightly better than using a global, but I would still recommend the debug function.


================
Comment at: clang/test/Analysis/taint-generic.c:399-400
+struct iovec {
+  void *iov_base; /* Starting address */
+  size_t iov_len; /* Number of bytes to transfer */
+};
----------------
Please use single-line comments.
It makes debugging test files easier in some cases.


================
Comment at: clang/test/Analysis/taint-generic.c:408
+  size_t read = readv(fd, iov, iovcnt);
+  // FIXME: should be able to assert that iov is also tainted
+  return 1 / read; // expected-warning {{Division by a tainted value, possibly zero}}
----------------
`clang_analyzer_isTainted(*iov)`


================
Comment at: clang/test/Analysis/taint-generic.c:905
+int isspace(int c);
+int testIsspace() {
+  char c;
----------------
You should group these all into a single test case.
This way the setup code for the test dominates compared to the actual content.


================
Comment at: clang/test/Analysis/taint-generic.c:927-928
+int cmp_less(const void *lhs, const void *rhs) {
+  return *(int *)lhs < *(int *)rhs ? -1 : *(int *)lhs > *(int *)rhs ? 1
+                                                                    : 0;
+}
----------------
>The comparison function must return an integer less than, equal to, or greater than zero if the first argument is considered to be respectively less than, equal to, or greater than the second.


================
Comment at: clang/test/Analysis/taint-generic.c:939-942
+int cmp_less_than(const void *lhs, const void *rhs, void *baseline) {
+  return *(int *)lhs < *(int *)baseline ? -1 : *(int *)lhs > *(int *)baseline ? 1
+                                                                              : 0;
+}
----------------
Just use the `cmp_less` instead.
You can also fuse the qsort test cases into a single function.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120369/new/

https://reviews.llvm.org/D120369



More information about the cfe-commits mailing list