[PATCH] D120236: [analyzer] Add more sources to Taint analysis

Endre Fülöp via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Feb 23 01:57:06 PST 2022


gamesh411 added inline comments.


================
Comment at: clang/docs/analyzer/checkers.rst:2358
 Default sources defined by ``GenericTaintChecker``:
-``fdopen``, ``fopen``, ``freopen``, ``getch``, ``getchar``, ``getchar_unlocked``, ``gets``, ``scanf``, ``socket``, ``wgetch``
+ ``_IO_getc``, ``fdopen``, ``fopen``, ``freopen``, ``get_current_dir_name``, ``getch``, ``getchar``, ``getchar_unlocked``, ``getcw``, ``getcwd``, ``getgroups``, ``gethostname``, ``getlogin``, ``getlogin_r``, ``getnameinfo``, ``getopt``, ``getopt_long``, ``getopt_only``, ``gets``, ``getseuserbyname``, ``readlink``, ``scanf``, ``scanf_s``, ``socket``, ``wgetch``
 
----------------
steakhal wrote:
> typo/dup?
> I cannot recognize the `getcw()` call. Could you please refer to the specification or an instance where it was defined?
The correct function is `getwd`, not `getcw`.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:546-548
       {{"gets"}, TR::Source({{0}, ReturnValueIndex})},
       {{"scanf"}, TR::Source({{}, 1})},
+      {{"scanf_s"}, TR::Source({{}, {1}})},
----------------
steakhal wrote:
> If we handle `gets`, `scanf`, we should also model the `*_s` versions as well.
> ```lang=C
> char *gets_s(char *s, rsize_t n);
> int scanf_s(const char *restrict format, ...);
> int fscanf_s(FILE *restrict stream, const char *restrict format, ...);
> int sscanf_s(const char *restrict s, const char *restrict format, ...);
> int vscanf_s(const char *restrict format, va_list arg);
> int vfscanf_s(FILE *restrict stream, const char *restrict format, va_list arg);
> int vsscanf_s(const char *restrict s, const char *restrict format, va_list arg);
> ```
I have added `gets_s`, and will add the `_s` variants of the others in the separate patch that deals with the propagators.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:550-552
+      {{"getopt"}, TR::Source({{ReturnValueIndex}})},
+      {{"getopt_long"}, TR::Source({{ReturnValueIndex}})},
+      {{"getopt_long_only"}, TR::Source({{ReturnValueIndex}})},
----------------
steakhal wrote:
> IMO these functions are highly domain-specific.
> On errors, they return specific/well-defined values e.g. `-1`, `'?'` or `':'`.
> That being said, the analyzer does not have this knowledge, thus it will model these as `conjured` symbols.
> If these values were modeled as tainted, we would likely increase the number of false positives, given our limited capability to model the function accurately.
> 
> tldr; I'm against these three rules; or alternatively prove that my concerns are not issues on real code bases.
I agree that these should be handled in another checker. I remember some false positives (mainly uninteresting, typical "just won't fix" errors) relating to `getopt`, and a domain-specific checker could be more appropriate here.
Removed them.
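
For reference, the well-defined results mentioned above can be seen in a small sketch. It assumes a POSIX `getopt`; the helper `tally_options`, the struct `opt_tally`, and the option string `":a:b:"` are hypothetical and not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, not part of the patch: counts how getopt()
   classifies each option on the command line. */
struct opt_tally {
    int known;   /* option characters listed in the option string */
    int unknown; /* getopt() returned '?' */
    int missing; /* getopt() returned ':' (required argument missing) */
};

static struct opt_tally tally_options(int argc, char *const argv[]) {
    struct opt_tally t = {0, 0, 0};
    int opt;

    assert(argv != NULL);
    optind = 1; /* restart scanning so the helper can be called repeatedly */

    /* The leading ':' makes getopt() return ':' for a missing argument
       and suppresses its own diagnostics. */
    while ((opt = getopt(argc, argv, ":a:b:")) != -1) {
        switch (opt) {
        case '?': t.unknown++; break;
        case ':': t.missing++; break;
        default:  t.known++;   break;
        }
    }
    return t;
}
```

Every result here is an option character from the option string, `'?'`, `':'`, or `-1`, which is the kind of knowledge a domain-specific checker could encode and a generic taint source cannot.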


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:556
+      {{"getwd"}, TR::Source({{0, ReturnValueIndex}})},
+      {{"readlink"}, TR::Source({{1, ReturnValueIndex}})},
+      {{"get_current_dir_name"}, TR::Source({{ReturnValueIndex}})},
----------------
steakhal wrote:
> We should check `readlinkat` as well.
Added


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:559
+      {{"gethostname"}, TR::Source({{0}})},
+      {{"getnameinfo"}, TR::Source({{2, 4}})},
+      {{"getseuserbyname"}, TR::Source({{1, 2}})},
----------------
steakhal wrote:
> In what cases can this function introduce taint?
`getnameinfo` converts from a socket address such as
```
struct sockaddr_in {
    sa_family_t    sin_family; /* address family: AF_INET */
    in_port_t      sin_port;   /* port in network byte order */
    struct in_addr sin_addr;   /* internet address */
};

/* Internet address */
struct in_addr {
    uint32_t       s_addr;     /* address in network byte order */
};
```
to host name and service name strings.
One could argue that an attacker who crafts a specific IP address, one known to resolve to a specific hostname in the running environment, could inject a chosen (in some circumstances arbitrary) string into the code at the point of this function call.

I know this is a bit contrived, and more on the cybersecurity side of things, so I am not sure whether to add this here, add it in a specific checker, or leave it out altogether. Please share your opinion about this.
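
To make the argument indices concrete, here is a minimal sketch of a call. The helper `describe_endpoint` is hypothetical and only for illustration; arguments 2 and 4 of `getnameinfo` are the `host` and `serv` output buffers that the proposed rule marks as tainted. With `flags` set to 0 the host string comes from a reverse lookup, which is the scenario described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper, not part of the patch: fills host/serv for an IPv4
   address and port. Returns 0 on success or an EAI_* error code. */
static int describe_endpoint(const char *ipv4, unsigned short port,
                             char *host, socklen_t hostlen,
                             char *serv, socklen_t servlen, int flags) {
    struct sockaddr_in sa;

    assert(ipv4 != NULL && host != NULL && serv != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port); /* port in network byte order */
    if (inet_pton(AF_INET, ipv4, &sa.sin_addr) != 1)
        return EAI_FAIL; /* not a valid dotted-quad address */

    /* Arguments 2 and 4 (host and serv) are written by the call. */
    return getnameinfo((const struct sockaddr *)&sa, sizeof sa,
                       host, hostlen, serv, servlen, flags);
}
```

Passing `NI_NUMERICHOST | NI_NUMERICSERV` skips the lookup and simply formats the address and port, so even then the strings are derived from whatever address the caller supplied.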


================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:561
+      {{"getseuserbyname"}, TR::Source({{1, 2}})},
+      {{"getgroups"}, TR::Source({{1}})},
+      {{"getlogin"}, TR::Source({{ReturnValueIndex}})},
----------------
steakhal wrote:
> >On success, `getgroups()` returns the number of supplementary group IDs. On error, -1 is returned, and `errno` is set appropriately.
> 
> According to this, the return value index should be also tainted.
Added
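
A short sketch of why the return value matters, assuming POSIX `getgroups`; the helper `load_groups` is hypothetical and not part of the patch. The returned count typically becomes a loop bound or an allocation size, so it carries the same trust level as the list written through argument 1.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of the patch: stores a newly allocated
   list of supplementary group IDs in *out and returns their count, or
   -1 on error. The caller frees *out. */
static int load_groups(gid_t **out) {
    int n, got;
    gid_t *list;

    assert(out != NULL);
    *out = NULL;

    n = getgroups(0, NULL); /* a size of 0 only queries the count */
    if (n < 0)
        return -1;

    /* Allocate at least one element so the pointer is usable when n == 0. */
    list = malloc((n > 0 ? (size_t)n : 1) * sizeof *list);
    if (list == NULL)
        return -1;

    got = getgroups(n, list); /* both the list and the count come from the call */
    if (got < 0) {
        free(list);
        return -1;
    }
    *out = list;
    return got;
}
```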



================
Comment at: clang/test/Analysis/taint-generic.c:385
+  struct option long_opts[] = {{0, 0, 0, 0}};
+  int opt = getopt_long_only(argc, argv, "a:b:02", long_opts, &option_index);
+  return 1 / opt; // expected-warning {{Division by a tainted value, possibly zero}}
----------------
steakhal wrote:
> Well, this can never return zero.
> What we should do is to do a state-split for the failure case; since the application should definitely handle a failure in this part; thus the split would be justified.
Removed


================
Comment at: clang/test/Analysis/taint-generic.c:392
+int underscore_IO_getc_is_source(_IO_FILE *fp) {
+  char c = _IO_getc(fp);
+  return 1 / c; // expected-warning {{Division by a tainted value, possibly zero}}
----------------
steakhal wrote:
> Sometimes we taint the `fd` and propagate based on that, and other times, we simply just return taint.
> However, I think it still looks like a better tradeoff this way.
> I just wanted to highlight this. Maybe a comment on the CallDescription would be beneficial in describing this discrepancy.
Added a comment


================
Comment at: clang/test/Analysis/taint-generic.c:417
+char *get_current_dir_name(void);
+int get_current_dir_name_is_source() {
+  char *d = get_current_dir_name();
----------------
steakhal wrote:
> Please avoid spelling the given function in the name of the test directly in a verbatim manner.
> If we would use the `CDF_MaybeBuiltin` matching mode, the `get_current_dir_name_is_source` would match for the `CallDescription {"get_current_dir_name"}` due to the way we fuzzy match for builtins.
> Prefixing with `Test` like `testGet_current_dir_name` would be fine though, only the underscores are handled differently.
> This applies to the rest of the test cases as well.
Fixed the test names.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120236/new/

https://reviews.llvm.org/D120236


