[PATCH] D120236: [analyzer] Add more sources to Taint analysis
Endre Fülöp via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Wed Feb 23 01:57:06 PST 2022
gamesh411 added inline comments.
================
Comment at: clang/docs/analyzer/checkers.rst:2358
Default sources defined by ``GenericTaintChecker``:
-``fdopen``, ``fopen``, ``freopen``, ``getch``, ``getchar``, ``getchar_unlocked``, ``gets``, ``scanf``, ``socket``, ``wgetch``
+ ``_IO_getc``, ``fdopen``, ``fopen``, ``freopen``, ``get_current_dir_name``, ``getch``, ``getchar``, ``getchar_unlocked``, ``getcw``, ``getcwd``, ``getgroups``, ``gethostname``, ``getlogin``, ``getlogin_r``, ``getnameinfo``, ``getopt``, ``getopt_long``, ``getopt_only``, ``gets``, ``getseuserbyname``, ``readlink``, ``scanf``, ``scanf_s``, ``socket``, ``wgetch``
----------------
steakhal wrote:
> typo/dup?
> I cannot recognize the `getcw()` call. Could you please refer to the specification or an instance where it was defined?
`getwd` is the right one instead of `getcw`
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:546-548
{{"gets"}, TR::Source({{0}, ReturnValueIndex})},
{{"scanf"}, TR::Source({{}, 1})},
+ {{"scanf_s"}, TR::Source({{}, {1}})},
----------------
steakhal wrote:
> If we handle `gets`, `scanf`, we should also model the `*_s` versions as well.
> ```lang=C
> char *gets_s(char *s, rsize_t n);
> int scanf_s(const char *restrict format, ...);
> int fscanf_s(FILE *restrict stream, const char *restrict format, ...);
> int sscanf_s(const char *restrict s, const char *restrict format, ...);
> int vscanf_s(const char *restrict format, va_list arg);
> int vfscanf_s(FILE *restrict stream, const char *restrict format, va_list arg);
> int vsscanf_s(const char *restrict s, const char *restrict format, va_list arg);
> ```
I have added gets_s, and will add the _s variants for the others in the other patch that deals with the propagatorsj.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:550-552
+ {{"getopt"}, TR::Source({{ReturnValueIndex}})},
+ {{"getopt_long"}, TR::Source({{ReturnValueIndex}})},
+ {{"getopt_long_only"}, TR::Source({{ReturnValueIndex}})},
----------------
steakhal wrote:
> IMO these functions are highly domain-specific.
> On errors, they return specific/well-defined values e.g. `-1`, `'?'` or `':'`.
> That being said, the analyzer does not have this knowledge, thus it will model these as `conjured` symbols.
> If these values were modeled as tainted, we would likely introduce the number of false-positives regarding our limited capabilities of modeling the function accurately.
>
> tldr; I'm against these three rules; or alternatively prove that my concerns are not issues on real code bases.
I agree that the handling of these should be in another checker, I remember some false positives ( mainly uninteresting, typical "just won't fix" errors ) relating to `getopt`, and a domain-specific checker could be more appropriate here.
Removed them.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:556
+ {{"getwd"}, TR::Source({{0, ReturnValueIndex}})},
+ {{"readlink"}, TR::Source({{1, ReturnValueIndex}})},
+ {{"get_current_dir_name"}, TR::Source({{ReturnValueIndex}})},
----------------
steakhal wrote:
> We should check `readlinkat` as well.
Added
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:559
+ {{"gethostname"}, TR::Source({{0}})},
+ {{"getnameinfo"}, TR::Source({{2, 4}})},
+ {{"getseuserbyname"}, TR::Source({{1, 2}})},
----------------
steakhal wrote:
> In what cases can this function introduce taint?
The getnameinfo converts from
```
struct sockaddr_in {
sa_family_t sin_family; /* address family: AF_INET */
in_port_t sin_port; /* port in network byte order */
struct in_addr sin_addr; /* internet address */
};
/* Internet address */
struct in_addr {
uint32_t s_addr; /* address in network byte order */
};
```
to hostname and servername strings.
One could argue that by crafting a specific IP address, that is known to resolve to a specific hostname in the running environment could lead an attacker injecting a chosen (in some circumstances arbitrary) string into the code at the point of this function.
I know this is a bit contrived, and more on the cybersecurity side of things, so I am not sure whether to add this here, or add this in a specific checker, or just leave altogether. Please share your opinion about this.
================
Comment at: clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp:561
+ {{"getseuserbyname"}, TR::Source({{1, 2}})},
+ {{"getgroups"}, TR::Source({{1}})},
+ {{"getlogin"}, TR::Source({{ReturnValueIndex}})},
----------------
steakhal wrote:
> >On success, `getgroups()` returns the number of supplementary group IDs. On error, -1 is returned, and `errno` is set appropriately.
>
> According to this, the return value index should be also tainted.
Added
================
Comment at: clang/test/Analysis/taint-generic.c:385
+ struct option long_opts[] = {{0, 0, 0, 0}};
+ int opt = getopt_long_only(argc, argv, "a:b:02", long_opts, &option_index);
+ return 1 / opt; // expected-warning {{Division by a tainted value, possibly zero}}
----------------
steakhal wrote:
> Well, this can never return zero.
> What we should do is to do a state-split for the failure case; since the application should definitely handle a failure in this part; thus the split would be justified.
Removed
================
Comment at: clang/test/Analysis/taint-generic.c:392
+int underscore_IO_getc_is_source(_IO_FILE *fp) {
+ char c = _IO_getc(fp);
+ return 1 / c; // expected-warning {{Division by a tainted value, possibly zero}}
----------------
steakhal wrote:
> Sometimes we taint the `fd` and propagate based on that, and othertimes, we simply just return taint.
> However, I think it still looks like a better tradeoff this way.
> I just wanted to highlight this. Maybe a comment on the CallDescription would be beneficial in describing this discrepancy.
Added a comment
================
Comment at: clang/test/Analysis/taint-generic.c:417
+char *get_current_dir_name(void);
+int get_current_dir_name_is_source() {
+ char *d = get_current_dir_name();
----------------
steakhal wrote:
> Please avoid spelling the given function in the name of the test directly in a verbatim manner.
> If we would use the `CDF_MaybeBuiltin` matching mode, the `get_current_dir_name_is_source` would match for the `CallDescription {"get_current_dir_name"}` due to the way we fuzzy match for builtins.
> Prefixing with `Test` like `testGet_current_dir_name` would be fine though, only the underscores are handled differently.
> This applies to the rest of the test cases as well.
fixed the test names
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D120236/new/
https://reviews.llvm.org/D120236
More information about the cfe-commits
mailing list