[PATCH] D141389: [DFSAN] Add support for strnlen, strncat, strsep, sscanf and _tolower

Andrew via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Feb 7 08:02:45 PST 2023


browneee added inline comments.


================
Comment at: compiler-rt/lib/dfsan/dfsan_custom.cpp:221
+  if (flags().strict_data_dependencies) {
+    *ret_label = res ? s_label : 0;
+  } else {
----------------
When `res != NULL`, then `res` is derived from `*s`, not from `s`.

e.g.

```
*ret_label = res ? dfsan_get_label(base) : 0;
```


================
Comment at: compiler-rt/lib/dfsan/dfsan_custom.cpp:224
+    size_t s_bytes_read = (res ? strlen(res) : strlen(base)) + 1;
+    *ret_label =
+        dfsan_union(dfsan_read_label(base, s_bytes_read),
----------------
I think `dfsan_get_label(base)` should also be a part of this output.


================
Comment at: compiler-rt/lib/dfsan/dfsan_custom.cpp:241
+    if (res)
+      *ret_origin = s_origin;
+  } else {
----------------
As above, when res != NULL, then res is derived from *s, not from s.

```
if (res)
    *ret_origin = dfsan_get_origin(base);
```


================
Comment at: compiler-rt/lib/dfsan/dfsan_custom.cpp:243
+  } else {
+    if (*ret_label) {
+      size_t s_bytes_read = (res ? strlen(res) : strlen(base)) + 1;
----------------
As above, I think `dfsan_get_origin(base)` should be the first priority, as it is the origin with the most direct value flow.

Note that `dfsan_get_origin(base)` gives the origin id for `base` whereas `dfsan_read_origin_of_first_taint(base, ...)` gives origin id for `*base`.


================
Comment at: compiler-rt/test/dfsan/custom.cpp:1659
+
+  rv = strsep(pp_s, p_delim);
+
----------------
I think this test can call strsep fewer times. It just needs to set all the different labels first.

Different inputs to taint (each one should be a distinct taint label):
* `*p_delim`   (with `dfsan_set_label(..., p_delim, strlen(p_delim));`)
* `p_delim`   (with `dfsan_set_label(..., &p_delim, sizeof(&p_delim));`)
* `*s`   (with `dfsan_set_label(..., s, strlen(s));`)
* `s` aka `p_s`   (with `dfsan_set_label(..., &s, sizeof(&s));` or `dfsan_set_label(..., &p_s, sizeof(&p_s));`)
* `pp_s` (with `dfsan_set_label(..., &pp_s, sizeof(&pp_s));`)

This is less than 8, so we can do it all at once (unless you want to test different taint labels on different bytes of the input strings).


Repository:
  rCRT Compiler Runtime

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141389/new/

https://reviews.llvm.org/D141389



More information about the cfe-commits mailing list