[PATCH] D141389: [DFSAN] Add support for strnlen, strncat, strsep, sscanf and _tolower

Andrew via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Mar 31 10:43:47 PDT 2023


browneee added a comment.

We're getting really close!

Yes, that build error looks unrelated. Someone should fix it soon.



================
Comment at: compiler-rt/lib/dfsan/dfsan_custom.cpp:221
+  if (flags().strict_data_dependencies) {
+    *ret_label = res ? dfsan_read_label(base, sizeof(base)) : 0;
+  } else {
----------------
`base, sizeof(base)` does not make sense - the size does not correspond to the pointer used.

Either
```
dfsan_read_label(base, sizeof(*base))   // first byte pointed to by base
dfsan_read_label(base, strlen(base))     // whole string pointed to by base
dfsan_read_label(&base, sizeof(base))  // the base pointer
```

In this case I think we want the base pointer.

`dfsan_read_label(&base, sizeof(base))`  // the base pointer
should be equivalent to doing
`dfsan_label base_label = dfsan_read_label(s, sizeof(*s))`  up at the start of the function, just after we declare base then use `base_label` here.

Lets go with the second option to avoid taking the address of a variable.

This is semantically equivalent to my first suggestion: `dfsan_get_label(base) == dfsan_read_label(&base, sizeof(base)) == base_label`.
Sorry I didn't consider the other constraints (no dfsan_get_label in this file because the pass is not run on this code; avoid taking address of variable) and suggest this in the first place.


================
Comment at: compiler-rt/lib/dfsan/dfsan_custom.cpp:224
+    *ret_label =
+        dfsan_union(dfsan_read_label(base, sizeof(base)),
+                    dfsan_union(dfsan_read_label(delim, strlen(delim) + 1),
----------------
Also use `base_label` here.


================
Comment at: compiler-rt/lib/dfsan/dfsan_custom.cpp:240
+    if (res)
+      *ret_origin = dfsan_read_origin_of_first_taint(base, strlen(base));
+  } else {
----------------
`dfsan_get_origin(base) == dfsan_read_origin_of_first_taint(&base, sizeof(base))`

As noted above `base, strlen(base)` is a meaningfully valid pointer, length - but it is not the level of indirection we want here.

I think we want the same solution as above.

`dfsan_origin base_origin = dfsan_read_origin_of_first_taint(s, sizeof(*s))` at the start of the function, just after declaring and assigning base.


================
Comment at: compiler-rt/lib/dfsan/dfsan_custom.cpp:243
+    if (*ret_label) {
+      dfsan_origin o = dfsan_read_origin_of_first_taint(base, strlen(base));
+      if (o) {
----------------
Also use `base_origin` here.


================
Comment at: compiler-rt/test/dfsan/custom.cpp:1641
+  // taint delim pointer
+  dfsan_set_label(j_label, &p_delim, sizeof(&p_delim));
+  // taint the string data bytes
----------------
remove the &

It still works because `sizeof(pointer_var) == sizeof(&pointer_var)`, but it doesn't logically match up like it should.

(Sorry, this one is my fault - I made this mistake giving an example in an earlier comment.)


================
Comment at: compiler-rt/test/dfsan/custom.cpp:1645
+  // taint the string pointer
+  dfsan_set_label(m_label, &p_s, sizeof(&p_s));
+
----------------
remove & in sizeof


================
Comment at: compiler-rt/test/dfsan/custom.cpp:1650
+#ifdef STRICT_DATA_DEPENDENCIES
+  ASSERT_LABEL(rv, k_label);
+  ASSERT_READ_LABEL(rv, strlen(rv), k_label);
----------------
The value `rv` has a data flow from the the string pointer, not the string bytes.
It should have `m_label` not `k_label`.

This is related to line 221 (using `base` instead of `&base`) in the other file.


================
Comment at: compiler-rt/test/dfsan/custom.cpp:1654
+  ASSERT_LABEL(rv, dfsan_union(dfsan_union(i_label, j_label), k_label));
+  ASSERT_INIT_ORIGIN_EQ_ORIGIN(&rv, *s);
+#endif
----------------
rv is a pointer, and I think it's origin should match the pointer s, not the bytes in the string.

I think this should be:

`ASSERT_INIT_ORIGIN_EQ_ORIGIN(&rv, s);`

This is related to line 240 (`*ret_origin = dfsan_read_origin_of_first_taint(base, strlen(base));`) in the other file being the wrong level of indirection.


(Side note: the existing ASSERT_INIT_ORIGIN_EQ_ORIGIN macro feels a bit odd in that the arguments are at different levels of indirection - but not something to fix as part of this change)


================
Comment at: compiler-rt/test/dfsan/custom.cpp:1660
+  char **pp_s_base = pp_s;
+  dfsan_set_label(n_label, &pp_s, sizeof(&pp_s));
+
----------------
remove & in sizeof


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141389/new/

https://reviews.llvm.org/D141389



More information about the cfe-commits mailing list