[PATCH] D141389: [DFSAN] Add support for strnlen, strncat, strsep, sscanf and _tolower

Thu Feb 2 06:08:41 PST 2023

tkuchta added inline comments.

================
Comment at: compiler-rt/lib/dfsan/dfsan_custom.cpp:213
+  char *res = strsep(s, delim);
+  s_label = dfsan_read_label(base, strlen(base));
+  if (res && (res != base)) {
----------------
browneee wrote:
> tkuchta wrote:
> > browneee wrote:
> > > tkuchta wrote:
> > > > browneee wrote:
> > > > > The `s_label` represents the taint label for `s` (the pointer).
> > > > > 
> > > > > This line would clobber the taint label of the pointer (`s`) with a taint label from `s[0][0..n]`.
> > > > > 
> > > > > I think this line should be deleted.
> > > > Agree, s_label represents the taint associated with the **s pointer. However I am now wondering if that is the taint wich we would like to return.
> > > > For example, if we have
> > > > if (flags().strict_data_dependencies) {
> > > >     *ret_label = res ? s_label : 0;
> > > > 
> > > > We would taint the return value with the value of the pointer, not the data. It means that if we operate on a string for which the characters are tainted, but the pointer itself isn't, we are likely going to return label 0. My understanding was that we are more concerned with the taint of the data, not the pointer, am I missing something?
> > > > 
> > > Yes, we are usually more concerned with the taint of the data, not the pointer.
> > > 
> > > With strict dependencies:
> > > // If the input pointer is tainted, the output pointer would be tainted (because it is derived from the input pointer - maybe the same value).
> > > taint(s[0]) == dfsan_read_label(s, sizeof(s)) ====> taint(ret) == ret_label[0]
> > > 
> > > // If the input data is tainted, the output data would be tainted (because it is derived from the input data).
> > > taint(s[0][0]) == MEM_TO_SHADOW(s[0])[0] ====> taint(ret[0]) == MEM_TO_SHADOW(ret)[0]
> > > 
> > > Because s[0] == ret  (or ret==null), (for the non-null case) the output shadow bytes are the same bytes as input shadow bytes and so these taint labels for the string data in shadow memory do not need to be explicitly propagated in this function. 
> > > 
> > > I think the only case actually changing/copying string data is writing a delimiter byte to NULL, which you handled.
> > I am working on the changes and I came across a strange behavior that I would like to ask about.
> > 
> > It turned out that if we do
> > 
> > char *s = " ... ";
> > dfsan_set_label(label, &p_s, sizeof(&p_s));
> > 
> > Then, the s_label within the handler is 0, not "label". This is unexpected, as we would like the pointer itself to be labeled here.
> > I believe this has something to do with the fact that the input string in strsep is a double pointer. For example this works as expected for the delimiter string, which is a single pointer. 
> > It's either I'm doing something incorrectly or there is some issue with propagating labels for double pointers.
> > Have you perhaps come across this behavior before?
> I'm not sure what p_s is in your example. Could you provide a more complete example?
> (and maybe all in one function, not half in __dfsw_strsep and half in another function)
> 
> Here is an example showing taint labels at different levels of indirection:
> 
> ```
> #include <assert.h>
> #include <string.h>
> #include <sanitizer/dfsan_interface.h>
> 
> int main(void) {
>   char *s = " ... ";
>   char **p_s = &s;
>   char ***pp_s = &p_s;
> 
>   dfsan_label i_label = 1 << 0;
>   dfsan_label j_label = 1 << 1;
>   dfsan_label k_label = 1 << 2;
>   dfsan_label m_label = 1 << 3;
> 
>   // string data
>   dfsan_set_label(i_label, s, strlen(s));
>   // pointer s
>   dfsan_set_label(j_label, &s, sizeof(s));
>   // pointer to pointer s
>   dfsan_set_label(k_label, &p_s, sizeof(p_s));
>   // pointer to pointer to pointer s
>   dfsan_set_label(m_label, &pp_s, sizeof(pp_s));
> 
>   assert(pp_s[0][0] == s);
> 
>   // string data
>   assert(dfsan_read_label(s, strlen(s)) == i_label);
>   // pointer s
>   assert(dfsan_read_label(&s, sizeof(s)) == j_label);
>   // pointer to pointer s
>   assert(dfsan_read_label(&p_s, sizeof(p_s)) == k_label);
>   // pointer to pointer to pointer s
>   assert(dfsan_read_label(&pp_s, sizeof(pp_s)) == m_label);
> 
>   return 0;
> }
> ```
Hello,

Thank you for the comment.

I should have provided a more complete example.
What I meant is a behavior I've found while working on the tests.
In the test file I have something like that:

```
char *s = "Hello world/";
char *delim = " /";
dfsan_set_label(label, &s, sizeof(&s));
char *rv = strep(&s, delim);
 ```

If I get this right, the line 
```
dfsan_set_label(label, &s, sizeof(&s));

``` 
should have applied a taint to the `s` pointer.
However, when inside `strsep`, if we check the `s_label`, it's `0`, not `label`

```
SANITIZER_INTERFACE_ATTRIBUTE char *__dfsw_strsep(char **s, const char *delim,
                                                  dfsan_label s_label,
                                                  dfsan_label delim_label,
                                                  dfsan_label *ret_label) {
    fprintf(stderr, "s_label = %d\n", s_label); // -> this prints out "0"
```

If we do exactly the same trick with the `delim` pointer, the associated `delim_label` will be equal to `label`, as expected. 

It seems to me that either I am missing some important point here or the `s_label` is not propagated correctly by the compiler runtime for a double pointer to the user-provided handler (in my case `__dfsw_strsep`).

I believe that strsep is currently the first function within dfsan_custom.cpp in which we operate on or check the label of the double pointer.

I would like to ask if you maybe came across this behavior. It might be that I misunderstood how this should work and would be grateful for your help.
Thank you

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141389/new/

https://reviews.llvm.org/D141389