[PATCH] D141389: [DFSAN] Add support for strnlen, strncat, strsep, sscanf and _tolower

Andrew via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Mar 31 16:35:43 PDT 2023


browneee added inline comments.


================
Comment at: compiler-rt/lib/dfsan/dfsan_custom.cpp:221
+  if (flags().strict_data_dependencies) {
+    *ret_label = res ? dfsan_read_label(base, sizeof(base)) : 0;
+  } else {
----------------
tkuchta wrote:
> browneee wrote:
> > `base, sizeof(base)` does not make sense - the size does not correspond to the pointer used.
> > 
> > Either
> > ```
> > dfsan_read_label(base, sizeof(*base))   // first byte pointed to by base
> > dfsan_read_label(base, strlen(base))     // whole string pointed to by base
> > dfsan_read_label(&base, sizeof(base))  // the base pointer
> > ```
> > 
> > In this case I think we want the base pointer.
> > 
> > `dfsan_read_label(&base, sizeof(base))`  // the base pointer
> > should be equivalent to doing
> > `dfsan_label base_label = dfsan_read_label(s, sizeof(*s))`  up at the start of the function, just after we declare base then use `base_label` here.
> > 
> > Lets go with the second option to avoid taking the address of a variable.
> > 
> > This is semantically equivalent to my first suggestion: `dfsan_get_label(base) == dfsan_read_label(&base, sizeof(base)) == base_label`.
> > Sorry I didn't consider the other constraints (no dfsan_get_label in this file because the pass is not run on this code; avoid taking address of variable) and suggest this in the first place.
> Thank you for your comments! Just to clarify:
> If we have strict data dependencies we would like *ret_label to be related to the taints of the contents of the string, is that correct? Should we use the base_label there too? My understanding is that base_label represents a taint applied to the pointer, not to the data? 
> 
> In other words, would that be correct to:
> 1) use taints of the string data in the first case (strict dependencies) -> therefore no base_label there as it represents the taint of the pointer
> 2) use the taints of the string data + the taints of the pointer in the second case -> therefore using base_label there
> If we have strict data dependencies we would like *ret_label to be related to the taints of the contents of the string, is that correct?

No, `*ret_label` holds the taint label for the return value, which is a pointer. The return value pointer is derived from `*s`, aka `base`. This is the pointer to the string contents, not the string contents themselves.

> Should we use the base_label there too?

I think we should only use `base_label` here.

> My understanding is that base_label represents a taint applied to the pointer, not to the data?

Correct.


> In other words, would that be correct to:
> 1. use taints of the string data in the first case (strict dependencies) -> therefore no base_label there as it represents the taint of the pointer

No, other way around. We want the base_label, but not the string data.

> 2. use the taints of the string data + the taints of the pointer in the second case -> therefore using base_label there

Yes, we could have both base_label and the taints of the string data in the else case.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141389/new/

https://reviews.llvm.org/D141389



More information about the cfe-commits mailing list