[clang] [analyzer] Moving TaintPropagation checker out of alpha (PR #67352)

via cfe-commits cfe-commits at lists.llvm.org
Tue Sep 26 07:39:37 PDT 2023


DonatNagyE wrote:

I agree that it's a blocking issue that some numerical arguments act as taint sinks unconditionally (even if the analyzer knows about constraints that show that the actual value is valid). I'd suggest that those concrete issues should be addressed by separate improvement commits, and we should reconsider this "move out of alpha" commit afterwards.

As I briefly scrolled through the list of sink functions, it seems that there are two affected function families: the `malloc`-like functions and `memcpy`-like functions.

IMO in the `malloc` function family the ideal behavior would be:
1. Report a _tainted buffer size can be huge_ warning when we cannot deduce that the allocated memory area is less than a certain threshold (e.g. 1 MiB; this should be user-configurable). I'd say that it's a bug (denial of service opportunity) if untrusted data can trigger huge memory allocations (gigabytes, terabytes, `SIZE_MAX` etc.).
2. Mark the extent symbol of the allocated memory area as tainted and ensure that this triggers the taint-aware (a.k.a. paranoid) mode of e.g. ArrayBoundsV2 and similar checkers.

I think these checks should be implemented in MallocChecker, where we already have lots of logic related to the size of allocated memory. 

The `memcpy` function family is a simpler case: there we should just eliminate the current "generic taint sink" behavior (= if the number of copied bytes is tainted, always report a bug) and replace it with taint-aware optimism/pessimism in CStringChecker (where AFAIK we already have logic that checks whether the copied chunk fits into the target memory region). This logic would let us accept
```c++
char buf[100], srcBuf[100] = {0};
size_t size = tainted();
if (size > 100)
  return;
memcpy(buf, srcBuf, size);
```
while creating a bug report for the variant where the early return condition is `(size > 101)`.

@haoNoQ I don't really understand your remark that
> The report may be technically correct but I think the entire idea of the checker never made sense in the first place. It doesn't matter that untrusted data is used to specify buffer size once; it matters that data with _different levels of trust_, or _coming from different sources_, is used to specify size of the _same buffer on different occasions_.

What does "size of the same buffer on different occasions" mean here? I'd guess that it refers to something like
```c++
void *get_mem(bool use_default_size) {
  size_t size = 1024;
  if (!use_default_size)
    size = get_tainted_number_from_config_file();
  // do something important that we want to do before each allocation...
  return malloc(size);
}
```
but I'd argue that this is actually valid (although a bit unusual) code.

Personally I feel that the _core idea of this checker_ is that it marks the return value of some functions as tainted so that security-conscious users can enable pessimistic handling of that data. This is a direct and unavoidable "replace FNs with FPs" tradeoff, and I think users should have this option even if it produces a significant amount of FPs.

However you're completely right that even in "pessimistic handling" the analyzer should at least consult the current state before reporting that "this number is tainted, the `memcpy` / `memset` / etc. may cause a buffer overrun".

(Also note that the simplistic "tainted data + sink = bug report" logic is appropriate for functions like `system`, where the potentially tainted data is a string, because we cannot store meaningful constraints about strings.)

https://github.com/llvm/llvm-project/pull/67352
