[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

Balazs Benics via cfe-commits cfe-commits at lists.llvm.org
Thu Sep 14 06:08:06 PDT 2023


steakhal wrote:

> Putting an upper bound on `strlen` is not just for `malloc`, it's also needed for ArrayBoundV2.
> 
> As a very clear example, this [function `strfuzz_ends_with` from twin](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=twin_v0.8.1_ednikru_taint_nostrlen_baseline&newcheck=twin_v0.8.1_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved&report-id=2648820&report-hash=637b598311521fbe5ec9b657174d2862&report-filepath=%2arcopt.c) functions iterates backwards on two strings and runs into an "Out of bounds memory access (index is tainted)" false positive when it tries to access the last character of a nonempty string.
> There are also other more complicated false positives from [vim](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=vim_v8.2.1920_ednikru_taint_nostrlen_baseline&newcheck=vim_v8.2.1920_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved&report-id=2649310&report-hash=79dc8522d2cd34ca8e1b2dc2db64b2df&report-filepath=%2aos_unix.c) and [postgres](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=postgres_REL_13_0_ednikru_taint_nostrlen_baseline&newcheck=postgres_REL_13_0_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved&report-id=2658495&report-hash=bd04e4bd17058bb6c64a601f3dcfc23b&report-filepath=%2afe-protocol3.c).

Yes, I've also seen similar cases around the result of `getenv`. I believe we also discussed this in [D159105](https://reviews.llvm.org/D159105). 
The important thing here is that they are not that frequent and pretty easy to conclude if it's a TP or FP. 

> Based on these I'd say that propagating taint in `strlen` is not acceptable without providing upper bounds. (These FPs are not extraordinarily common, but it's simply _too ugly_ when the analyzer says that accessing the last character of a string may be out of bounds access.)

Let's see how it works out in practice. I won't object to this change.

Do you also plan to partially revert `wcslen` in 61924da630532c91f00351b7e84548eb42e2e1e0 to match the `strlen` behavior?

https://github.com/llvm/llvm-project/pull/66086


More information about the cfe-commits mailing list