[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

Balazs Benics via cfe-commits cfe-commits at lists.llvm.org
Thu Sep 14 05:07:52 PDT 2023


steakhal wrote:

> So for me either solution would work:
> a) remove strlen() as a propagator and note it in the checker doc
> b) remove malloc() as a sink and note it in the checker doc
> c) don't do anything and live with the false positives

TBH I would prefer (b). I see removing the whole `MsgTaintedBufferSize` is a bit too wild. And `alloca` with unconstrained tainted size should be still detected.
(c) is definitely not an option, assuming that there were honest motivations behind this patch.



> By the way, could you show an OOBV2 true positive that involves taint propagated by `strlen` call? My impression was that the "index is tainted" errors that involve `strlen` are typically false positives where the analyzer cannot deduce that `s[strlen(s)]` is valid and e.g. `s[strlen(s)-1]` is also valid when the string is nonempty.

I'm sure I've seen it in the Juliet suite. I believe something similar must have been the motivation for me proposing `wcslen` as a taint propagator because it was missing from the taint-dataflow. Since the test cases are generated, and there are `char` counterparts, I'd bet there would be a missing taint-flow there as well, if we didn't propagate on `strlen`.
I also understand that the Juliet is an artificial benchmark, that might or might not represent real-world scenarios.



> > Consequently, I agree with the raised problems, but I disagree with the approach. I would rather remove the `MsgTaintedBufferSize` diagnostic to resolve those FPs. Alternatively, we can also think of creating a heuristic to reduce such FPs. For e.g. check if the most significant bit of the allocation size is proven to be unset (aka. checked some meaningful upperbounds) and suppress reports in that case, but report otherwise. Would it be okay with you to proceed by not removing taint propagation?
> 
> I agree that `MsgTaintedBufferSize` reports should be limited to cases where the tainted value _can be large_ and I would suggest a threshold that's significantly lower than the SIZE_MAX/2 implied by your "most significant bit" suggestion (ideally the threshold value would be configurable). However, I think this eliminates the false positives related to `strlen` only if we also add some logic that can reliably "tag" the return value of `strlen` with reasonable upper bounds.

Yea, upperbounding `strlen` result I'm not sure if it really makes sense, just to fulfill the heuristic imposed on `malloc`.

Let's discuss then how much benefit warning on tainted malloc allocations provide.
If it turns out the tainted malloc does not really work, we should just remove that from sinks.

https://github.com/llvm/llvm-project/pull/66086


More information about the cfe-commits mailing list