[clang] [analyzer] TaintPropagation checker strlen() should not propagate (PR #66086)

via cfe-commits cfe-commits at lists.llvm.org
Thu Sep 14 04:47:29 PDT 2023


DonatNagyE wrote:

> By looking at the disappeared reports you attached, I'm convinced that the `MsgTaintedBufferSize` diagnostics give little to no benefit in general. On the other side, I've seen good hits for OOBV2 in the presence of taint - even if that's rarely the case. On the theory side, I also believe that propagation should happen on `strlen` and similar functions.

You're right that theoretically speaking the `strlen`-like functions propagate taint, but their return value is practically always constrained from above (if the string has a null terminator, the length is less than its extent in memory) and I think that propagating taint without considering these constraint is worse than not propagating taint.

By the way, could you show an OOBV2 true positive that involves taint propagated by `strlen` call? My impression was that the "index is tainted" errors that involve `strlen` are typically false positives where the analyzer cannot deduce that `s[strlen(s)]` is valid and e.g. `s[strlen(s)-1]` is also valid when the string is nonempty.

> Consequently, I agree with the raised problems, but I disagree with the approach. I would rather remove the `MsgTaintedBufferSize` diagnostic to resolve those FPs. Alternatively, we can also think of creating a heuristic to reduce such FPs. For e.g. check if the most significant bit of the allocation size is proven to be unset (aka. checked some meaningful upperbounds) and suppress reports in that case, but report otherwise. Would it be okay with you to proceed by not removing taint propagation?

I agree that `MsgTaintedBufferSize` reports should be limited to cases where the tainted value _can be large_ and I would suggest a threshold that's significantly lower than the SIZE_MAX/2 implied by your "most significant bit" suggestion (ideally the threshold value would be configurable). However, I think this eliminates the false positives related to `strlen` only if we also add some logic that can reliably "tag" the return value of `strlen` with reasonable upper bounds.

I'd be happy to accept a patch that (1) ensures that `strlen` propagates taint (2) reliably puts upper bounds on the result of `strlen` and (3) limits `MsgTaintedBufferSize` to reporting cases where the tainted value can be large. However, as long as we don't have this refined code, I think the best temporary solution is removing taint propagation from `strlen` as the untainted return value is a good practical approximation for a tainted value with strong upper bounds.

https://github.com/llvm/llvm-project/pull/66086


More information about the cfe-commits mailing list