[PATCH] D148355: [analyzer] Fix comparison logic in ArrayBoundCheckerV2

Fri Apr 21 07:02:51 PDT 2023

donat.nagy added a comment.

I tested this patch on a few open source projects and it significantly reduced the number of alpha.security.ArrayBoundV2 reports:

| Project name             | memcached | tmux | curl | twin | vim | openssl |
| Baseline report #        | 0         | 2    | 13   | 6    | 12  | 7       |
| Report # with this patch | 0         | 2    | 0    | 3    | 3   | 0       |
|

(The patched version did not produce any additional reports, it seems these projects does not contain any bugs where the enhancements of the upper bound check could become significant. As a minor side effect, applying the patch also eliminated an alpha.deadcode.UnreachableCode report -- presumably because the analysis wasn't stopped at some false positive.)

I briefly looked at the eliminated reports, but they are very convoluted and I didn't analyze them to understand their (presumably buggy) logic. It seems that the remaining reports are also false positive (which is not surprising, as we are analyzing released, stable versions), but they are mostly reasonable:

- Several reports complain about expressions like `c = fgetc(f), isdigit(c)` because `fgetc(f)` returns tainted output and (at least in glibc) character type macros like `isdigit(c)` are equivalent to `((*__ctype_b_loc())[(int)(c)] & <some bitmask>`. It would be relatively straightforward to fix these reports, perhaps by modelling the behavior of the internal function `__ctype_b_loc()`.
- Several reports are produced by code like `s = get_tainted_string(); int n = strlen(s); if (n > 0) return s[n-1];` where the analyzer says that `n-1` is a tainted index so this may be an overflow error.  To model this correctly we'd need to maintain a connection between the memory region of the string and the symbol conjured as the return value of strlen; my first guess is that this is doable but not too easy.
- There are also some sporadic false positives where the analyzer is confused by code that relies on tricky ad-hoc invariants.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148355/new/

https://reviews.llvm.org/D148355