[PATCH] D159105: [analyzer] ArrayBoundCheckerV2 should check the region for taint as well

DonĂ¡t Nagy via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Sep 1 03:49:59 PDT 2023


donat.nagy added a comment.

As I thought more about this commit I realized that I have two vague but significant concerns. I'm sorry that I wasn't able to describe these before you started to dig into the details of the heuristics.

(1) I don't think that code like `int *ptr; scanf("%ld", &ptr);` should be covered by an ArrayBound checker: this is not just a "be careful with the value of the index" issue, but a raw pointer that's completely controlled by an untrusted source. I'd try to cover this kind of bug with either the StdLibrary checker or a small new checker [or perhaps just a clang-tidy check] which would report pointers coming from `scanf` or other user input even if it doesn't see a dereference of the pointer value.

(2) Taintedness of a memory region (i.e. `isTainted(state, location)` where `location` is presumably a `MemRegionVal`) is poorly defined and might mean three different things:
(a) **The 'begin' pointer value of the region is tainted** (controlled by an untrusted source). This may be a "we're already dead" situation (e.g. the pointer is an arbitrary value from `scanf`) or something completely harmless (we are examining something like the second subscript operator in `large_array[user_input].field[idx]` after verifying that the first subscript with the tainted index is valid), but in either case, the reliability of the 'begin' pointer should not affect the comparison between the region extent and the index value.
(b)** The extent of the region is tainted** (controlled by an untrusted source, e.g. it's an user input string with an unknown length). In this case it's reasonable to turn on the "paranoid" comparison mode -- however, this situation should be recorded by marking the //extent symbol// of the region as tainted instead of marking the region itself as tainted. I'd be happy to see a commit that ensures that the extent symbol of user input strings is tainted and its taintedness is handled in ArrayBoundsV2.
(c)** The contents of the region are tainted** (controlled by an untrusted source). Arguably this should be recorded by marking the //contents// as tainted, but that's difficult to implement on a string/array, so it's a somewhat reasonable shortcut to put the taint on the region itself. This kind of taint is completely irrelevant for out of bounds memory access and should not affect the comparisons.

In summary, I'd say that this checker should not be affected by the taintedness of the memory region itself (but it should continue to monitor tainted indices, and if possible, it should be extended to handle tainted extent values). I'm not completely opposed to merging this patch if you add enough heuristics and workarounds to make it stable in practice, but I feel that this is not the right way forward.

It's fortunate that we'll meet in person on Monday because there we can discuss this topic (and other plans) in more detail.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159105/new/

https://reviews.llvm.org/D159105



More information about the cfe-commits mailing list