[PATCH] D159105: [analyzer] ArrayBoundCheckerV2 should check the region for taint as well

Balázs Benics via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Sep 1 04:13:37 PDT 2023


steakhal added a comment.

In D159105#4633864 <https://reviews.llvm.org/D159105#4633864>, @donat.nagy wrote:

> As I thought more about this commit I realized that I have two vague but significant concerns. I'm sorry that I wasn't able to describe these before you started to dig into the details of the heuristics.
>
> (1) I don't think that code like `int *ptr; scanf("%ld", &ptr);` should be covered by an ArrayBound checker: this is not just a "be careful with the value of the index" issue, but a raw pointer that's completely controlled by an untrusted source. I'd try to cover this kind of bug with either the StdLibrary checker or a small new checker [or perhaps just a clang-tidy check] which would report pointers coming from `scanf` or other user input even if it doesn't see a dereference of the pointer value.
>
> (2) Taintedness of a memory region (i.e. `isTainted(state, location)` where `location` is presumably a `MemRegionVal`) is poorly defined and might mean three different things:
> (a) **The 'begin' pointer value of the region is tainted** (controlled by an untrusted source). This may be a "we're already dead" situation (e.g. the pointer is an arbitrary value from `scanf`) or something completely harmless (we are examining something like the second subscript operator in `large_array[user_input].field[idx]` after verifying that the first subscript with the tainted index is valid), but in either case, the reliability of the 'begin' pointer should not affect the comparison between the region extent and the index value.
> (b)** The extent of the region is tainted** (controlled by an untrusted source, e.g. it's an user input string with an unknown length). In this case it's reasonable to turn on the "paranoid" comparison mode -- however, this situation should be recorded by marking the //extent symbol// of the region as tainted instead of marking the region itself as tainted. I'd be happy to see a commit that ensures that the extent symbol of user input strings is tainted and its taintedness is handled in ArrayBoundsV2.
> (c)** The contents of the region are tainted** (controlled by an untrusted source). Arguably this should be recorded by marking the //contents// as tainted, but that's difficult to implement on a string/array, so it's a somewhat reasonable shortcut to put the taint on the region itself. This kind of taint is completely irrelevant for out of bounds memory access and should not affect the comparisons.
>
> In summary, I'd say that this checker should not be affected by the taintedness of the memory region itself (but it should continue to monitor tainted indices, and if possible, it should be extended to handle tainted extent values). I'm not completely opposed to merging this patch if you add enough heuristics and workarounds to make it stable in practice, but I feel that this is not the right way forward.

I see your point, makes sense and I'd agree.
Probably, by another checker/check, we could indeed sidestep the problem by handling these situations separately. I'll think about it.

To me, the whole boils down to null-terminated buffers, such as one returned by `getenv`.
There, we have an implicit dependency between the extent of the region and the position of the null-terminator inside; as such it affects what locations are allowed to be read.
Locations are usually checked by the array-bounds checker (V1 xor V2). Consequently, it's still not clear to me if this falls in/out of this checker's responsibility.
We don't have many APIs like `getenv`, but users could define custom rules that would cause similar problems, thus ATM only `getenv` suffers from these new FPs.

The fact that `isTainted()` could mean so many things, as you enumerated has 3 different meanings, which is a problem on its own. The design decision was to make it deliberately vague to let users express their custom APIs more easily in their yaml files.

The measurement is still running, and I'll post a few words about that once I had a look; but anyway, I feel that this patch is controversial and needs more discussion before it could land.
I plan to abandon this and think about it. We can continue the discussion of course.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159105/new/

https://reviews.llvm.org/D159105



More information about the cfe-commits mailing list