[PATCH] D158707: [analyzer] Fix a few size-type signedness inconsistency related to DynamicExtent

Mon Aug 28 07:21:22 PDT 2023

donat.nagy added a comment.

In D158707#4621135 <https://reviews.llvm.org/D158707#4621135>, @steakhal wrote:

> Oh, so we would canonicalize towards a signed extent type. I see. I think I'm okay with that.
> However, I think this needs a little bit of polishing and testing when the region does not point to the beginning of an array or object, but rather inside an object, or like this:
>
>   int nums[] = {1,2,3};
>   char *p = (char*)&nums[1];
>   
>   clang_analyzer_dumpExtent(p);
>   clang_analyzer_dumpElementCount(p);
>   ++p;
>   clang_analyzer_dumpExtent(p);
>   clang_analyzer_dumpElementCount(p);
>   ++p;
>   int *q = (int*)p;
>   clang_analyzer_dumpExtent(q);
>   clang_analyzer_dumpElementCount(q);

Unfortunately the analyzer engine does not distinguish pointer arithmetic and element access, and we cannot handle these situations while that limitation exists. For example, if we have an array `int nums[3];`, then the element `nums[1]` and the incremented pointer `int *ptr = nums + 1;` are represented by the same `ElementRegion` object; so we cannot simultaneously say that (1) `nums[1]` is an int-sized memory region, and (2) it's valid to access the elements of the array as `ptr[-1]`, `ptr[0]` and `ptr[1]`.

The least bad heuristic is probably `RegionRawOffsetV2::computeOffset()` from ArrayBoundCheckerV2, which strips away all the `ElementRegion` layers, acting as if they are all coming from pointer arithmetic. This behavior is incorrect if we want to query the extent/elementcount of a "real" ElementRegion (e.g. a slice of a multidimensional array or `(char*)&nums[1]` in your example), but mostly leads to false negatives. On the other hand, if we process a pointer increment as if it were an element access, we can easily run into false positive reports -- this is why I had to abandon D150446 <https://reviews.llvm.org/D150446>.

I'd suggest that we should ignore pointer arithmetic in this commit (or create a testcase that documents the current insufficient behavior) and increment the priority of a through rewrite of the memory model. This is related to the plan https://github.com/llvm/llvm-project/issues/42709 suggested by @NoQ, but perhaps it would be enough to do a less drastic change in that direction.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158707/new/

https://reviews.llvm.org/D158707