[PATCH] D158707: [analyzer] Fix a few size-type signedness inconsistency related to DynamicExtent

Mon Aug 28 07:53:43 PDT 2023

steakhal added a comment.

In D158707#4621270 <https://reviews.llvm.org/D158707#4621270>, @donat.nagy wrote:

> In D158707#4621135 <https://reviews.llvm.org/D158707#4621135>, @steakhal wrote:
>
>> Oh, so we would canonicalize towards a signed extent type. I see. I think I'm okay with that.
>> However, I think this needs a little bit of polishing and testing when the region does not point to the beginning of an array or object, but rather inside an object, or like this:
>>
>>   int nums[] = {1,2,3};
>>   char *p = (char*)&nums[1];
>>   
>>   clang_analyzer_dumpExtent(p);
>>   clang_analyzer_dumpElementCount(p);
>>   ++p;
>>   clang_analyzer_dumpExtent(p);
>>   clang_analyzer_dumpElementCount(p);
>>   ++p;
>>   int *q = (int*)p;
>>   clang_analyzer_dumpExtent(q);
>>   clang_analyzer_dumpElementCount(q);
>
> Unfortunately the analyzer engine does not distinguish pointer arithmetic and element access, and we cannot handle these situations while that limitation exists. For example, if we have an array `int nums[3];`, then the element `nums[1]` and the incremented pointer `int *ptr = nums + 1;` are represented by the same `ElementRegion` object; so we cannot simultaneously say that (1) `nums[1]` is an int-sized memory region, and (2) it's valid to access the elements of the array as `ptr[-1]`, `ptr[0]` and `ptr[1]`.
>
> The least bad heuristic is probably `RegionRawOffsetV2::computeOffset()` from ArrayBoundCheckerV2, which strips away all the `ElementRegion` layers, acting as if they are all coming from pointer arithmetic. This behavior is incorrect if we want to query the extent/elementcount of a "real" ElementRegion (e.g. a slice of a multidimensional array or `(char*)&nums[1]` in your example), but mostly leads to false negatives. On the other hand, if we process a pointer increment as if it were an element access, we can easily run into false positive reports -- this is why I had to abandon D150446 <https://reviews.llvm.org/D150446>.
>
> I'd suggest that we should ignore pointer arithmetic in this commit (or create a testcase that documents the current insufficient behavior) and remember this as yet another reason to do through rewrite of the memory model. This is related to the plan https://github.com/llvm/llvm-project/issues/42709 suggested by @NoQ, but perhaps it would be enough to do a less drastic change in that direction.

I'm aware of this, and I didn't mean to ask to fix this here.
I just want to see how these ExprInspection APIs will be affected in these edge-cases, where previously it returned "unknown", and now will return something else.
It's useful to demonstrate this if we ever come back to this commit.
Pinning down some tests and having some FIXMEs there should bring enough visibility.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158707/new/

https://reviews.llvm.org/D158707