[PATCH] D104285: [analyzer][AST] Retrieve value by direct index from list initialization of constant array declaration.

Aaron Ballman via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Aug 17 07:38:53 PDT 2021


aaron.ballman added a comment.

In D104285#2949273 <https://reviews.llvm.org/D104285#2949273>, @ASDenysPetrov wrote:

> @aaron.ballman
> Ok, I got your concerns.

Thanks for sticking with me!

> As I can see we shall only reason about objects within the bounds. Otherwise, we shall return `UndefinedVal`.
> E.g.:
>
>   int arr[2][5];
>   int* ptr1 = (int*)arr; // Valid indexing for `ptr1` is in range [0,4].
>   int* ptr2 = &arr[0][0]; // Same as above.
>   ptr1[4]; // Valid object.
>   ptr2[5]; // Out of bound. UB. UndefinedVal.
>
> Would it be correct?

I believe so, yes (with a caveat below). I also believe this holds (reversing the pointer bases):

  ptr2[4]; // valid object
  ptr1[5]; // out of bounds

I've been staring at the C standard for a while, and I think the situation is also UB in C. As with C++, the array subscript operator is rewritten to be pointer arithmetic using addition (6.5.2.1p2). The section on additive operators says (6.5.6p9), in part: `... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise the behavior is undefined. If the result points to one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.` I believe we run afoul of the "the same array object" and "one past the last element" clauses because multidimensional arrays are defined to be arrays of arrays (6.7.6.2), so the array object in question is the five-element row `arr[0]`, not `arr` as a whole.
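To make the "rewritten to pointer arithmetic" point concrete, here is a minimal sketch in C. The names `COLS`, `subscript_matches_arithmetic`, and `read_in_row` are hypothetical helpers invented for illustration; they are not part of the patch or of the analyzer. The sketch assumes the row length from the example above (5) and treats only indices in [0, 4] as designating an object of the row.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { COLS = 5 }; /* assumed row length, matching int arr[2][5] */

/* row[i] is defined as *(row + i), so for an in-bounds index both forms
   name the same object and yield the same value. */
static bool subscript_matches_arithmetic(const int *row, size_t i) {
    return i < COLS && &row[i] == row + i && row[i] == *(row + i);
}

/* Hypothetical helper: read element i through a pointer to the first
   element of a row. Only indices in [0, COLS) are read; index COLS would
   dereference the one-past-the-end pointer, so it is rejected. */
static bool read_in_row(const int *row, size_t i, int *out) {
    assert(row != NULL && out != NULL);
    if (i >= COLS) {
        return false; /* would be UB to dereference */
    }
    *out = *(row + i);
    return true;
}
```

With `int arr[2][5]`, calling `read_in_row(&arr[0][0], 4, &v)` succeeds, while an index of 5 is refused rather than reading into the next row.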

Complicating matters somewhat, I would also say that your use of `[5]` is not technically out of bounds; rather, it forms a one-past-the-end pointer that's then dereferenced as part of the subscript rewriting. So it's technically fine to form the pointer to the one-past-the-end element, but it's not okay to dereference it. That matters for things like:

  int arr[2][5] = {0};
  const int* ptr2 = &arr[0][0];
  const int* end = ptr2 + 5;
  
  for (; ptr2 < end; ++ptr2) {
    int whatever = *ptr2;
  }

where `end` is fine because it's never dereferenced. This distinction may matter to the static analyzer because a one-past-the-end pointer is valid for performing arithmetic on, but an out-of-bounds pointer is not.
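The loop pattern above can be packaged as a small self-contained sketch. The function name `sum_range` is hypothetical and only serves as an illustration; it assumes the caller passes a count that does not exceed the number of elements in the array object that `first` points into.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n ints starting at first. The pointer first + n is formed and used
   in comparisons, but it is never dereferenced. */
static int sum_range(const int *first, size_t n) {
    assert(first != NULL);
    const int *end = first + n; /* one past the end: fine to form */
    int total = 0;
    for (const int *p = first; p < end; ++p) {
        total += *p; /* p is strictly before end here, so this is in bounds */
    }
    return total;
}
```

For `int arr[2][5]`, `sum_range(&arr[0][0], 5)` stays within the first row. Passing 10 to walk both rows through the same pointer would be the questionable case discussed above.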


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104285/new/

https://reviews.llvm.org/D104285


