[PATCH] D110927: [analyzer] Access stored value of a constant array through a pointer to another type

Balázs Benics via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Nov 12 07:34:52 PST 2021


steakhal added inline comments.


================
Comment at: clang/test/Analysis/initialization.cpp:295-299
+void glob_cast_opposite_sign1() {
+  auto *ptr = (unsigned int *)glob_arr2;
+  auto x1 = ptr[0]; // no-warning
+  auto x2 = ptr[1]; // expected-warning{{garbage or undefined}}
+}
----------------
ASDenysPetrov wrote:
> steakhal wrote:
> > ASDenysPetrov wrote:
> > > steakhal wrote:
> > > > ASDenysPetrov wrote:
> > > > > steakhal wrote:
> > > > > > I think it's not correct.
> > > > > > 
> > > > > > `glob_arr2` refers to an object of dynamic type `int[2]`.
> > > > > > And the pointer decayed from it is `int *`, which has //similar type// to `unsigned *` what you are using to access the memory.
> > > > > > Since they are //similar//, this operation should work for all the valid indices, thus `ptr[0]`, `ptr[1]`, `ptr[2]`, `ptr[3]` should all be valid.
> > > > > > 
> > > > > > 
> > > > > > glob_arr2 refers to an object of dynamic type int[2].
> > > > > `glob_arr2` has an extent of 4.
> > > > > > And the pointer decayed from it is int *, which has similar type to unsigned * what you are using to access the memory.
> > > > > Yes, they are the same and it perfectly suits to http://eel.is/c++draft/basic.lval#11 . But still you can't access other element then the first one according to http://eel.is/c++draft/basic.compound#3 : //For purposes of pointer arithmetic ([expr.add]) and comparison ([expr.rel], [expr.eq]), [...] an object of type T that is not an array element is considered to belong to an array with one element of type T. //
> > > > > `unsigned int` and `int` are different types according to http://eel.is/c++draft/basic.fundamental#2 . The object of type `unsigned int` is NOT an array, beacuse there is no array of type `unsigned int`. Hence you can only only access the first and a single element of type `unsigned int`.
> > > > > 
> > > > Yes, `glob_arr` has 4 elements, sorry for this typo.
> > > > 
> > > > ---
> > > > I disagree with the second part though. It seems like gcc disagrees as well: https://godbolt.org/z/5o7ozvPar
> > > > ```lang=C++
> > > > auto foo(unsigned (&p)[4], int (&q)[4]) {
> > > >     p[0] = 2;
> > > >     q[0] = 1;
> > > >     return p[0]; // memory read! thus, p and q can alias according to g++.
> > > > }
> > > > ```
> > > > `gcc` still thinks that `p` and `q` can refer to the same memory region even if the `-fstrict-aliasing` flag is present.
> > > > In other circumstances, it would produce a `mov eax, 2` instead of a memory load if `p` and `q` cannot alias.
> > > >I disagree with the second part though. It seems like gcc disagrees as well: https://godbolt.org/z/5o7ozvPar
> > > I see how all the compilers handle this. I've switched several vendors on Godbolt. They all have the similar behavior. You are right from compiler perspective, but it's not quite the same in terms of C++ abstract machine.
> > > Your example is correct, it works OK and is consistent with the Standard:
> > > ```
> > >     p[0] = 2;
> > >     q[0] = 1;
> > > ```
> > > This one also works as expected but goes against the Standard:
> > > ```
> > >     p[1] = 2;
> > >     q[1] = 1;
> > > ```
> > > I want you watch this particular part about access via aliased pointers: https://youtu.be/_qzMpk-22cc?t=2623 For me it seems correct, at least I can't argue against of it now.
> > But in this example we have an array, thus pointer arithmetic should be fine according to the video.
> > Could you find the mentioned email discussion? I would be really curious.
> > BTW from the analyzer user's perspective, it would be 'better' to find strict aliasing violations which actually matter - now, and risking miscompilations.
> I'm not enough confident about that to debate now. As you can see we started arguing as well. That means that the Standard really leaves the subject for misinterpretation.
> OK, let's bring it to the form in which compilers operate. I mean, let's legitimate indirections with a different offsets through aliased types: `auto x = ((unsigened)arr)[42]; // OK`.
> I think it will be enough for this patch for now.
So, we agree that `glob_cast_opposite_sign1()` should have no warnings. At least for now. I would be okay with this direction.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110927/new/

https://reviews.llvm.org/D110927



More information about the cfe-commits mailing list