[PATCH] D96976: [analyzer] Fix reinterpret_cast handling for pointer-to-member

Balázs Benics via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Mar 2 06:47:43 PST 2021


steakhal added inline comments.


================
Comment at: clang/test/Analysis/reinterpret-cast-pointer-to-member.cpp:29-31
+  Some some;
+  some.*sf = 14;
+  clang_analyzer_eval(some.*sf == 14); // expected-warning{{UNKNOWN}}
----------------
RedDocMD wrote:
> steakhal wrote:
> > The assignment is actually UB.
> > TBH I don't know how to test such behavior xD
> > Same for the next example.
> > 
> > Shouldn't it return `undef` for reading via an invalid member pointer?
> I don't quite know what is `undef`. Is it a special macro or something?
> Using an invalid member pointer is definitely UB, but I also need to show that casting to an invalid pointer is properly handled because that is not UB. I guess I will remove the member access and claim that since there was no crash, it is okay and has been handled appropriately.
By `undef`, I was referring to the `clang::ento::UndefinedVal` - which usually represents values that shouldn't be used, not even read. For example, an uninitialized variable has undefined value. Or an out-of-bound memory access also produces undefined value. There are a few checkers that are hunting for such undefined values like CallAndMessageChecker, UndefBranchChecker, UndefinedArraySubscriptChecker, UndefinedAssignmentChecker, UndefResultChecker.

Probably removing the member accesses is the best you can do.

> but I also need to show that casting to an invalid pointer is properly handled because that is not UB
In this test, I can't see where you cast the member pointer back to make it valid again.


================
Comment at: clang/test/Analysis/reinterpret-cast-pointer-to-member.cpp:43-50
+struct A {};
+struct B : public A {};
+struct C {
+  int field;
+};
+struct D : public C {};
+struct E : public B, public D {};
----------------
RedDocMD wrote:
> steakhal wrote:
> > An ASCII art would help so much:
> > ```
> > A   C(field)
> > |   |
> > B   D
> >  \ /
> >   E
> >   |
> >   F
> > ```
> > However, I'm still missing a diamond-shaped inheritance.
> > An ASCII art would help so much:
> > ```
> > A   C(field)
> > |   |
> > B   D
> >  \ /
> >   E
> >   |
> >   F
> > ```
> > However, I'm still missing a diamond-shaped inheritance.
> 
> Thanks for the ASCII art!
> The diamond-shaped inheritance as far as I understood will cause illegal code.
> Eg:
> ```
>        A
>        |
>        B
>      /   \
>     C     D
>      \   /
>        E
> ```
> According to my understanding, if I have a field in A or B, and try to define a member pointer like `int E::* ef = &A::field`, it is not allowed.
> On GCC, this is the error message I get:
> ```
> struct A {
>   int field;
> };
> 
> struct B : public A {};
> struct C : public virtual B {};
> struct D : public virtual B {};
> struct E : public C, public D {};
> 
> int main() {
>   int E::* ef1 = &A::field;
> }
> ```
> diamond-member-pointer.cpp: In function ‘int main()’:
> diamond-member-pointer.cpp:11:22: error: pointer to member conversion via virtual base ‘B’
>    11 |   int E::* ef1 = &A::field;
>       |                      ^~~~~
> ```
Oh yea, now I get it.
When it inherits virtually, you can not take the address of it, and if you don't inherit virtually then it's ambiguous. Either way, it won't compile.
My bad. However, my other comments still apply.


================
Comment at: clang/test/Analysis/reinterpret-cast-pointer-to-member.cpp:52-62
+void testMultiple() {
+  int F::*f = &F::field;
+  int A::*a = reinterpret_cast<int A::*>(f);
+  int C::*c = reinterpret_cast<int C::*>(f);
+  A aobj;
+  C cobj;
+  aobj.*a = 13;
----------------
RedDocMD wrote:
> steakhal wrote:
> > Wait a minute. It's not how it works.
> > How I imagine member pointers, they are just offsets.
> > `&F::field` is notionally equivalent with `offsetof(F, field)`. That being said, You can not apply this member pointer to any object besides `F`.
> > Imagine if the classes of the inheritance tree would have other fields as well.
> > Then the `offsetof(T, field)` would be different for `F`, and `C`.
> > 
> > This example demonstrates that both of these member pointer dereferences are UB.
> > https://godbolt.org/z/15sMEP
> > It returns different values depending on the optimization level, which is a clear sign of UB.
> > BTW this issue is closely related to strict aliasing.
> The member access on A is definitely UB, I guess I will do what I proposed in the `Some` case.
> I don't think the other one is.  Consider the following:
> ```
> struct A {};
> struct B : public A {};
> struct C {
>   int field;
> };
> struct D : public C {};
> struct E : public B, public D {};
> struct F : public E {};
> 
> int main() {
>   int F::* ff = &F::field;
>   int C::* cf1 = static_cast<int C::*>(ff);
>   int C::* cf2 = reinterpret_cast<int C::*>(ff);
>   C c;
>   c.*cf1 = 10;
>   c.*cf2 = 10;
>   return 0;
> }
> ```
> `cf1` and `cf2` are the same thing, except that they are declared differently (one via `static_cast`, other via `reinterpret_cast`). If we look at the AST (truncated to the first three lines of main):
> ```
> CompoundStmt 0x1a4fe18 <col:12, line:18:1>
>     |-DeclStmt 0x1a4f3a8 <line:11:3, col:26>
>     | `-VarDecl 0x1a21078 <col:3, col:21> col:12 used ff 'int F::*' cinit
>     |   `-ImplicitCastExpr 0x1a4f378 <col:17, col:21> 'int F::*' <BaseToDerivedMemberPointer (E -> D -> C)>
>     |     `-UnaryOperator 0x1a4f360 <col:17, col:21> 'int C::*' prefix '&' cannot overflow
>     |       `-DeclRefExpr 0x1a4f2f8 <col:18, col:21> 'int' lvalue Field 0x1a207d0 'field' 'int'
>     |-DeclStmt 0x1a4f560 <line:12:3, col:43>
>     | `-VarDecl 0x1a4f428 <col:3, col:42> col:12 used cf1 'int C::*' cinit
>     |   `-CXXStaticCastExpr 0x1a4f518 <col:18, col:42> 'int C::*' static_cast<int struct C::*> <DerivedToBaseMemberPointer (E -> D -> C)>
>     |     `-ImplicitCastExpr 0x1a4f500 <col:40> 'int F::*' <LValueToRValue> part_of_explicit_cast
>     |       `-DeclRefExpr 0x1a4f498 <col:40> 'int F::*' lvalue Var 0x1a21078 'ff' 'int F::*'
>     |-DeclStmt 0x1a4f6e8 <line:13:3, col:48>
>     | `-VarDecl 0x1a4f5c8 <col:3, col:47> col:12 used cf2 'int C::*' cinit
>     |   `-CXXReinterpretCastExpr 0x1a4f6b8 <col:18, col:47> 'int C::*' reinterpret_cast<int struct C::*> <ReinterpretMemberPointer>
>     |     `-ImplicitCastExpr 0x1a4f6a0 <col:45> 'int F::*' <LValueToRValue> part_of_explicit_cast
>     |       `-DeclRefExpr 0x1a4f638 <col:45> 'int F::*' lvalue Var 0x1a21078 'ff' 'int F::*'
> ```
> Notice how the `static_cast` figures out the path to the correct subobject. This is how member-pointers are handled as far as I can tell.
> Unfortunately, Stroustrup is surprisingly scant when it comes to this topic. I am trying to dig through the Standard.
I tried to highlight that the classes could have non-static data members which could cause your assumption about `cf1 == cf2` not to hold anymore. See my previously attached godbolt link.

`static_cast<T>(p)` makes sure that the //offset// that the member pointer `p` represents is updated accordingly to match the //offset// of the `field` member within the new type `T`. Usually, it involves some addition/subtraction to accommodate this.
While `static_cast<T>(p)` does not update the value, just reinterprets it in a different way - which usually results in an invalid pointer though and a bunch of subtle rules kick in such as the type similarity, pointer interchangeability, which in general known as //strict-aliasing//.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96976/new/

https://reviews.llvm.org/D96976



More information about the cfe-commits mailing list