[PATCH] D117496: [clang][dataflow] Add transfer function for addrof

Gábor Horváth via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Jan 18 04:02:48 PST 2022


xazax.hun added inline comments.


================
Comment at: clang/lib/Analysis/FlowSensitive/Transfer.cpp:190
       Env.setValue(Loc, Env.takeOwnership(std::make_unique<ReferenceValue>(
                             SubExprVal->getPointeeLoc())));
+      break;
----------------
sgatev wrote:
> xazax.hun wrote:
> > I know this is not strictly related to this patch, but why do we actually need do `getPointeeLoc` here? I'd expect `UO_Deref` to be a noop for all intents and purposes.
> > 
> > E.g. for a snippet like: 
> > ```
> > int f(int *p) {
> >     return *p;
> > }
> > ```
> > 
> > The AST looks like:
> > ```
> >   `-CompoundStmt 0x5565d151d038 <col:15, line:4:1>
> >     `-ReturnStmt 0x5565d151d028 <line:3:5, col:13>
> >       `-ImplicitCastExpr 0x5565d151d010 <col:12, col:13> 'int' <LValueToRValue>
> >         `-UnaryOperator 0x5565d151cff8 <col:12, col:13> 'int' lvalue prefix '*' cannot overflow
> >           `-ImplicitCastExpr 0x5565d151cfe0 <col:13> 'int *' <LValueToRValue>
> >             `-DeclRefExpr 0x5565d151cfc0 <col:13> 'int *' lvalue ParmVar 0x5565d151ce00 'p' 'int *'
> > ```
> > 
> > I'd expect any actual dereference to happen in the `LValueToRValue` cast.
> > The reason is, this is how references can be handled. E.g.:
> > ```
> > void f(int &p) {
> >     int &q = p;
> >     int r = p;
> > }
> > ```
> > 
> > Has the AST:
> > ```
> >     |-DeclStmt 0x55d49a1f00b0 <line:3:5, col:15>
> >     | `-VarDecl 0x55d49a1effd0 <col:5, col:14> col:10 q 'int &' cinit
> >     |   `-DeclRefExpr 0x55d49a1f0038 <col:14> 'int' lvalue ParmVar 0x55d49a1efe00 'p' 'int &'
> >     `-DeclStmt 0x55d49a1f0180 <line:4:5, col:14>
> >       `-VarDecl 0x55d49a1f00e0 <col:5, col:13> col:9 r 'int' cinit
> >         `-ImplicitCastExpr 0x55d49a1f0168 <col:13> 'int' <LValueToRValue>
> >           `-DeclRefExpr 0x55d49a1f0148 <col:13> 'int' lvalue ParmVar 0x55d49a1efe00 'p' 'int &'
> > ```
> > 
> > The reason why you know when should you propagate the reference or the pointee of the reference from the subexpression is because of the `LValueToRValue` cast. So you already need to do the "dereference" operator there to correctly handle references. And if you already do that, I see no reason to duplicate this logic here for pointers. Do I miss something?
> In the first example you shared the type of the `LValueToRValue` expression is `int *` and the type of the  `UO_Deref` expression is `int` so I think we need to do something to translate one into the other. My understanding is that `LValueToRValue` performs indirection through references whereas `UO_Deref` performs indirection through pointers. This is why I think we ultimately need both. I might be wrong so please correct me and I'd be happy to address this in a follow up.
Indeed, the type did change at `UO_Deref`. 
I think the problem with handling both as performing the indirection is that we would end up performing two indirections for the first code snippet as both the cast and the dereference operator are present and we would need to introduce special code to avoid that. 

When we implemented lifetime analysis, initially we also handled  `UO_Deref` as performing the dereference operator. It did not quite work, but after replacing `UO_Deref` with noop, and relying on Clang to always put an `LValueToRValue` cast whenever we need to do a dereference operation worked out pretty well and we have not seen a code snippet yet that would not work with that approach.

I agree that the types are confusing in the AST. I am not even convinced about their correctness. For us, it just worked out OK as the type of the pointee always had the right type, so we did not even have to look at the AST.

Basically, in the current model, there is a decision one needs to make every time we look a location up. Namely, whether we should pass `SkipPast::Reference`. I wonder if we actually need to make that decision in the code at all. I think it might be possible that the AST has `LValueToRValue` casts every time we need to look past references, and we could just follow what the AST suggests without forcing us to make a decision point at every API call. That would make it way less error prone to work with this codebase and probably also making it more resilient to future changes in the language or the AST.

Don't get me wrong, I think what you have currently is perfectly OK and the tests demonstrate that this approach works, but I wonder if it is possible to simplify it. And the main reason I think there should be a way to make this simpler, because I implemented something similar in the past and I did not need the equivalent of `SkipPast::Reference`. Here is most of that code: https://github.com/mgehre/llvm-project/blob/lifetime/clang/lib/Analysis/LifetimePsetBuilder.cpp#L324




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D117496/new/

https://reviews.llvm.org/D117496



More information about the cfe-commits mailing list