[PATCH] D52219: [analyzer] (1/n) Support pointee mutation analysis in ExprMutationAnalyzer.

Thu Sep 20 09:30:54 PDT 2018

shuaiwang added inline comments.

================
Comment at: unittests/Analysis/ExprMutationAnalyzerTest.cpp:67
+    if (const auto *DRE = dyn_cast<DeclRefExpr>(E)) {
+      if (DRE->getNameInfo().getAsString()[0] == 'p')
+        Finder = PointeeMutationFinder;
----------------
JonasToth wrote:
> shuaiwang wrote:
> > JonasToth wrote:
> > > That doesn't look like a super idea. It is super hidden that variable 'p*' will be analyzed for pointee mutation and others aren't. Especially in 12 months and when someone else wants to change something, this will be the source of headaches.
> > > 
> > > Don't you think there could be a better way of doing this switch?
> > Yeah... agree :)
> > But how about let's keep it this way just for now, and I'll pause the work on supporting pointee analysis and fix it properly but in a separate diff following this one?
> > 
> > I've been thinking about this and also did some prototyping, and here are my thoughts:
> > In order to properly fix this we need to change the interface of `ExprMutationAnalyzer` to not simply return a `Stmt` but instead return a `MutationTrace` with additional metadata. This wasn't needed before because we can reconstruct a mutation chain by continuously calling `findMutation`, and that works well for cases like "int x; int& r0 = x; int& r1 = r0; r1 = 123;". But now that pointers come into play, and we need to handle cases like "int *p; int *p2 = p; int& r = *p2; r = 123;", and in order to reconstruct the mutation chain, we need to do `findPointeeMutation(p)` -> `findPointeeMutation(p2)` -> `findMutation(r)`, notice that we need to switch from calling `findPointeeMutation` to calling `findMutation`. However we don't know where the switch should happen simply based on what's returned from `findPointeeMutation`, we need additional metadata for that.
> > 
> > That interface change will unavoidably affect how memoization is done so we need to change memoization as well.
> > 
> > Now since we need to change both the interface and memoization anyway, we might as well design them properly so that multiple-layers of pointers can be supported nicely. I think we can end up with something like this:
> > ```
> > class ExprMutationAnalyzer {
> > public:
> >   struct MutationTrace {
> >     const Stmt *Value;
> >     const MutationTrace *Next;
> >     // Other metadata
> >   };
> > 
> >   // Finds whether any mutative operations are performed on the given expression after dereferencing `DerefLayers` times.
> >   const MutationTrace *findMutation(const Expr *, int DerefLayers = 0);
> >   const MutationTrace *findMutation(const Decl *, int DerefLayers = 0);
> > };
> > ```
> > This involves quite some refactoring, and doing this refactoring before this diff and later rebasing seems quite some trouble that I'd like to avoid.
> > 
> > Let me know what do you think :)
> Is the second part of your answer part to adressing this testing issue?
> Given the whole Analyzer is WIP it is ok to postpone the testing change to later for me. But I feel that this is a quality issue that more experience clang develops should comment on because it still lands in trunk. Are you aware of other patches then clang-tidy that rely on EMA?
> 
> Regarding you second part (its related to multi-layer pointer mutation?!):
> Having a dependency-graph that follows the general data flow _with_ mutation annotations would be nice to have. There is probably even something in clang (e.g. `CFG` is probably similar in idea, just with all execution paths), but I don't know enough to properly judge here.
> 
> In my opinion it is ok, to not do the refactoring now and just support one layer of pointer indirection. Especially in C++ there is usually no need for multi-layer pointers but more a relict from C-style.
> C on the other hand relies on that.
> Designing EMA new for the more generalized functionality (if necessary) would be of course great, but again: Better analyzer ppl should be involved then me, even though I would still be very interested to help :)
The second part is more of trying to explain why I'd prefer to keep the test this way just for this diff.
Basically we need a refactoring that makes EMA returns MutationTrace in order to fix this test util here.

But thinking more about this, maybe we don't need that (for now), this util is to help restore a mutation chain, and since we don't support a chain of mutation for pointers yet we can have a simpler version of `pointeeMutatedBy` that doesn't support chaining.

I mentioned multi-layer pointer mutation because _if_ we were to do a refactoring, we'd better just do it right and take possible features needed in the future into account. But the refactoring itself is motivated by the need of fixing this testing util (and making the return value from `findMutation` generally more useful.)

BTW by `MutationTrace` it's not as complicated or sophisticated as `CFG`, it's simply a trace helping understand why the mutation happens, for example in this case:
```
int x;
int &y = x;
int &z = y;
z = 123;
```
it's a bit disconnected if we just say `z = 123` mutated `declRefExpr(to(x))`, instead:
- `findMutation(x)` returns `y`
- `findMutation(y)` returns `z`
- `findMutation(z)` returns `z = 123`
This doesn't apply to (majority) of cases where the mutative expression is an ancestor of the given expression, in those cases it's typically quite clear why the ancestor expression mutated its descendant.

Anyway, I'll remove this little hackery here and address other comments later (hopefully today), and I'll do a refactoring before pointer analysis supports chaining.

Repository:
  rC Clang

https://reviews.llvm.org/D52219