[PATCH] D76287: [analysis][analyzer] Introduce the skeleton of a reaching definitions calculator

Gábor Horváth via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Mar 17 12:57:13 PDT 2020


xazax.hun added reviewers: gribozavr2, ymandel.
xazax.hun added a comment.
Herald added a subscriber: ASDenysPetrov.

Added some more reviewers who might be interested.

I think it is very crucial to make the intentions clear, how do you define `definition` and `variable`?
Other than assignments we might include pre/postfix increment/decrement, non-const by-ref function arguments and so on as `definitions`.
Also, it would be great to have some proposal upfront how indirect writes should behave.

Basically, this algorithm is not very useful on its own, but it can be used as a building block for other kind of analyses. So it would make sense to try to look for potential users and use cases and see if they have the same requirements or do you need to introduce configurations?
Trying to rewrite some existing algorithms in terms of these sets and see if the tests pass might be a good experiment if it is not too much work. (In case we can validate the performance we could even replace the original implementations if this makes things easier.)

I think having a reaching definition set per basic block makes perfect sense, as it should be relatively easy to calculate the reaching definition of an instruction given the set and the basic block. So maybe it is more efficient to calculate the reaching definition sets on demand for instruction rather than computing it for every single instruction.

Regarding ASTMatchers, I think they might be great for now for prototyping but I think once you want to make this more robust covering every corner case in the matcher expression will be at least as hard as writing a visitor if not harder. But we will see :)



================
Comment at: clang/include/clang/Analysis/Analyses/ReachingDefinitions.h:40
+
+struct Variable {
+  const VarDecl *Var;
----------------
I wonder if `Variable` will be the right notion long term. Do we want to be able to represent heap locations or do we exclude them on purpose? Reasoning about the heap is quite challenging so both answers might be reasonable. But in case we try to tackle the more general problem, we basically need a more general term like `MemoryLocation`.


================
Comment at: clang/include/clang/Analysis/Analyses/ReachingDefinitions.h:60
+
+class Definition : public Variable {
+public:
----------------
Is the inheritance justified here? Is a definition a variable? Maybe having a variable member better express the relationship.


================
Comment at: clang/include/clang/Analysis/Analyses/ReachingDefinitions.h:98
+/// may kill several definitions of the same variables from different locations.
+using DefinitionSet = std::set<Definition, VarAndCFGElementLess>;
+
----------------
As far as I understand this is the analysis state for you, while GenSet is used for the transfer functions (I might be mistaken). I think it might be better to origanize the code the following way:

```
/// Analysis state
everything state related

/// Transfer functions
everything transfer function related

/// Iteration/Traversal
the main loop of the algorithm
```


================
Comment at: clang/include/clang/Analysis/Analyses/ReachingDefinitions.h:222
+  static const void *getTag() {
+    static int x;
+    return &x;
----------------
Hmm, I am not really familiar with the specifics and maybe this is optimized away but always wondered if we only need the address of a static variable why don't we chose the smallest one? Like a char?


================
Comment at: clang/lib/Analysis/ReachingDefinitions.cpp:41
+  for (const Definition &Perpetrator : Set)
+    if (Victim.Var == Perpetrator.Var)
+      return true;
----------------
In the future this will be more complicated.

For example if I assign to a struct all of its members needs to be killed. As a result, you will not only need to represent memory regions but also the relationships between them. I wonder if the analyzer's memregion classes are any good or are they too specific to the analyzer.


================
Comment at: clang/lib/Analysis/ReachingDefinitions.cpp:137
+  static internal::Matcher<Stmt> getMatcher() {
+    return declRefExpr(to(varDecl().bind("var")));
+  }
----------------
Note that, `memberExpr` is not supported at this point. It is not derived from `declRefExpr`. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76287/new/

https://reviews.llvm.org/D76287





More information about the cfe-commits mailing list