[PATCH] D64991: [analyzer][WIP] Implement a primitive reaching definitions analysis

Kristóf Umann via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Sep 23 11:43:18 PDT 2019


Szelethus updated this revision to Diff 221355.
Szelethus added a comment.

The following words are echoing in my ears as I'm knowingly going completely against them:

In D45532#1083670 <https://reviews.llvm.org/D45532#1083670>, @NoQ wrote:

> I'm asking for this because it's, like, actually difficult and painful to understand formal information in large chunks. One-solution-at-a-time would have been so much easier. It is really a bad idea to start by presenting a working solution; the great idea is to present a small non-working solution with a reasonable idea behind it, and then extend it "in place" as much as it seems necessary. It is very comfortable when parts of the patch with different "status" (main ideas, corner-case fixes, heuristics, refactoring) stay in separate patches.
>
> Essentially every part of the solution that you implement, if you think it might deserve a separate discussion, also deserves a separate patch. It is reasonable to split the work into logical chunks before you start it. It's very unlikely that you have a single idea that takes 500 non-trivial lines of code to express. Splitting up ideas so that they were easy to grasp is an invaluable effort.


I'll keep this revision as more of a bag of ideas, and create a series of patches when I got the core of this figured out. This algorithm is very hard to implement, and its easier for me to not bother with splitting up my entire work when I'm still shooting at a moving target. Several of my ideas required me to make huge refactorings, and as the code grows, my overall desire for the finished algorithm changes as well.

With that said.

I made the following changes:

- Added support for record, but not much in terms of C++ classes.
  - Fields of a record are identified with a (variable, chain of fields) pair, e.g. `Var.a.x.b` would be represented with (`Var`,  `FieldDecl`s of `[a, x, b]`).
  - A `Definition` of a record object doesn't contain `Definition`s of its fields -- instead, whenever a `Definition` of a record object is added to a GEN set, a helper function creates adds `Definition`s for its fields. This is convenient, else we'd need to flatten `Definition`s later on. For the example mentioned earlier, we really need an entry separately for `Var`, `Var.a`, `Var.a.x`, `Var.a.x.b` in order for the analyzer to conveniently ask for reaching definitions later down the line.
- Added/corrected plenty of test cases.
- Haven't really touched variable invalidation.
- Decorated the code with a ton of `TODO`s, but not much documentation just yet.

There are a couple things I'm worrying about, especially those that @xazax.hun mentioned in the inlines. I'm still confident that actual writes to a variable may only be assignments and variable definitions, but they can be esoteric enough, e.g. `(a, b) = 5`, `retrieveWhateverObject() = {9, 5, "remember the vasa"}`. For now, performance doesn't really concern me, but eventually I'll need to move the matchers to a builder class. Invalidation propagation (as discussed on mailing list <http://lists.llvm.org/pipermail/cfe-dev/2019-July/062975.html>) hide in them a variety of tricky edge cases. All of these combined create a desire for a thought-out, easily extensible way to create GEN sets.

So, here are my ideas.

- Overall code structure
  - Create a builder object, stored in the calculator itself, which would store the expensive-to-create matchers.
  - Make assignment an event to which a variety of rules may be assigned to, so special cases like  `(a, b) = 5`, `retrieveWhateverObject() = {9, 5, "remember the vasa"}`, assignment to `this`, and whatever pitfalls Objective-.* languages have may be more easily handled.
  - GEN sets store the outgoing, or in other words, last definition to each variable, but this information is not enough. The way I described <http://lists.llvm.org/pipermail/cfe-dev/2019-July/062885.html> how I plan to make this algorithm semi-intraprocedural with the use of visitors in a nutshell works by the analyzer trying to prove that an invalidation really does write a variable, or can be discarded. However, in the same block, for the same variable, multiple invalidations of the same variable may occur. So, instead of storing the last definition, we should store all invalidations up to the first write. Later, the analyzer could remove these definitions directly, and ask the calculator to regenerate the KILL sets and recalculate the reaching definitions sets.
- Invalidation
  - Research type based alias analysis <http://lists.llvm.org/pipermail/cfe-dev/2019-July/062984.html>, until then, ignore aliases, and just invalidate objects of the same type.
  - The example below highlights how the function call on line 2 is a definition of `i`, because `ptr` might point to it, but may not be a definition of `j`. Buuuuuuuuuuut, with the use of `goto`, it is possible <https://stackoverflow.com/a/6791537>. This implies that every variable in a non-nested scope has to be invalidated if a pointer with an aliasing type escapes. This may be super tricky to implement precisely, so I'll probably just take all variables, not only non-nested ones into account.

  1 void f(int i, int *ptr) {
  2   foo(ptr);
  3   int j;
  4 }

- Invalidation has to be very customizable. The static analyzer doesn't generate code, so some relaxation of the invalidation ruleset maybe desirable, such as assuming that `offsetof` isn't used to change the rest of the record if a field of it escapes.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64991/new/

https://reviews.llvm.org/D64991

Files:
  clang/include/clang/Analysis/Analyses/ReachingDefinitions.h
  clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
  clang/lib/Analysis/CMakeLists.txt
  clang/lib/Analysis/ReachingDefinitions.cpp
  clang/lib/StaticAnalyzer/Checkers/DebugCheckers.cpp
  clang/test/Analysis/dump-definitions.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D64991.221355.patch
Type: text/x-patch
Size: 43945 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20190923/dd832de1/attachment-0001.bin>


More information about the cfe-commits mailing list