[PATCH] D76287: [analysis][analyzer] Introduce the skeleton of a reaching definitions calculator

Kristóf Umann via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Mar 17 08:31:26 PDT 2020


Szelethus created this revision.
Szelethus added reviewers: NoQ, xazax.hun, martong, whisperity, rnkovacs, dcoughlin, baloghadamsoftware.
Szelethus added a project: clang.
Herald added subscribers: cfe-commits, DenisDvlp, steakhal, Charusso, gamesh411, dkrupp, donat.nagy, mikhail.ramalho, a.sidorin, szepet, mgorny.

The following revision adds the basic infrastructure for a reaching definitions algorithm for C++.

Short description & motivation
==============================

This is a dataflow algorithm designed to find a set of definitions (for example assignments) to variables in a given `CFGBlock`. To demonstrate, in the following code snippet:

  int flag;
  
  void foo();
  
  void f() {
    int *x = nullptr;
    flag = 1;
  
    foo();
    if (flag)
      x = new int;
  
    foo();
    if (flag)
      *x = 5;
  }

The CFG would look like this:

  
                     -> [B3] ->    -> [B1] ->
                    /          \  /          \
  [B5 (ENTRY)] -> [B4] ------> [B2] ---> [B0 (EXIT)]

Should `foo()` change `flag`'s value on the first function call to `false`, then `true` on the second, a null pointer dereference error would occur. Using a reaching definitions calculator, we can retrieve that the set of ingoing definitions to B1 <https://reviews.llvm.org/B1> is `{(flag, [B2]), (x, [B3]), (x, [B4])}`. The set hints that `x` has a reaching definitions that would not have caused a nullpointer dereference.

The algorithm
=============

A reaching definition for a given instruction is an earlier instruction whose target variable can reach (be assigned to) the given one without an intervening assignment. The similarly named reaching definitions is a data-flow analysis which statically determines which definitions may reach a given point in the code [1].

The set of ingoing and outgoing reaching definitions are calculated for each basic block with set operations on GEN and KILL sets. This could be further fine-grained so the RD sets would be a property of a statement, rather then an entire `CFGBlock`, but I didn't want to delay the publishing of this patch any further, and I believe that such a change wouldn't hurt the infrastructure much.

Implementation
==============

As the formal definition would suggest, this was conceived for instructions, which is why even such terms as "variable" or "definition" can be hard to define for C++. For this reason, the patch spares a lot of LOC for documentation and reasoning. While the algorithm itself is simple (and is implemented quite literally from [1]), the problematic part of this is the generation of of GEN and KILL sets. I tried to introduce an infrastructure that can tolerate a lot of things I have inevitable forgotten (or left for followup patches) with the use of easy to add AST matchers.

Immediate questions to address
==============================

> We already have 2 dataflow algorithms implemented in the Analysis library: UninitializedObject and LiveVariables. Reaching definitions doesn't sound all too dissimilar. Why are we adding hundreds of LOC here? Can't we reuse some of it?

UninitializedObject and LiveVariables are practically the same algorithm with minimal differences. Despite this, they take up ~750 and ~1050 LOC respectively. Both of those algorithms can be expressed with the use of GEN and KILL sets, yet none of them are, and they duplicate a lot of logic. It isn't terribly obvious however how their logic could be (or really, should be) merged.

Shockingly, we have no GEN and KILL set implementations in Clang. I think that is the main addition of this patch, even if it unfortunately duplicates some logic. The previously mentioned two analyses however could serve as an inspiration.

> UninitializedObject and LiveVariables uses ASTVisitors rather than ASTMatchers. The latter is more expensive. Why did we go with them?

Matchers are more expressive. For instance, how would you find `s.a.x.z` with visitors? Its doable, but requires to keep track of a lot of state and would make the implementation ugly. I don't have complete confidence in my decision here, so I welcome alternative suggestions or counterarguments.

> What are the main things to get done?

In order:

- Finalize the infrastructure for GEN set generation.
- Make it possible to calculate RD sets for statements, not only blocks.
- Improve the interface of `ReachingDefinitionsCalculator`. What we like to query? Most probably the set of ingoing RDs for a variable at a given statement.
- Performance: use immutable data structures and a better CFG traversal strategy.

Further reading
===============

My GSoC project: https://szelethus.github.io/gsoc2019/ (this has a lot of pointers to related discussions in the 'Reaching Definitions Analysis' section)

[cfe-dev] Dataflow analyses in Clang -- why do we not have GEN and	KILL sets? http://lists.llvm.org/pipermail/cfe-dev/2020-March/064893.html

[analyzer][WIP] Implement a primitive reaching definitions analysis D64991 <https://reviews.llvm.org/D64991>

References
==========

My main source was wikipedia:
[1] https://en.wikipedia.org/wiki/Reaching_definition

I read the following articles, but they didn't give me the information I needed:

Tonella, Paolo, et al. "Variable precision reaching definitions analysis for software maintenance." Proceedings. First Euromicro Conference on Software Maintenance and Reengineering. IEEE, 1997.

Collard, Jean-François, and Jens Knoop. "A comparative study of reaching-definitions analyses." (1998).


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D76287

Files:
  clang/include/clang/Analysis/Analyses/ReachingDefinitions.h
  clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
  clang/lib/Analysis/CMakeLists.txt
  clang/lib/Analysis/ReachingDefinitions.cpp
  clang/lib/StaticAnalyzer/Checkers/DebugCheckers.cpp
  clang/test/Analysis/dump-definitions.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D76287.250740.patch
Type: text/x-patch
Size: 31150 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20200317/10cddec6/attachment-0001.bin>


More information about the cfe-commits mailing list