[PATCH] D115235: [clang][dataflow] Implement a basic algorithm for dataflow analysis

Gábor Horváth via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Dec 7 09:32:07 PST 2021


xazax.hun accepted this revision.
xazax.hun added a comment.
This revision is now accepted and ready to land.

This patch looks good to me, most of my comments are things to consider in follow-up patches.



================
Comment at: clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp:32
+TypeErasedDataflowAnalysisState computeBlockInputState(
+    std::vector<llvm::Optional<TypeErasedDataflowAnalysisState>> &BlockStates,
+    const CFGBlock &Block, const Environment &InitEnv,
----------------
Alternatively, if we have a bottom, `forall e in lattice join(e, bottom) == e`, so we do not really need optionals and can express nodes that do not contribute to the state using the bottom value. (Or using top, depending on the orientation for our lattices, unfortunately the literature is using both orientations which is really confusing.)

I'm fine with the current approach, but I'd love to see a function level comment about `nullopt`s representing nodes that were not yet evaluated.


================
Comment at: clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp:35
+    TypeErasedDataflowAnalysis &Analysis) {
+  TypeErasedDataflowAnalysisState State = {Analysis.typeErasedInitialElement(),
+                                           InitEnv};
----------------
I did not catch this earlier, but I wonder if we should pass the block in to `typeErasedInitialElement`. There are some analysis where the initial element might be different for certain nodes. E.g. here is the algorithms for computing dominators from wikipedia:
```
 // dominator of the start node is the start itself
 Dom(n0) = {n0}
 // for all other nodes, set all nodes as the dominators
 for each n in N - {n0}
     Dom(n) = N;
 // iteratively eliminate nodes that are not dominators
 while changes in any Dom(n)
     for each n in N - {n0}:
         Dom(n) = {n} union with intersection over Dom(p) for all p in pred(n)
```

Here the initial analysis state for the entry node differs from the initial state for the rest of the nodes. 


================
Comment at: clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp:69
+  for (const CFGElement &Element : Block) {
+    const llvm::Optional<CFGStmt> Stmt = Element.getAs<CFGStmt>();
+    if (!Stmt.hasValue())
----------------
I think this is fine for now, but in general, `CFGStmt` is probably not the only kind of `CFGElement` that we want to evaluate.

E.g., `CFGImplicitDtor` or `CFGLifetimeEnds` might also be interesting in the future if we want to find certain problems, like dereferencing an invalid pointer. I'd love to see a FIXME.


================
Comment at: clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp:104
+  // limit the number of iterations.
+  // FIXME: Consider making the maximum number of iterations configurable.
+  unsigned Iterations = 0;
----------------
I'd strongly suggest setting up some statistics in the future: https://llvm.org/docs/ProgrammersManual.html#the-statistic-class-stats-option

Having some counters, like how many function timed out, what is the average iteration count and so on could be very handy to monitor the performance of the analysis before and after some changes (both for client analyses or for the framework itself).


================
Comment at: clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp:134
+    Worklist.enqueueSuccessors(Block);
+  }
+
----------------
If a block still has `None` as its state at this point, it is unreachable in the CFG. One option is to ignore them, as we do at the moment and it is completely fine and should not change this patch. But for some analysis in the future, we might want to check dead code as well (it might be dead for the current platform due to some preprocessor defines but alive for some other). E.g., it would also be nice to diagnose a null dereference in a block that was not reached. 


================
Comment at: clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp:37-40
+    CFG::BuildOptions Options;
+    Options.AddImplicitDtors = true;
+    Options.AddTemporaryDtors = true;
+    Options.setAllAlwaysAdd();
----------------
I wonder if the dataflow analysis framework should provide a factory that that returns a `CFG::BuildOptions` which is a good default for most clients. Can be in a follow-up patch, or completely ignored. Or maybe even a convenience function running the analysis on a function definition without the user having to know about the CFG.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115235/new/

https://reviews.llvm.org/D115235



More information about the cfe-commits mailing list