[PATCH] D114234: [clang][dataflow] Add base types for building dataflow analyses

Fri Nov 26 06:10:04 PST 2021

sgatev added a comment.

Thanks Gábor and Dmitri!

================
Comment at: clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h:48
+/// Type-erased base class for dataflow analyses built on a single lattice type.
+class DataflowAnalysisDynamic {
+public:
----------------
xazax.hun wrote:
> sgatev wrote:
> > xazax.hun wrote:
> > > Does the `Dynamic` in the suffix refer to the fact this is using type erasure as opposed to templates? 
> > > 
> > > I have multiple questions at this point:
> > > * Why do we prefer type erasure over generic programming?
> > > * Do you plan to have non-dynamic counterparts?
> > > 
> > > Nit: having Dynamic both in the class name and in the method names sounds overly verbose to me.
> > > Nit: please add a comment what dynamic refers to in the name,
> > Right. The "Dynamic" suffix emphasizes the type-erased nature of the class and its members and is used to differentiate them from their typed counterparts in `DataflowAnalysis`. I added this to the documentation of the type. I also split the typed and type-erased interfaces in separate files. Users of the framework shouldn't need to interact with types and functions from `DataflowAnalysisDynamic.h`.
> > 
> > The names are verbose, but should be used in relatively few internal places in the framework. Currently, we have one call site for each of the methods of `DataflowAnalysisDynamic`. Nevertheless, if you have ideas for better names I'd be happy to change them.
> > 
> > The reason we went with a type-erased layer is to avoid pulling a lot of code in the templates. Over time the implementation got cleaner and as I'm preparing these patches I see some opportunities to simplify it further. However, it's still non-trivial amount of code. I think we should revisit this decision at some point and consider having entirely template-based implementation of the algorithm. I personally don't see clear benefits of one approach over the other at this point. If you have strong preference for using templates, we can consider going down that route now.
> Thanks for the explanation! I don't have a strong preference, we could stick with the type-erased version unless we see some reason to change in the future. However, I don't see much value in the "Dynamic" suffix for the method names. What do you think about simply dropping them?
Well, the only reason I see to have the suffix is to differentiate the type-erased methods from the methods of the concrete dataflow analysis class that will derive from `DataflowAnalysis`. For example, we could have

```
class ConstantPropagationAnalysis : public DataflowAnalysis<
  ConstantPropagationAnalysis, ConstantPropagationLattice> {
public:
  ConstantPropagationLattice initialElement();

  ConstantPropagationLattice transfer(
    const Stmt *, const ConstantPropagationLattice &, Environment &);
};
``` 

In this setup `ConstantPropagationAnalysis` has two methods that return the initial element and two methods that perform the transfer function - one pair in the definition and one pair inherited from `TypeErasedDataflowAnalysis`. For the initial element we need to use different names as the two methods differ only in their return types. For the transfer function we could use the same name and rely on overload resolution, however it seems that's not recommended - https://clang.llvm.org/docs/DiagnosticsReference.html#woverloaded-virtual. The names of the remaining methods (`joinTypeErased` and `isEqualTypeErased`) have the suffix for consistency.

================
Comment at: clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h:94
+// Model of the program at a given program point.
+template <typename LatticeT> struct DataflowAnalysisState {
+  // Model of a program property.
----------------
xazax.hun wrote:
> If I understand this correctly, this could derive from `DataflowAnalysisStateDynamic`, it could just provide a getter function that casts the type erased lattice element to `LatticeT`, returning a reference to the contents of the `any` object. As a result, you would no longer need to do move/copy in `runDataflowAnalysis`. On the other hand, the user would need to call a getter to get out the lattice element. I guess we expect lattice elements to be relatively cheap to move. Feel free to leave this unchanged, it is more of an observation.
Acknowledged. I'll keep this option in mind. For now I suggest we keep the typed and type-erased structs separate if possible and if the cost isn't too high.

================
Comment at: clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h:102
+
+/// Performs dataflow analysis and returns a mapping from basic block IDs to
+/// dataflow analysis states that model the respective basic blocks.
----------------
xazax.hun wrote:
> While it is probably obvious to most of us, I wonder if it is obvious to all future readers that the block IDs are the indices of the vector. Depending on how beginner-friendly do we want these comments to be we could make that more explicit.
Let's make it clear. I added a note on that.

================
Comment at: clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisDynamic.h:45
+/// in `DataflowAnalysis`.
+class DataflowAnalysisDynamic {
+public:
----------------
xazax.hun wrote:
> Alternatively, we could replace `Dynamic` with `TypeErased` in the class name making the comment redundant.
That would be appropriate indeed. I updated the names.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114234/new/

https://reviews.llvm.org/D114234