[PATCH] D116368: [clang][dataflow] Add transfer function for VarDecl statements

Stanislav Gatev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Dec 30 06:11:49 PST 2021


sgatev added inline comments.


================
Comment at: clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h:61
+  ///  `D` must not be assigned a storage location.
+  void setStorageLocation(const VarDecl &D, StorageLocation &Loc) {
+    assert(VarDeclToLoc.find(&D) == VarDeclToLoc.end());
----------------
xazax.hun wrote:
> Are other language constructs coming in a follow-up PR? (E.g.: FieldDecl, locations on the heap). I'd like to some TODO comments if that is the case. Or if they are not supported by design, I'd love to some some comments about that.
> 
> Environment is referencing `ValueDecl` as opposed to `VarDecl`. What is the reason for this asymmetry?
> I think I overall find it confusing that some of the DeclToLoc mapping is here some is in the Environment.
> Does this has something to do with the fact that certain mappings should be the same regardless of which basic blocks we are processing (i.e., invariant globally so we do not need to duplicate it)? If that is the case please make it clear in the comments. 
> Are other language constructs coming in a follow-up PR? (E.g.: FieldDecl, locations on the heap). I'd like to some TODO comments if that is the case. Or if they are not supported by design, I'd love to some some comments about that.

Yes, more is coming in a follow-up. Nothing unsupported by design comes to mind at this point. I added FIXMEs to indicate the language constructs that we plan to implement first, but we will certainly need to add more going forward. Please let me know if there's something in particular that you'd like to see implemented early.

> Environment is referencing ValueDecl as opposed to VarDecl. What is the reason for this asymmetry?

This is not intentional. I updated `DataflowAnalysisContext` to use `ValueDecl` as well.

> I think I overall find it confusing that some of the DeclToLoc mapping is here some is in the Environment.
> Does this has something to do with the fact that certain mappings should be the same regardless of which basic blocks we are processing (i.e., invariant globally so we do not need to duplicate it)? If that is the case please make it clear in the comments.

`DeclToLoc` in `Environment` holds the set of `ValueDecl` to `StorageLocation` assignments that are in scope for a particular basic block. `DeclToLoc` in `DataflowAnalysisContext` aggregates the assignments across all basic blocks and is mainly used to produce stable storage locations when the same basic blocks are evaluated multiple times.  I added comments in the code explaining this.


================
Comment at: clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h:76
+  std::vector<std::unique_ptr<StorageLocation>> Locs;
+  std::vector<std::unique_ptr<Value>> Vals;
+  llvm::DenseMap<const VarDecl *, StorageLocation *> VarDeclToLoc;
----------------
xazax.hun wrote:
> Just curious: would it make sense to deduplicate these values?
I think no because even if the values are equal we'd like to track them separately as they represent different identities which is important for inference. For example:

```
std::optional<int> a = foo();
std::optional<int> b = bar();
std::optional<int> c = a;
// at this point we can model `a`, `b`, and `c` with the same value, however
if (a.has_value()) {
  // here we gain knowledge only about the values of `a` and `c`
  a.value(); // safe
  c.value(); // safe
  b.value(); // unsafe 
}
```

In our analysis we model the above by assigning the same value to the storage locations for `a` and `c` and assigning a different value to the storage location for `b`, even if the two values are equal.


================
Comment at: clang/include/clang/Analysis/FlowSensitive/Value.h:78
+
+  StorageLocation *getLocation() const { return Loc; }
+
----------------
xazax.hun wrote:
> I wonder if getLocation is ambiguous. (The pointer's own location vs the pointee location.) How about something like `getPointeeLoc`?
That's much better! I also changed the `StorageLocation` pointer to a reference in the constructor and the getter.


================
Comment at: clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp:32
+template <typename K, typename V>
+bool denseMapsAreEqual(const llvm::DenseMap<K, V> &Map1,
+                       const llvm::DenseMap<K, V> &Map2) {
----------------
xazax.hun wrote:
> Shouldn't we add `operator==` instead?
I'd be happy to do that. Do we need reviews from other folks for it? Would it make sense to move the code to the other location in a follow-up PR, to limit the scope of the change?


================
Comment at: clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp:46
+template <typename K, typename V>
+llvm::DenseMap<K, V> intersectDenseMaps(const llvm::DenseMap<K, V> &Map1,
+                                        const llvm::DenseMap<K, V> &Map2) {
----------------
xazax.hun wrote:
> I wonder if these algorithms should rather be somewhere in the support library.
I'd be happy to do that. Do we need reviews from other folks for it? Would it make sense to move the code to the other location in a follow-up PR, to limit the scope of the change?


================
Comment at: clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp:88
+    for (const FieldDecl *Field : Type->getAsRecordDecl()->fields()) {
+      FieldLocs.insert({Field, &createStorageLocation(Field->getType())});
+    }
----------------
xazax.hun wrote:
> Could this end up creating an overly large state? There might be objects with quite a lot fields but each function would only access a small subset of those. Alternatively, we could attempt to create the representations for fields on demand (this is the approach what the clang static analyzer is using). 
That's a great point, however I don't think initialization on demand plays well with the dataflow algorithm. For example:

```
struct S {
  int Val;
};

void target(bool b) {
  // basic block 1:
  S Foo;
  int Bar;
  if (b) {
    // basic block 2:
    Bar = Foo.Val;
  } else {
    // basic block 3:
    Bar = Foo.Val;
  }
  // basic block 4:
  ...
}
```
In basic block 4 we should be able to infer that the value that is assigned to the storage location for `Bar` is unambiguous. However, since `Foo.Value` isn't created in basic block 1, this means that it's not inherited by basic block 2 and basic block 3. Each of these blocks will end up creating distinct values to assign to the storage location for `Foo.Value` and so in basic block 4 the value for `Bar` will end up being ambiguous.

Alternatively, we can do a pass over all statements in all basic blocks before running the analysis to identify which fields are used in the function. We can use this information here to initialize only parts that are being used.

What do you think?


================
Comment at: clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp:144
+  if (Type->isRealFloatingType()) {
+    auto &Value = DACtx->takeOwnership(std::make_unique<FloatValue>());
+    setValue(Loc, Value);
----------------
xazax.hun wrote:
> Are we planning to track float values in the near future? If not, would it make sense to defer creating these objects until then?
No immediate plans to track floats. Removed it.


================
Comment at: clang/lib/Analysis/FlowSensitive/Transfer.cpp:41
+
+  if (const auto *DS = dyn_cast<DeclStmt>(&S)) {
+    transferDeclStmt(*DS, Env);
----------------
xazax.hun wrote:
> Instead of using a long `if else` and dynamic casts, don't we want to use an `StmtVisitor` instead?
Makes a lot of sense. Replaced it with a `ConstStmtVisitor` instance.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116368/new/

https://reviews.llvm.org/D116368



More information about the cfe-commits mailing list