[PATCH] D159284: [clang][dataflow] Fix Record initialization with InitListExpr and inheritances

Mon Sep 4 08:01:10 PDT 2023

mboehme added inline comments.

================
Comment at: clang/lib/Analysis/FlowSensitive/Transfer.cpp:632-635
+    // For type comparison assertions.
+    [[maybe_unused]] auto GetCanonicalType = [](QualType Ty) {
+        return Ty.getCanonicalType().getUnqualifiedType();
+    };
----------------
This adds a fair bit of clutter. Can you just use `Type::getCanonicalTypeUnqualified()`?

================
Comment at: clang/lib/Analysis/FlowSensitive/Transfer.cpp:643
+
+    // `S->init()` contains all the InitListExprs including the direct base
+    // classes.
----------------
`InitListExpr`s isn't accurate here, as the entries in `S->inits()` are not necessarily of that type. Also note typo in `S->inits()`.

================
Comment at: clang/lib/Analysis/FlowSensitive/Transfer.cpp:646
+    auto Inits = S->inits();
+    int InitIdx = 0;
+
----------------
Nit: Make this a `size_t` for consistency with the type of `Inits.size()`?

================
Comment at: clang/lib/Analysis/FlowSensitive/Transfer.cpp:648
+
+    if (auto* R = S->getType()->getAsCXXRecordDecl()) {
+      assert(FieldsForInit.size() + R->getNumBases() == Inits.size());
----------------
I feel it would be useful to give a summary of what this large-ish block of code does (as it's not obvious from the first line).

================
Comment at: clang/lib/Analysis/FlowSensitive/Transfer.cpp:650
+      assert(FieldsForInit.size() + R->getNumBases() == Inits.size());
+      for ([[maybe_unused]] const CXXBaseSpecifier &Base : R->bases()) {
+        assert(InitIdx < Inits.size());
----------------
This `[[maybe_unused]]` seems unnecesary, as `Base` is definitely used in the `isLambda()` check?

================
Comment at: clang/lib/Analysis/FlowSensitive/Transfer.cpp:653-654
+        auto Init = Inits[InitIdx++];
+        assert(GetCanonicalType(Base.getType()) ==
+               GetCanonicalType(Init->getType()));
+        if (Base.getType()->getAsCXXRecordDecl()->isLambda()) {
----------------
I believe this can be

```
        assert(Base.getType().getCanonicalType() ==
               Init->getType().getCanonicalType());
```

- The initializer for a base class should always be a prvalue
- And I believe the type of a prvalue can't have `const` or `volatile` qualifiers (neither of them really makes sense)

================
Comment at: clang/lib/Analysis/FlowSensitive/Transfer.cpp:655
+               GetCanonicalType(Init->getType()));
+        if (Base.getType()->getAsCXXRecordDecl()->isLambda()) {
+          // FIXME: Figure out what to do for lambdas.
----------------
- Can this actually happen? (How do you make a lambda a base class?)
- Why does this need to be handled separately? (Isn't a lambda just a normal class, modulo sugar? What is different about lambdas that means we can't handle them like other classes?)

Depending on the answers to the above, would also be useful to capture them as comments in the source code.

I assume you encountered this scenario somewhere. Can you add a test (with incorrect behavior marked as a FIXME) that exercises this?

================
Comment at: clang/lib/Analysis/FlowSensitive/Transfer.cpp:659-660
+        }
+        auto* RecordVal = dyn_cast<RecordValue>(Env.getValue(*Init));
+        assert(RecordVal != nullptr);
+        auto Children = RecordVal->getLoc().children();
----------------
- Instead of doing `dyn_cast`, then asserting non-null, one can just do a `cast`, as that asserts that the argument is actually of the required type
- However, I believe it's possible that `getValue()` can return null in this case, so we should use `cast_or_null`, then invent a value if we didn't get one.
- Also, suggest `BaseVal` instead of `RecordVal` to emphasize that this is the `RecordValue` for the base class

================
Comment at: clang/lib/Analysis/FlowSensitive/Transfer.cpp:661-662
+        assert(RecordVal != nullptr);
+        auto Children = RecordVal->getLoc().children();
+        FieldLocs.insert(Children.begin(), Children.end());
+      }
----------------
I feel it would be useful to add an explanatory comment.

================
Comment at: clang/lib/Analysis/FlowSensitive/Transfer.cpp:672-673
+          // The types are same, or
+          GetCanonicalType(Field->getType()) ==
+          GetCanonicalType(Init->getType()) ||
+          // The field's type is T&, and initializer is T
----------------
Again, this is the prvalue case, so I think you should simply be able to use `QualType::getCanonicalType()`.

================
Comment at: clang/lib/Analysis/FlowSensitive/Transfer.cpp:683-686
+      // `ModeledFields` also contains from all the bases, but only the modeled
+      // ones. Having an initializer list for a struct basically populates all
+      // the fields for the struct, so the fields set that is populated here
+      // should match the modeled fields without additional filtering.
----------------
Rewrote this a bit with the aim of making it clearer -- WDYT?

================
Comment at: clang/lib/Analysis/FlowSensitive/Transfer.cpp:687-689
+      auto Modeled = Env.getDataflowAnalysisContext().getModeledFields(Type);
+      llvm::DenseSet<const Decl *> ModeledFields;
+      ModeledFields.insert(Modeled.begin(), Modeled.end());
----------------
`SmallSetVector` has a `contains()` method, so I think there is no need to convert to `DenseSet`?

================
Comment at: clang/unittests/Analysis/FlowSensitive/TransferTest.cpp:1443-1458
+    struct Base3 {
+      int x7;
+      int x8;
+    };
+    struct Base2 : Base3 {
+      int x5;
+      int x6;
----------------
Can you make the class names more descriptive?

Suggestion:

Foo => MostDerived
Base2 => Intermediate
Base3 => Base1
Base => Base2

Also, I think it would be useful to tie the member variable names to the class names -- see suggested edit.

================
Comment at: clang/unittests/Analysis/FlowSensitive/TransferTest.cpp:1462-1464
+      F.x4 = F.x2;
+      F.x6 = 1;
+      F.x8 = 1;
----------------
Is there a reason we're doing different operations on these fields?

IIUC, the only thing we want to do is to ensure they're modeled -- so maybe do the same for each? (Otherwise, it looks as if we're doing things differently so that it triggers different behavior, even when that's not the case.)

================
Comment at: clang/unittests/Analysis/FlowSensitive/TransferTest.cpp:1484
+        const ValueDecl *X8Decl = findValueDecl(ASTCtx, "x8");
+        ASSERT_THAT(FooDecl, NotNull());
+
----------------
`findValueDecl()` asserts internally that its return value is non-null, so this assertion is unnecessary.

================
Comment at: clang/unittests/Analysis/FlowSensitive/TransferTest.cpp:1487-1488
+        // Only "x2", "x4", "x6", and "x8" are accessed and exist in the model.
+        const auto *FooLoc =
+            cast<RecordStorageLocation>(Env.getStorageLocation(*FooDecl));
+        llvm::DenseSet<const ValueDecl*> Fields;
----------------

================
Comment at: clang/unittests/Analysis/FlowSensitive/TransferTest.cpp:1489-1503
+        llvm::DenseSet<const ValueDecl*> Fields;
+        for (auto [Field, _] : FooLoc->children()) {
+          Fields.insert(Field);
+        }
+        ASSERT_EQ(Fields.size(), 4);
+        ASSERT_TRUE(Fields.contains(X2Decl));
+        ASSERT_TRUE(Fields.contains(X4Decl));
----------------

================
Comment at: clang/unittests/Analysis/FlowSensitive/TransferTest.cpp:1528-1530
+      F.x4 = F.x2;
+      F.x6 = 1;
+      F.x8 = 1;
----------------
These lines seem unnecessary. The test only wants to check whether the fields are modeled, not what their values are.

If anything, leaving out these operations makes the test stronger: It then tests whether the fields are modeled even if they are only initialized with an empty `{}`.

Then, once we leave out these operations, it's probably sufficient to give each class in this test just a single member variable (which is another nice simplification).

================
Comment at: clang/unittests/Analysis/FlowSensitive/TransferTest.cpp:1556-1557
+        // and "x8")
+        const auto *FooLoc =
+            cast<RecordStorageLocation>(Env.getStorageLocation(*FooDecl));
+        const auto *X1Val =
----------------

================
Comment at: clang/unittests/Analysis/FlowSensitive/TransferTest.cpp:2191
+TEST(TransferTest, AssignmentOperatorWithInitAndInheritance) {
+  // This is a crash repro.
+  std::string Code = R"(
----------------
IIUC, the crash was happening because `copyRecord()` assumes that the source and destination have the same fields. More specifically, there was an unspoken invariant in the framework that a `RecordStorageLocation` should always contain exactly the fields returned by `DataflowAnalysisContext::getModeledFields()`), but our treatment of `InitListExpr` was violating this invariant.

IOW, this wasn't really an issue with the way we treat copy/move operations but with `InitlistExpr`.

I understand that we want to have a test the reproduces the exact circumstances that led to the crash, but maybe it's enough to have one of these, and have the focus of the testing be on testing the actual invariant that we want to maintain? IOW, maybe keep this test but delete the additional tests below?

================
Comment at: clang/unittests/Analysis/FlowSensitive/TransferTest.cpp:2208-2220
+        const ValueDecl *S1Decl = findValueDecl(ASTCtx, "S1");
+        const ValueDecl *S2Decl = findValueDecl(ASTCtx, "S2");
+        const ValueDecl *S3Decl = findValueDecl(ASTCtx, "S3");
+        const ValueDecl *FooDecl = findValueDecl(ASTCtx, "Foo");
+
+        const Environment &Env =
+              getEnvironmentAtAnnotation(Results, "p");
----------------

================
Comment at: clang/unittests/Analysis/FlowSensitive/TransferTest.cpp:2498
+TEST(TransferTest, ReturnStructWithInheritance) {
+  // This is a crash repro. (Returning a struct involves copy/move constructor)
+  std::string Code = R"(
----------------
I don't think this is true -- the `return S1` in `target()` will use NRVO.

But I also don't think it's relevant -- I believe the crash this is reproducing was triggered by the `InitListExpr`, not the return statement?

Given the other tests added in this patch, do we need this test?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D159284/new/

https://reviews.llvm.org/D159284