[PATCH] D45532: [StaticAnalyzer] Checker to find uninitialized fields after a constructor call

Artem Dergachev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Apr 30 19:49:50 PDT 2018


NoQ added a comment.

Had a look at the actual code, came up with some comments, most are fairly minor, so good job! You did a good job explaining the overall idea, but a lot of specifics were difficult to understand, so i made a few comments on how to make the code a bit easier to read.

> I know, and I tried to look for ways to split this checker into smaller logical chunks, but I didn't find a satisfying solution.

What i would have done:

1. A checker that straightforwardly iterates through immediate fields of the current object and reports uninitialized fields.
2. Add base classes support.
3. Heuristic for skipping sub-constructors could use a separate review.
4. Start skipping arrays and unions.
5. Add simple pointer dereference support - introduce chains of fields.
6. More heuristics - the one for symbolic regions, the one for system header fields, etc.

etc.

I'm asking for this because it's, like, actually difficult and painful to understand formal information in large chunks. One-solution-at-a-time would have been so much easier. It is really a bad idea to start by presenting a working solution; the great idea is to present a small non-working solution with a reasonable idea behind it, and then extend it "in place" as much as it seems necessary. It is very comfortable when parts of the patch with different "status" (main ideas, corner-case fixes, heuristics, refactoring) stay in separate patches. Alpha checkers are essentially branches: because our svn doesn't have branches, we have alpha checkers and off-by-default flags. Even before step 1, you should be totally fine with committing an empty checker with a single empty callback and a header comment - even on such "step 0." we would be able to discuss if your approach to the checker seems right or might have inevitable issues, and it could have saved a lot of time.

Essentially every part of the solution that you implement, if you think it might deserve a separate discussion, also deserves a separate patch. It is reasonable to split the work into logical chunks before you start it. It's very unlikely that you have a single idea that takes 500 non-trivial lines of code to express. Splitting up ideas so that they were easy to grasp is an invaluable effort. I had very positive experience with that recently, both as a coder and as a reviewer, so i encourage that a lot.



================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:47
+class FieldChainInfo {
+  using FieldChain = llvm::SmallVector<const FieldRegion *, 10>;
+
----------------
At a glance it looks like a great use case for an immutable list. It's easy to use the immutable list as a stack, and it's also O(1) to copy/move and take snapshots of it.

Note that you copy 10 pointers every time you copy //or move// a `SmallVector`. Small vectors don't make much sense as fields of actively used data structures - their main purpose is to live on the stack as local variables.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:51-53
+  /// If this is a fieldchain whose last element is an uninitialized region of a
+  /// pointer type, this variable will store whether the pointer itself or the
+  /// pointee is uninitialized.
----------------
Please move this doc to the getter/setter function. That's where people expect to find it when they're reading code.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:58-59
+  FieldChainInfo() = default;
+  /// Delegates to the copy ctor and adds FR to Chain. Note that Chain cannot be
+  /// modified in a FieldChainInfo object after its construction.
+  FieldChainInfo(const FieldChainInfo &Other, const FieldRegion *FR);
----------------
Do we really need the other field (`IsDereferenced`) to be mutable? Or maybe we can keep the whole object immutable? I.e., instead of `dereference()` we'll get a constructor that constructs a `FieldChainInfo` with the same chain but with the dereferenced flag set.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:86-89
+  StoreManager &StoreMgr;
+  MemRegionManager &MrMgr;
+  SourceManager &SrcMgr;
+  Store S;
----------------
All of these can be replaced with a single `ProgramStateRef` object.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:93
+  const bool IsPedantic;
+  bool IsChecked = false;
+  bool IsAnyFieldInitialized = false;
----------------
This field is worth documenting.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:156-157
+  //
+  // We'll traverse each node of the above graph with the appropiate one of
+  // these methods:
+  bool isUnionUninit(const TypedValueRegion *R);
----------------
Please explain what these methods do in a more direct manner. Something like "This method returns true if an uninitialized field was found within the region. It should only be used with regions of union-type objects".


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:170-171
+
+/// Returns the with the object that was constructed by CtorDecl, or None if
+/// that isn't possible.
+Optional<nonloc::LazyCompoundVal>
----------------
Something's missing before `with`.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:176-177
+/// Checks whether the constructor under checking is called by another
+/// constructor. This avoids essentially the same error being reported multiple
+/// times.
+bool isCalledByConstructor(const CheckerContext &Context);
----------------
The second sentence would look great at the call site.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:181-182
+/// Returns whether FD can be (transitively) dereferenced to a void pointer type
+/// (void*, void**, ...). The type of the region behind a void pointer isn't
+/// known, and thus FD can not be analyzed.
+bool isVoidPointer(const FieldDecl *FD);
----------------
Type of an object pointed to by a void pointer is something that our path-sensitive engine is sometimes able to provide. It is not uncommon that a void pointer points to a concrete variable on the current path, and we may know it in the analyzer. I'm not sure this sort of check is necessary (i.e. require more information).

You may also want to have a look at `getDynamicTypeInfo()` which is sometimes capable of retrieving the dynamic type of the object pointed to by a pointer or a reference - i.e. even if it's different from the pointee type of the pointer.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:185
+
+} // namespace
+
----------------
We're traditionally writing "end anonymous namespace" or "end of anonymous namespace".


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:223-225
+  ExplodedNode *Node = Context.generateNonFatalErrorNode(Context.getState());
+  if (!Node)
+    return;
----------------
I suspect that a fatal error is better here. We don't want the user to receive duplicate report from other checkers that catch uninitialized values; just one report is enough.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:233-236
+  llvm::Twine WarningMsg = llvm::Twine(UninitFields.size()) +
+                           llvm::Twine(" uninitialized field") +
+                           llvm::Twine(UninitFields.size() == 1 ? "" : "s") +
+                           llvm::Twine(" at the end of the constructor call");
----------------
I believe that you only need the first `Twine`; then all `operator+` calls will be resolved to `operator+(Twine, ...)` automatically. Or just use a string and a stream like pretty much all other checkers.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:244
+    SmallString<200> FieldChainBuf;
+    FieldChain.toString(FieldChainBuf);
+    SmallString<200> NoteBuf;
----------------
This is the third approach to concatenating strings in the last 10 lines of code. I think you should make a method `FieldChainInfo::print(llvm::raw_ostream &)` and use streams everywhere.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:258-260
+    Report->addNote(NoteBuf,
+                    PathDiagnosticLocation::create(FieldChain.getEndOfChain(),
+                                                   Context.getSourceManager()));
----------------
Aha, ok, got it, so you're putting the warning at the end of the constructor and squeezing all uninitialized fields into a single warning by presenting them as notes.

Because for now notes aren't supported everywhere (and aren't always supported really well), i guess it'd be necessary to at least provide an option to make note-less reports before the checker is enabled by default everywhere. In this case notes contain critical pieces of information and as such cannot be really discarded.

I'm not sure that notes are providing a better user experience here than simply saying that "field x.y->z is never initialized". With notes, the user will have to scroll around (at least in scan-build) to find which field is uninitialized.

Notes will probably also not decrease the number of reports too much because in most cases there will be just one uninitialized field. Though i agree that when there is more than one report, the user will be likely to deal with all fields at once rather than separately.

So it's a bit wonky.

We might want to enable one mode or another in the checker depending on the analyzer output flag. Probably in the driver (`RenderAnalyzerOptions()`).

It'd be a good idea to land the checker into alpha first and fix this in a follow-up patch because this review is already overweight.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:292
+  if (!IsChecked)
+    hasUnintializedFields();
+  return UninitFields;
----------------
Why is checking done lazily in the first place? Are there many situations in which we want to construct the object but never do the actual scan?

If it was actually necessary, i would have split out the side effect of this getter into a separate private method.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:346
+    // If this is a pointer or reference type.
+    if (V.getAs<Loc>()) {
+      if (isPointerOrReferenceUninit(FR, {LocalChain, FR}))
----------------
Please check the type explicitly, and then assert that the value is a `Loc`. There are a lot of nuances that you don't want to be dealing with here (eg. `ObjCObjectPointerType`, which can be present in C++ as well because Objective-C++ is a thing, is not a `PointerType` or a `ReferenceType`, but it is modeled with a `Loc`).


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:352
+
+    // At this point the field is a fundamental type.
+    if (isFundamentalUninit(V)) {
----------------
Please formalize what you mean by a fundamental type and assert that. You probably mean something like "an integer or a floating-point number or maybe an enum or, well, bool".

I suspect that many cases here are missed (complex types, vector types, ...).


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:359-372
+  // Checking bases.
+  const auto *CXXRD = dyn_cast<CXXRecordDecl>(RD);
+  if (!CXXRD)
+    return ContainsUninitField;
+
+  for (const CXXBaseSpecifier BaseSpec : CXXRD->bases()) {
+    const auto *Base = BaseSpec.getType()->getAsCXXRecordDecl();
----------------
Are there many warnings that will be caused by this but won't be caused by conducting the check for the constructor-within-constructor that's currently disabled? Not sure - is it even syntactically possible to not initialize the base class?


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:384
+bool FindUninitializedFields::isPointerOrReferenceUninit(
+    const FieldRegion *FR, FieldChainInfo LocalChain) {
+
----------------
Please assert here that `FR` is a pointer/reference-type field.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:386
+
+  SVal V = StoreMgr.getBinding(S, loc::MemRegionVal(FR));
+
----------------
As i mentioned above, you can simply store a `ProgramStateRef` within `FindUninitializedFields`. In this case this code will be simplified to `State->getSVal(FR)`.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:405-406
+  SVal DerefdV = StoreMgr.getBinding(S, V.castAs<Loc>());
+  while (Optional<Loc> L = DerefdV.getAs<Loc>())
+    DerefdV = StoreMgr.getBinding(S, *L);
+
----------------
Is this loop guaranteed to terminate? Is something like `void *v; v = &v;` possible?

I think this loop should unwrap the AST type on every iteration and dereference the pointer accordingly.

In general, i believe that every symbolic pointer dereference should be made with awareness of the type of the object we're trying to retrieve. Which is an optional argument for `State->getSVal(R, ...)`, but it seems that it should be made mandatory because in a lot of cases omitting it causes problems.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:417
+    // constructors for list-like classes are checked without being called, and
+    // the CSA will conjure a symbolic region for Node *next; or similar code
+    // snippets.
----------------
CSA is not a commonly used acronym around the codebase.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:417
+    // constructors for list-like classes are checked without being called, and
+    // the CSA will conjure a symbolic region for Node *next; or similar code
+    // snippets.
----------------
NoQ wrote:
> CSA is not a commonly used acronym around the codebase.
I'd like to change the word "conjure" to something else because in most cases it won't be an actual `SymbolConjured`, but it'd be a `SymbolRegionValue` or `SymbolDerived`. Conjuring occurs in different circumstances.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:430
+    if (T->isUnionType()) {
+      // TODO: does control ever reach here?
+      if (isUnionUninit(R)) {
----------------
Add `llvm_unreachable("")` to find out :)


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:432
+      if (isUnionUninit(R)) {
+        LocalChain.dereference();
+        return addFieldToUninits(std::move(LocalChain));
----------------
Needs a comment on why the `IsDereferenced` flag on this path.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:480
+
+// TODO: This function constructs an incorrect fieldchain string in the
+// following case:
----------------
Can we at least detect these situations and suppress the warning? It should be relatively easy for lambdas (captured variable mapping should be stored somewhere, as far as i remember), not sure how easy it'd be for double inheritance (there are fancy methods for scanning this stuff, but that's equivalent to actually implementing the correct warning message).

Weird warning messages are pretty scary to show to the users.

It'd be a good idea to land the checker into alpha first and fix this in a follow-up patch because this review is already overweight.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:559-560
+  while (LC) {
+    if (isa<CXXConstructorDecl>(LC->getDecl()))
+      return true;
+
----------------
This doesn't check that the constructor on the parent stack frame is anyhow related to the current constructor, so it may introduce false negatives. For instance, a class may lazy-initialize a singleton but never store a reference to it within itself (because, well, it's a singleton, you can always obtain the reference to it). In this case we'll never find the bug in the constructor of the singleton.


================
Comment at: test/Analysis/cxx-uninitialized-object.cpp:931
+// FIXME: As of writing this checker, there is no good support for union types
+// in the CSA. Here is non-exhaustive list of cases.
+// Note that the rules for unions are different in C++ and C.
----------------
CSA is not a commonly used acronym around the codebase.


https://reviews.llvm.org/D45532





More information about the cfe-commits mailing list