[PATCH] D45532: [StaticAnalyzer] Checker to find uninitialized fields after a constructor call
Artem Dergachev via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri May 25 14:15:27 PDT 2018
NoQ added a comment.
Ok! Putting any remaining work into follow-up patches would be great indeed. Let's land this into `alpha.cplusplus` for now because i still haven't made up my mind for how to organize the proposed `bugprone` category (sorry!).
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:223-225
+ ExplodedNode *Node = Context.generateNonFatalErrorNode(Context.getState());
+ if (!Node)
+ return;
----------------
Szelethus wrote:
> NoQ wrote:
> > I suspect that a fatal error is better here. We don't want the user to receive duplicate report from other checkers that catch uninitialized values; just one report is enough.
> I think that would be a bad idea. For example, this checker shouts so loudly while checking the LLVM project, it would practically halt the analysis of the code, reducing the coverage, which means that checkers other then uninit value checkers would "suffer" from it.
>
> However, I also think that having multiple uninit reports for the same object might be good, especially with this checker, as it would be very easy to see where the problem originated from.
>
> What do you think?
Well, i guess that's the reason to not use the checker on LLVM. Regardless of fatal/nonfatal warnings, enabling this checker on LLVM regularly would be a bad idea because it's unlikely that anybody will be able to fix all the false positives to make it usable. And for other projects that don't demonstrate many false positives, this shouldn't be a problem.
In order to indicate where the problem originates from, we have our bug reporter visitors that try their best to add such info directly to the report. In particular, @george.karpenkov's recent `NoStoreFuncVisitor` highlights functions in which a variable was //not// initialized but was probably expected to be. Not sure if it highlights constructors in its current shape, but that's definitely a better way to give this piece of information to the user because it doesn't make the user look for a different report to understand the current report.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:258-260
+ Report->addNote(NoteBuf,
+ PathDiagnosticLocation::create(FieldChain.getEndOfChain(),
+ Context.getSourceManager()));
----------------
Szelethus wrote:
> NoQ wrote:
> > NoQ wrote:
> > > Szelethus wrote:
> > > > NoQ wrote:
> > > > > Aha, ok, got it, so you're putting the warning at the end of the constructor and squeezing all uninitialized fields into a single warning by presenting them as notes.
> > > > >
> > > > > Because for now notes aren't supported everywhere (and aren't always supported really well), i guess it'd be necessary to at least provide an option to make note-less reports before the checker is enabled by default everywhere. In this case notes contain critical pieces of information and as such cannot be really discarded.
> > > > >
> > > > > I'm not sure that notes are providing a better user experience here than simply saying that "field x.y->z is never initialized". With notes, the user will have to scroll around (at least in scan-build) to find which field is uninitialized.
> > > > >
> > > > > Notes will probably also not decrease the number of reports too much because in most cases there will be just one uninitialized field. Though i agree that when there is more than one report, the user will be likely to deal with all fields at once rather than separately.
> > > > >
> > > > > So it's a bit wonky.
> > > > >
> > > > > We might want to enable one mode or another in the checker depending on the analyzer output flag. Probably in the driver (`RenderAnalyzerOptions()`).
> > > > >
> > > > > It'd be a good idea to land the checker into alpha first and fix this in a follow-up patch because this review is already overweight.
> > > > > [...]i guess it'd be necessary to at least provide an option to make note-less reports before the checker is enabled by default everywhere. In this case notes contain critical pieces of information and as such cannot be really discarded.
> > > > This can already be achieved with `-analyzer-config notes-as-events=true`. There no good reason however not to make an additional flag that would emit warnings instead of notes.
> > > > > I'm not sure that notes are providing a better user experience here than simply saying that "field x.y->z is never initialized". With notes, the user will have to scroll around (at least in scan-build) to find which field is uninitialized.
> > > > This checker had a previous implementation that emitted a warning for each uninitialized field, instead of squeezing them in a single warning per constructor call. One of the major reasons behind rewriting it was that it emitted so many of them that it was difficult to differentiate which warning belonged to which constructor call. Also, in the case of the LLVM/Clang project, the warnings from this checker alone would be in the thousands.
> > > > > Notes will probably also not decrease the number of reports too much because in most cases there will be just one uninitialized field.
> > > > While this is true to some extent, it's not uncommon that a single constructor call leaves 7 uninitialized -- in the LLVM/Clang project I found one that had over 30!
> > > > Since its practically impossible to determine whether this was a performance enhancement or just poor programming, the few cases where it is intentional, an object would emit many more warnings would make make majority of the findings (where an object truly had only 1 or 2 fields uninit) a lot harder to spot in my opinion.
> > > >
> > > > In general, I think one warning per constructor call makes the best user experience, but a temporary solution should be implemented until there's better support for notes.
> > > >
> > > > I agree, this should be a topic of a follow-up patch.
> > > >
> > > > This can already be achieved with `-analyzer-config notes-as-events=true`.
> > >
> > > Yep. But the resulting user experience seems pretty bad to me. It was a good quick proof-of-concept trick to test things, but the fact that notes aren't path pieces is way too apparent in case of this checker. So i guess this flag was a bad idea.
> > >
> > > > Also, in the case of the LLVM/Clang project, the warnings from this checker alone would be in the thousands.
> > >
> > > This makes me a bit worried, i wonder what's the reason why does the checker shout so loudly on a codebase that doesn't seem to have actual uninitialized value bugs all over the place.
> > >
> > > Are any of these duplicates found in the same constructor for the same field in different translation units? Suppose we discard the duplicates (if any) and warn exactly once (across the whole LLVM) for every uninitialized field (which is probably more than once per constructor). Then I wonder:
> > > 1. How many uninitialize fields would we be able to find this way?
> > > 2. How many of such fields were intentionally left uninitialized?
> > > 3. How many were found by deep inspection of the heap through field chains involving pointer dereference "links"?
> > (when i'm asking this sort of stuff, i don't mean you should look through thousands of positives, a uniform random sample of ~50 should be just fine)
> > (but we really do need this info before enabling stuff by default)
> >>This can already be achieved with -analyzer-config notes-as-events=true.
> >Yep. But the resulting user experience seems pretty bad to me. It was a good quick proof-of-concept trick to test things, but the fact that notes aren't path pieces is way too apparent in case of this checker. So i guess this flag was a bad idea.
>
> I totally agree, I meant that a note-less solution should only be a (arguably better) workaround just as `notes-as-events=true`.
>
> >Are any of these duplicates found in the same constructor for the same field in different translation units? Suppose we discard the duplicates (if any) and warn exactly once (across the whole LLVM) for every uninitialized field (which is probably more than once per constructor).
>
> Sure, having the same field field reported from the same constructor call in different TUs happens quite a lot, but they can easily be uniqued.
>
> In the checkers current state, meaning that one warning per ctor call, the LLVM/Clang project in pedantic mode resulted in **409 non-unique warnings**, and using CodeChecker I found that **181 of these is unique**.
>
> >How many uninitialize fields would we be able to find this way?
> In its current state, in pedantic mode, after uniqueing **~581**. Now that's quite a bit off from what I remembered //"the warnings from this checker alone would be in the thousands"//, but having one warning per uninit field would still result in a ~320% increase in reports.
> >How many of such fields were intentionally left uninitialized?
> (I checked around 95 reports a couple weeks back before uploading this diff)
> **Almost all of them.** Note that this is a very performance-critical project. From what I saw, many times several fields are left uninitialized, but these fields are inaccessible due to a `kind` field being asserted at their getter functions.
> Also, I found cases where the constructor wasn't defaulted with `= default;`.
>
> In fact, I struggled to find a case where a field was 100% left out by accident. Maybe in `llvm/lib/IR/ConstantsContext.h`, calling `ConstantExprKeyType(ArrayRef<Constant *> Operands, const ConstantExpr *CE)`.
>
> This is not the case however in other projects I checked. In Xerces for example every find was a true positive.
>
> >How many were found by deep inspection of the heap through field chains involving pointer dereference "links"?
> This is somewhat harder to measure. The best I could come up with is uniqueing the fieldchains from the plist files with lexicographical sorting. Your question can be answered by measuring the length of the fieldchains. I found
> * 3 fieldchains with the length of 5
> * 18 fieldchains with the length of 4
> * 62 fieldchains with the lenfth of 3
> * 108 fieldchains with the length of 2
> * 137 fieldchains with the length of 1
> (that is a total of 328 unique fieldchains lexicographically)
> Because heap objects are not modeled, only 3 of the 328 fieldchains contains pointer dereferences. However, reference objects are very common from what I saw, but sadly I can't give you an exact number on those.
>
> I hope these answers were satisfying, but I'll follow up with more information if needed.
Thanks for the detailed stats! Yeah, that's a lot of "immediately" uninitialized fields. My concerns are mostly clarified. Yeah i was curious about references as well, but it's clear that chains of length 1 contain no references.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:359-372
+ // Checking bases.
+ const auto *CXXRD = dyn_cast<CXXRecordDecl>(RD);
+ if (!CXXRD)
+ return ContainsUninitField;
+
+ for (const CXXBaseSpecifier BaseSpec : CXXRD->bases()) {
+ const auto *Base = BaseSpec.getType()->getAsCXXRecordDecl();
----------------
Szelethus wrote:
> NoQ wrote:
> > Are there many warnings that will be caused by this but won't be caused by conducting the check for the constructor-within-constructor that's currently disabled? Not sure - is it even syntactically possible to not initialize the base class?
> I'm not a 100% sure what you mean. Can you clarify?
> >Not sure - is it even syntactically possible to not initialize the base class?
> If I understand the question correctly, no, as far as I know.
You have a check for `isCalledByConstructor()` in order to skip base class constructors but then you check them manually. That's a lot of code, but it seems that the same result could have been achieved by simply skipping the descent into the base class.
Same question for class-type fields, actually.
================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:405-406
+ SVal DerefdV = StoreMgr.getBinding(S, V.castAs<Loc>());
+ while (Optional<Loc> L = DerefdV.getAs<Loc>())
+ DerefdV = StoreMgr.getBinding(S, *L);
+
----------------
Szelethus wrote:
> NoQ wrote:
> > Is this loop guaranteed to terminate? Is something like `void *v; v = &v;` possible?
> >
> > I think this loop should unwrap the AST type on every iteration and dereference the pointer accordingly.
> >
> > In general, i believe that every symbolic pointer dereference should be made with awareness of the type of the object we're trying to retrieve. Which is an optional argument for `State->getSVal(R, ...)`, but it seems that it should be made mandatory because in a lot of cases omitting it causes problems.
> > Is this loop guaranteed to terminate? Is something like void *v; v = &v; possible?
> I looked this up, and I am confident that it is not possible for a pointer to point to itself. Only a `void**` object may point to a `void*`. The loop terminates even if you do something evil like `v = reinterpret_cast<void*>(&v);` (I have tried this with `int`s).
>
> >I think this loop should unwrap the AST type on every iteration and dereference the pointer accordingly.
> >
> >In general, i believe that every symbolic pointer dereference should be made with awareness of the type of the object we're trying to retrieve. Which is an optional argument for State->getSVal(R, ...), but it seems that it should be made mandatory because in a lot of cases omitting it causes problems.
>
> I invested some serious effort into `getDynamicType`, but I think it'll require more time and some serious refactoring of this (`isPointerOrReferenceUninit`) function, so I'd post it separately.
> Only a `void**` object may point to a `void*`.
That's definitely not true. The whole point of `void *` is that it can point to anything, including another `void *`. In fact, even `void *v = &v;` compiles in both C and C++.
I think it's better to have a stronger guarantee that this loop terminates.
================
Comment at: test/Analysis/cxx-uninitialized-object.cpp:1412
+ struct RecordType;
+ // no-crash
+ RecordType *recptr; // expected-note{{uninitialized pointer 'this->recptr}}
----------------
Szelethus wrote:
> whisperity wrote:
> > What is this comment symbolising? Is this actually something the test infrastructure picks up?
> `// no-crash` is something I found in many other test files, in this case it symbolizes that the point of this test isn't to check whether the checker correctly finds or ignores an uninit field, but that it doesn't crash.
Yup, `// no-crash` is indeed a good thing to mention.
Note: `// no-warning` is also not picked up by any infrastructure. It is only for the reader.
https://reviews.llvm.org/D45532
More information about the cfe-commits
mailing list