[PATCH] D45532: [StaticAnalyzer] Checker to find uninitialized fields after a constructor call

Umann Kristóf via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue May 15 08:13:19 PDT 2018


Szelethus added a comment.

I'm afraid it'll be even more time before I post a new diff. There are some things that `ProgramState` is lacking, so I'll get that fixed first in a different pull request first.



================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:258-260
+    Report->addNote(NoteBuf,
+                    PathDiagnosticLocation::create(FieldChain.getEndOfChain(),
+                                                   Context.getSourceManager()));
----------------
NoQ wrote:
> NoQ wrote:
> > Szelethus wrote:
> > > NoQ wrote:
> > > > Aha, ok, got it, so you're putting the warning at the end of the constructor and squeezing all uninitialized fields into a single warning by presenting them as notes.
> > > > 
> > > > Because for now notes aren't supported everywhere (and aren't always supported really well), i guess it'd be necessary to at least provide an option to make note-less reports before the checker is enabled by default everywhere. In this case notes contain critical pieces of information and as such cannot be really discarded.
> > > > 
> > > > I'm not sure that notes are providing a better user experience here than simply saying that "field x.y->z is never initialized". With notes, the user will have to scroll around (at least in scan-build) to find which field is uninitialized.
> > > > 
> > > > Notes will probably also not decrease the number of reports too much because in most cases there will be just one uninitialized field. Though i agree that when there is more than one report, the user will be likely to deal with all fields at once rather than separately.
> > > > 
> > > > So it's a bit wonky.
> > > > 
> > > > We might want to enable one mode or another in the checker depending on the analyzer output flag. Probably in the driver (`RenderAnalyzerOptions()`).
> > > > 
> > > > It'd be a good idea to land the checker into alpha first and fix this in a follow-up patch because this review is already overweight.
> > > > [...]i guess it'd be necessary to at least provide an option to make note-less reports before the checker is enabled by default everywhere. In this case notes contain critical pieces of information and as such cannot be really discarded.
> > > This can already be achieved with `-analyzer-config notes-as-events=true`. There no good reason however not to make an additional flag that would emit warnings instead of notes.
> > > > I'm not sure that notes are providing a better user experience here than simply saying that "field x.y->z is never initialized". With notes, the user will have to scroll around (at least in scan-build) to find which field is uninitialized.
> > > This checker had a previous implementation that emitted a warning for each uninitialized field, instead of squeezing them in a single warning per constructor call. One of the major reasons behind rewriting it was that it emitted so many of them that it was difficult to differentiate which warning belonged to which constructor call. Also, in the case of the LLVM/Clang project, the warnings from this checker alone would be in the thousands.
> > > > Notes will probably also not decrease the number of reports too much because in most cases there will be just one uninitialized field.
> > > While this is true to some extent, it's not uncommon that a single constructor call leaves 7 uninitialized -- in the LLVM/Clang project I found one that had over 30!
> > > Since its practically impossible to determine whether this was a performance enhancement or just poor programming, the few cases where it is intentional, an object would emit many more warnings would make  make majority of the findings (where an object truly had only 1 or 2 fields uninit) a lot harder to spot in my opinion.
> > > 
> > > In general, I think one warning per constructor call makes the best user experience, but a temporary solution should be implemented until there's better support for notes.
> > > 
> > > I agree, this should be a topic of a follow-up patch.
> > > 
> > > This can already be achieved with `-analyzer-config notes-as-events=true`.
> > 
> > Yep. But the resulting user experience seems pretty bad to me. It was a good quick proof-of-concept trick to test things, but the fact that notes aren't path pieces is way too apparent in case of this checker. So i guess this flag was a bad idea.
> > 
> > > Also, in the case of the LLVM/Clang project, the warnings from this checker alone would be in the thousands.
> > 
> > This makes me a bit worried, i wonder what's the reason why does the checker shout so loudly on a codebase that doesn't seem to have actual uninitialized value bugs all over the place.
> > 
> > Are any of these duplicates found in the same constructor for the same field in different translation units? Suppose we discard the duplicates (if any) and warn exactly once (across the whole LLVM) for every uninitialized field (which is probably more than once per constructor). Then I wonder:
> > 1. How many uninitialize fields would we be able to find this way?
> > 2. How many of such fields were intentionally left uninitialized?
> > 3. How many were found by deep inspection of the heap through field chains involving pointer dereference "links"?
> (when i'm asking this sort of stuff, i don't mean you should look through thousands of positives, a uniform random sample of ~50 should be just fine)
> (but we really do need this info before enabling stuff by default)
>>This can already be achieved with -analyzer-config notes-as-events=true.
>Yep. But the resulting user experience seems pretty bad to me. It was a good quick proof-of-concept trick to test things, but the fact that notes aren't path pieces is way too apparent in case of this checker. So i guess this flag was a bad idea.

I totally agree, I meant that a note-less solution should only be a (arguably better) workaround just as `notes-as-events=true`.

>Are any of these duplicates found in the same constructor for the same field in different translation units? Suppose we discard the duplicates (if any) and warn exactly once (across the whole LLVM) for every uninitialized field (which is probably more than once per constructor).

Sure, having the same field field reported from the same constructor call in different TUs happens quite a lot, but they can easily be uniqued.

In the checkers current state, meaning that one warning per ctor call, the LLVM/Clang project in pedantic mode resulted in **409 non-unique warnings**, and using CodeChecker I found that **181 of these is unique**.

>How many uninitialize fields would we be able to find this way?
In its current state, in pedantic mode, after uniqueing **~581**. Now that's quite a bit off from what I remembered //"the warnings from this checker alone would be in the thousands"//, but having one warning per uninit field would still result in a ~320% increase in reports.
>How many of such fields were intentionally left uninitialized?
(I checked around 95 reports a couple weeks back before uploading this diff)
**Almost all of them.** Note that this is a very performance-critical project. From what I saw, many times several fields are left uninitialized, but these fields are inaccessible due to a `kind` field being asserted at their getter functions.
Also, I found cases where the constructor wasn't defaulted with `= default;`.

In fact, I struggled to find a case where a field was 100% left out by accident. Maybe in `llvm/lib/IR/ConstantsContext.h`, calling `ConstantExprKeyType(ArrayRef<Constant *> Operands, const ConstantExpr *CE)`.

This is not the case however in other projects I checked. In Xerces for example every find was a true positive.

>How many were found by deep inspection of the heap through field chains involving pointer dereference "links"?
This is somewhat harder to measure. The best I could come up with is uniqueing the fieldchains from the plist files with lexicographical sorting. Your question can be answered by measuring  the length of the fieldchains. I found
* 3 fieldchains with the length of 5
* 18 fieldchains with the length of 4
* 62 fieldchains with the lenfth of 3
* 108 fieldchains with the length of 2
* 137 fieldchains with the length of 1
(that is a total of 328 unique fieldchains lexicographically)
Because heap objects are not modeled, only 3 of the 328 fieldchains contains pointer dereferences. However, reference objects are very common from what I saw, but sadly I can't give you an exact number on those.

I hope these answers were satisfying, but I'll follow up with more information if needed.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:359-372
+  // Checking bases.
+  const auto *CXXRD = dyn_cast<CXXRecordDecl>(RD);
+  if (!CXXRD)
+    return ContainsUninitField;
+
+  for (const CXXBaseSpecifier BaseSpec : CXXRD->bases()) {
+    const auto *Base = BaseSpec.getType()->getAsCXXRecordDecl();
----------------
NoQ wrote:
> Are there many warnings that will be caused by this but won't be caused by conducting the check for the constructor-within-constructor that's currently disabled? Not sure - is it even syntactically possible to not initialize the base class?
I'm not a 100% sure what you mean. Can you clarify?
>Not sure - is it even syntactically possible to not initialize the base class?
If I understand the question correctly, no, as far as I know.


================
Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:430
+    if (T->isUnionType()) {
+      // TODO: does control ever reach here?
+      if (isUnionUninit(R)) {
----------------
NoQ wrote:
> Add `llvm_unreachable("")` to find out :)
It does! :)


https://reviews.llvm.org/D45532





More information about the cfe-commits mailing list