[cfe-commits] [patch] Warn in string literal + integral

Richard Smith richard at metafoo.co.uk
Fri Mar 2 13:42:31 PST 2012


On Fri, Mar 2, 2012 at 9:53 AM, Nico Weber <thakis at chromium.org> wrote:

> On Thu, Mar 1, 2012 at 11:12 PM, Richard Smith <richard at metafoo.co.uk>
> wrote:
> > On Thu, Mar 1, 2012 at 9:26 PM, Nico Weber <thakis at chromium.org> wrote:
> >>
> >> >> > On Tue, Feb 21, 2012 at 7:17 PM, Nico Weber <thakis at chromium.org>
> >> >> > wrote:
> >> >> >>
> >> >> >> the attached patch adds -Wstring-plus-int (I'm open to more
> creative
> >> >> >> names :-) ), which warns on code like |"error code: " + err|. The
> >> >> >> warning found 4 bugs in 2 files in chromium. The bugs it found
> look
> >> >> >> like this:
> >> >> >>
> >> >> >> File 1:
> >> >> >>    if (!read_node.InitByIdLookup(it->id)) {
> >> >> >>      error_handler()->OnUnrecoverableError(FROM_HERE, "Failed to
> >> >> >> look
> >> >> >> up "
> >> >> >>          " data for received change with id " + it->id);
> >> >> >>      return;
> >> >> >>    }
> >> >> >>
> >> >> >> File 2:
> >> >> >>    ResetStream(stream_id, spdy::INVALID_ASSOCIATED_STREAM,
> >> >> >>      "Received OnSyn with inactive associated stream " +
> >> >> >>      associated_stream_id);
> >> >> >> (Fix:
> >> >> >>
> >> >> >>
> http://codereview.chromium.org/9372076/diff/1/net/spdy/spdy_session.cc
> >> >> >> )
> >> >> >>
> >> >> >> (A coworker found the bug in the one file, which prompted me
> writing
> >> >> >> the warning. The warning then found the bug in the other file.)
> >> >> >>
> >> >> >>
> >> >> >> When building all of chromium and its many dependencies, the
> warning
> >> >> >> did also find 3 false positives, but they all look alike: Ffmepg
> >> >> >> contains three different functions that all look like this:
> >> >> >
> >> >> >
> >> >> > 4 bugs and 3 false positives doesn't sound great to me.
> >> >>
> >> >> True. All of the the 3 false positives are really the same snippet
> >> >> though. (On the other hand, 3 of the 4 bugs are very similar as
> well.)
> >> >> The bugs this did find are crash bugs, and they were in rarely run
> >> >> error-logging code, so it does find interesting bugs. Maybe you guys
> >> >> can run it on your codebase after it's in, and we can move it out of
> >> >> -Wmost if you consider it too noisy in practice?
> >> >
> >> >
> >> > Sounds reasonable.
> >> >
> >> >>
> >> >> > Have you considered
> >> >> > relaxing the warning in cases where the integral summand is a
> >> >> > constant
> >> >> > expression and is in-bounds? Would this pattern have matched any of
> >> >> > your
> >> >> > real bugs?
> >> >>
> >> >> That's a good idea (for my bugs, the int was always a declrefexpr,
> >> >> never a literal). The ffmpeg example roughly looks like this:
> >> >>
> >> >>  #define A "foo"
> >> >>  #define B "bar"
> >> >>  consume(A B + sizeof(A) - 1);
> >> >>
> >> >> The RHS is just "sizeof(A)" without the "- 1", but since A B has a
> >> >> length of 6, this still makes this warning go away. I added this to
> >> >> the patch. With this change, it's 4 bugs / 0 false positives.
> >> >>
> >> >> Note that this suppresses the warning for most enums which start at
> 0.
> >> >> Or did you mean to do this only for non-enum constant expressions?
> >> >
> >> >
> >> > I could imagine code legitimately wanting to do something like this:
> >> >
> >> > template</*...*/>
> >> > struct S {
> >> >   enum { Offset = /* ... */ };
> >> >   static const char *str = "some string" + Offset;
> >> > };
> >> >
> >> > That said, I'm happy for us to warn on such code. Whatever you prefer
> >> > seems
> >> > fine to me; we can refine this later, if a need arises.
> >>
> >> Enums are handled like ints: They warn if the offset is out of bounds,
> >> but don't if it's they are in bounds.
> >>
> >> >> > Onto the patch...
> >> >> >
> >> >> > Index: include/clang/Basic/DiagnosticGroups.td
> >> >> > ===================================================================
> >> >> > --- include/clang/Basic/DiagnosticGroups.td (revision 150418)
> >> >> > +++ include/clang/Basic/DiagnosticGroups.td (working copy)
> >> >> > @@ -320,6 +321,7 @@
> >> >> >      ReturnType,
> >> >> >      SelfAssignment,
> >> >> >      SizeofArrayArgument,
> >> >> > +    StringPlusInt,
> >> >> >      Trigraphs,
> >> >> >      Uninitialized,
> >> >> >      UnknownPragmas,
> >> >> >
> >> >> > Given the current level of false positives, I'm not completely
> >> >> > convinced
> >> >> > this should go into -Wmost. I imagine we'll know more once we've
> run
> >> >> > this on
> >> >> > more code.
> >> >> >
> >> >> > Index: include/clang/Basic/DiagnosticSemaKinds.td
> >> >> > ===================================================================
> >> >> > --- include/clang/Basic/DiagnosticSemaKinds.td (revision 150418)
> >> >> > +++ include/clang/Basic/DiagnosticSemaKinds.td (working copy)
> >> >> > @@ -3414,6 +3414,12 @@
> >> >> >    "explicitly assigning a variable of type %0 to itself">,
> >> >> >    InGroup<SelfAssignment>, DefaultIgnore;
> >> >> >
> >> >> > +def warn_string_plus_int : Warning<
> >> >> > +  "Adding an integral to a string does not append to the string">,
> >> >> >
> >> >> > Diagnostics outside the static analyzer should not be capitalized.
> >> >>
> >> >> Done.
> >> >>
> >> >> > +  InGroup<StringPlusInt>;
> >> >> > +def note_string_plus_int_silence : Note<
> >> >> > +  "use &(str[index]) if pointer arithmetic is intended">;
> >> >> >
> >> >> > The parentheses here are redundant. I think I'd prefer for the
> >> >> > suggestion to
> >> >> > be provided via fixits (and to reword this as something like "use
> >> >> > array
> >> >> > indexing to silence this warning").
> >> >>
> >> >> Clang tends to suggest very explicit parentheses (see
> >> >> -Wparentheses-logical-op for example).
> >> >
> >> >
> >> > Usually only in cases where there is a potential bug. If we saw
> >> > &"foo"[2],
> >> > we wouldn't be likely to suspect that the user might have really meant
> >> > (&"foo")[2], so my (weak) preference is to suggest the simpler fix.
> >>
> >> Removed the parens.
> >>
> >> > (Incidentally, -Warray-bounds should be taught to complain about
> >> > (&"foo")[2]!)
> >>
> >> Patches accepted! :-D
> >>
> >> >> I did change it to a fixit and
> >> >> reworded, it now looks like so:
> >> >>
> >> >> test/SemaCXX/string-plus-int.cpp:12:17: warning: adding an integral
> to
> >> >> a string does not append to the string [-Wstring-plus-int]
> >> >>  consume("foo" + 5);
> >> >>          ~~~~~~^~~
> >> >> test/SemaCXX/string-plus-int.cpp:12:17: note: use array indexing to
> >> >> silence this warning
> >> >>  consume("foo" + 5);
> >> >>                ^
> >> >>          &(    )[ ]
> >> >
> >> >
> >> > This looks nice ... though if we're going to have parens in the fixit,
> >> > they
> >> > should contain the [] :-)
> >> >
> >> >>
> >> >> > Index: lib/Sema/SemaExpr.cpp
> >> >> > ===================================================================
> >> >> > --- lib/Sema/SemaExpr.cpp (revision 150418)
> >> >> > +++ lib/Sema/SemaExpr.cpp (working copy)
> >> >> > @@ -7798,6 +7798,20 @@
> >> >> >      ParensRange);
> >> >> >  }
> >> >> >
> >> >> > +/// DiagnoseStringPlusInt - Emit a warning when adding an integer
> to
> >> >> > a
> >> >> > string
> >> >> > +/// literal.
> >> >> > +static void DiagnoseStringPlusInt(Sema &Self, SourceLocation
> OpLoc,
> >> >> > +                                  Expr *LHSExpr, Expr *RHSExpr) {
> >> >> > +  bool IsStringPlusInt = dyn_cast<StringLiteral>(LHSExpr) &&
> >> >> > +      RHSExpr->getType()->isIntegralType(Self.getASTContext());
> >> >> >
> >> >> > We should probably also check for (unscoped) enumeration types, and
> >> >> > for
> >> >> > class types with implicit conversion to int. Doing so will require
> >> >> > moving
> >> >> > this check to checkArithmeticOpPointerOperand (after we've checked
> >> >> > for
> >> >> > an
> >> >> > overloaded operator+).
> >> >>
> >> >> Added a check for (unscoped) enumeration types. class types with
> >> >> implicit conversion to int feel like overkill to me.
> >> >
> >> >
> >> > OK. You should still move the check to after we check for an
> overloaded
> >> > operator+, since operator+ can be overloaded for enums. (Once that's
> >> > done,
> >> > you may find it's easier to warn on such implicitly-convertible types
> >> > than
> >> > to not do so.)
> >>
> >> Done, added an enum-overloaded operator+ to the test.
> >>
> >> >> > +  if (!IsStringPlusInt)
> >> >> > +    return;
> >> >> > +
> >> >> > +  SourceRange DiagRange(LHSExpr->getLocStart(),
> >> >> > RHSExpr->getLocEnd());
> >> >> > +  Self.Diag(OpLoc, diag::warn_string_plus_int) << DiagRange;
> >> >> > +  Self.Diag(OpLoc, diag::note_string_plus_int_silence);
> >> >> > +}
> >> >> >
> >> >> > We should also catch the case where the integer is on the LHS (and
> >> >> > skip
> >> >> > the
> >> >> > fixit in that case).
> >> >>
> >> >> I tried that. It found neither additional bugs nor additional false
> >> >> positives, so it's probably not worth it?
> >> >
> >> >
> >> > Fine, there's no point adding this if it doesn't actually catch any
> >> > bugs. I
> >> > have only empirical evidence that it might -- I once saw[*] the moral
> >> > equivalent of:
> >> >
> >> >   int n = /*...*/;
> >> >   std::string s = n + " widgets found";
> >> >
> >> > [*] a long time ago, in a codebase far away...
> >>
> >> Ok, I added the warning for the int on the left as well (without fixit
> >> in that case, like you suggested upthread).
> >
> >
> > Just a few more things:
> >
> > +def warn_string_plus_int : Warning<
> > +  "adding an integral to a string does not append to the string">,
> > +  InGroup<StringPlusInt>;
> >
> > Rather than "an integral", can we say something like:
> >
> >   adding %0 to a string literal does not append to the string
> >
> > (where %0 is the type)?
>
> Done.
>
> > +  if (IndexExpr->EvaluateAsInt(index, Self.getASTContext())) {
> > +    if (index.isNonNegative() &&
> > +        index < llvm::APSInt(llvm::APInt(index.getBitWidth(),
> > +                                         StrExpr->getByteLength()),
> >
> > This should be <=, not <, I think: we want to allow "foo"+4 (a pointer
> past
> > the null) but warn on "foo"+5 (which has undefined behavior). And
> > getByteLength() should be getLength(), to properly handle wide string
> > literals. We'll get duplicate warnings for the same issue if
> > -Warray-bounds-pointer-arithmetic is enabled and we carry on here, but
> such
> > code is rare enough that I don't think that's a pressing concern.
>
> Done (-ish: getLength() didn't include the trailing \0, so I'm doing
> <= getLength() + 1), added a test.


Thanks, this looks fine to commit.


> > Other than that, a test for the fix-it would be great (see test/FixIt for
> > examples).
>
> Wouldn't that require the fixit to be on the warning instead of the
> note? (If so, maybe there's no need for the test yet?)
>

You can test fixits on notes with -fdiagnostics-parseable-fixits +
FileCheck. See test/FixIt/fixit-vexing-parse.cpp for an example of that.
I'm happy for that to be a subsequent commit.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120302/cfb962ba/attachment.html>


More information about the cfe-commits mailing list