[cfe-commits] [patch] Warn in string literal + integral

Richard Smith richard at metafoo.co.uk
Thu Mar 1 00:24:39 PST 2012


On Mon, Feb 27, 2012 at 11:11 PM, Nico Weber <thakis at chromium.org> wrote:

> Thanks for the quick review, and sorry for the somewhat slow follow-up :-)
>
> On Tue, Feb 21, 2012 at 8:07 PM, Richard Smith <richard at metafoo.co.uk>
> wrote:
> > Hi Nico,
> >
> > On Tue, Feb 21, 2012 at 7:17 PM, Nico Weber <thakis at chromium.org> wrote:
> >>
> >> the attached patch adds -Wstring-plus-int (I'm open to more creative
> >> names :-) ), which warns on code like |"error code: " + err|. The
> >> warning found 4 bugs in 2 files in chromium. The bugs it found look
> >> like this:
> >>
> >> File 1:
> >>    if (!read_node.InitByIdLookup(it->id)) {
> >>      error_handler()->OnUnrecoverableError(FROM_HERE, "Failed to look
> up "
> >>          " data for received change with id " + it->id);
> >>      return;
> >>    }
> >>
> >> File 2:
> >>    ResetStream(stream_id, spdy::INVALID_ASSOCIATED_STREAM,
> >>      "Received OnSyn with inactive associated stream " +
> >>      associated_stream_id);
> >> (Fix:
> >> http://codereview.chromium.org/9372076/diff/1/net/spdy/spdy_session.cc)
> >>
> >> (A coworker found the bug in the one file, which prompted me writing
> >> the warning. The warning then found the bug in the other file.)
> >>
> >>
> >> When building all of chromium and its many dependencies, the warning
> >> did also find 3 false positives, but they all look alike: Ffmepg
> >> contains three different functions that all look like this:
> >
> >
> > 4 bugs and 3 false positives doesn't sound great to me.
>
> True. All of the the 3 false positives are really the same snippet
> though. (On the other hand, 3 of the 4 bugs are very similar as well.)
> The bugs this did find are crash bugs, and they were in rarely run
> error-logging code, so it does find interesting bugs. Maybe you guys
> can run it on your codebase after it's in, and we can move it out of
> -Wmost if you consider it too noisy in practice?


Sounds reasonable.


> > Have you considered
> > relaxing the warning in cases where the integral summand is a constant
> > expression and is in-bounds? Would this pattern have matched any of your
> > real bugs?
>
> That's a good idea (for my bugs, the int was always a declrefexpr,
> never a literal). The ffmpeg example roughly looks like this:
>
>  #define A "foo"
>  #define B "bar"
>  consume(A B + sizeof(A) - 1);
>
> The RHS is just "sizeof(A)" without the "- 1", but since A B has a
> length of 6, this still makes this warning go away. I added this to
> the patch. With this change, it's 4 bugs / 0 false positives.
>
> Note that this suppresses the warning for most enums which start at 0.
> Or did you mean to do this only for non-enum constant expressions?


I could imagine code legitimately wanting to do something like this:

template</*...*/>
struct S {
  enum { Offset = /* ... */ };
  static const char *str = "some string" + Offset;
};

That said, I'm happy for us to warn on such code. Whatever you prefer seems
fine to me; we can refine this later, if a need arises.


> > Onto the patch...
> >
> > Index: include/clang/Basic/DiagnosticGroups.td
> > ===================================================================
> > --- include/clang/Basic/DiagnosticGroups.td (revision 150418)
> > +++ include/clang/Basic/DiagnosticGroups.td (working copy)
> > @@ -320,6 +321,7 @@
> >      ReturnType,
> >      SelfAssignment,
> >      SizeofArrayArgument,
> > +    StringPlusInt,
> >      Trigraphs,
> >      Uninitialized,
> >      UnknownPragmas,
> >
> > Given the current level of false positives, I'm not completely convinced
> > this should go into -Wmost. I imagine we'll know more once we've run
> this on
> > more code.
> >
> > Index: include/clang/Basic/DiagnosticSemaKinds.td
> > ===================================================================
> > --- include/clang/Basic/DiagnosticSemaKinds.td (revision 150418)
> > +++ include/clang/Basic/DiagnosticSemaKinds.td (working copy)
> > @@ -3414,6 +3414,12 @@
> >    "explicitly assigning a variable of type %0 to itself">,
> >    InGroup<SelfAssignment>, DefaultIgnore;
> >
> > +def warn_string_plus_int : Warning<
> > +  "Adding an integral to a string does not append to the string">,
> >
> > Diagnostics outside the static analyzer should not be capitalized.
>
> Done.
>
> > +  InGroup<StringPlusInt>;
> > +def note_string_plus_int_silence : Note<
> > +  "use &(str[index]) if pointer arithmetic is intended">;
> >
> > The parentheses here are redundant. I think I'd prefer for the
> suggestion to
> > be provided via fixits (and to reword this as something like "use array
> > indexing to silence this warning").
>
> Clang tends to suggest very explicit parentheses (see
> -Wparentheses-logical-op for example).


Usually only in cases where there is a potential bug. If we saw &"foo"[2],
we wouldn't be likely to suspect that the user might have really meant
(&"foo")[2], so my (weak) preference is to suggest the simpler fix.

(Incidentally, -Warray-bounds should be taught to complain about
(&"foo")[2]!)


> I did change it to a fixit and
> reworded, it now looks like so:
>
> test/SemaCXX/string-plus-int.cpp:12:17: warning: adding an integral to
> a string does not append to the string [-Wstring-plus-int]
>  consume("foo" + 5);
>          ~~~~~~^~~
> test/SemaCXX/string-plus-int.cpp:12:17: note: use array indexing to
> silence this warning
>  consume("foo" + 5);
>                ^
>          &(    )[ ]


This looks nice ... though if we're going to have parens in the fixit, they
should contain the [] :-)


> > Index: lib/Sema/SemaExpr.cpp
> > ===================================================================
> > --- lib/Sema/SemaExpr.cpp (revision 150418)
> > +++ lib/Sema/SemaExpr.cpp (working copy)
> > @@ -7798,6 +7798,20 @@
> >      ParensRange);
> >  }
> >
> > +/// DiagnoseStringPlusInt - Emit a warning when adding an integer to a
> > string
> > +/// literal.
> > +static void DiagnoseStringPlusInt(Sema &Self, SourceLocation OpLoc,
> > +                                  Expr *LHSExpr, Expr *RHSExpr) {
> > +  bool IsStringPlusInt = dyn_cast<StringLiteral>(LHSExpr) &&
> > +      RHSExpr->getType()->isIntegralType(Self.getASTContext());
> >
> > We should probably also check for (unscoped) enumeration types, and for
> > class types with implicit conversion to int. Doing so will require moving
> > this check to checkArithmeticOpPointerOperand (after we've checked for an
> > overloaded operator+).
>
> Added a check for (unscoped) enumeration types. class types with
> implicit conversion to int feel like overkill to me.


OK. You should still move the check to after we check for an overloaded
operator+, since operator+ can be overloaded for enums. (Once that's done,
you may find it's easier to warn on such implicitly-convertible types than
to not do so.)

>
> > +  if (!IsStringPlusInt)
> > +    return;
> > +
> > +  SourceRange DiagRange(LHSExpr->getLocStart(), RHSExpr->getLocEnd());
> > +  Self.Diag(OpLoc, diag::warn_string_plus_int) << DiagRange;
> > +  Self.Diag(OpLoc, diag::note_string_plus_int_silence);
> > +}
> >
> > We should also catch the case where the integer is on the LHS (and skip
> the
> > fixit in that case).
>
> I tried that. It found neither additional bugs nor additional false
> positives, so it's probably not worth it?
>

Fine, there's no point adding this if it doesn't actually catch any bugs. I
have only empirical evidence that it might -- I once saw[*] the moral
equivalent of:

  int n = /*...*/;
  std::string s = n + " widgets found";

[*] a long time ago, in a codebase far away...
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120301/631e5e12/attachment.html>


More information about the cfe-commits mailing list