[PATCH] [StaticAnalyzer] New checker Sizeof on expression

Wed Oct 23 16:58:19 PDT 2013

On Wed, Oct 23, 2013 at 8:35 AM, Anna Zaks <ganna at apple.com> wrote:

> Daniel,
>
> I think Richard was asking about false positives/true positives ratio and
> I do not see any false positives in the list below. It looks like all the
> reports you found are valid. Did you run this check on large C++ codebases?
>
> As far as I can tell there are two outstanding issues:
>  - Addressing these questions from Richard:
>
> sizeof(expression) is a common idiom in SFINAE contexts. Is that covered
> here?
>
> Richard, can you provide examples?
>

There's one here:

  http://en.wikipedia.org/wiki/Substitution_failure_is_not_an_error

... but that's not the case I was thinking of; it's not likely that this
pattern will be used with sizeof(binary operator).

The case I meant is more like the one here:

  http://stackoverflow.com/questions/12654067/what-is-expression-sfinae

... where 'sizeof' is just used to hide an arbitrary expression inside a
function type. Here's another example using a binary operator inside a
sizeof expression:

http://stackoverflow.com/questions/2127693/sfinae-sizeof-detect-if-expression-compiles

IIRC, sizeof(expression) was the *only* way to do this kind of thing in
C++98.

> sizeof(sizeof(int)) is a "traditional" way to get sizeof(size_t). Why
> should we warn on that?
>
> (I personally think that if this check goes into the analyzer, it's fine
> to warn in the second case.)
>

Sure. I think it's even fine for a compiler warning, since sizeof(size_t)
is a clearer way to get the same result.

 - Where this check should go (compiler or the analyzer)?
> I think that if you can address the questions above, this check could go
> into the compiler.
>
> Cheers,
> Anna.
>
> On Oct 14, 2013, at 12:43 PM, Daniel Marjamäki <
> Daniel.Marjamaki at evidente.se> wrote:
>
>
> Hello!
>
> As I understood it.. you wanted to know what the hit ratio is. I have
> investigated the hit ratio on a number of packages in debian. Below is the
> negative and positives I've found. I provide the package URLs so you can
> look for yourself..
>
> I will continue investigating and let you know if I see any false
> positives...
>
>
> NEGATIVE (macro)
> package:
> http://ftp.us.debian.org/debian/pool/main/a/agedu/agedu_9723.orig.tar.gz
>
> #define sresize(array, number, type) \
>     ( (void)sizeof((array)-(type *)0), \
>       (type *) srealloc ((array), (number) * sizeof (type)) )
>
> We don't warn about this since it's in a macro. However it is interesting:
>     sizeof((array)-(type *)0)
> It seems to be written this way by intention. To get sizeof(ptrdiff_t)?
> Maybe sizeof(ptr-ptr) should be ok.
>
>
>
> POSITIVE
> package:
> http://ftp.us.debian.org/debian/pool/main/a/argus-client_3.0.0.orig.tar.gz
>
> file: argus-clients-3.0.0/radump/print-ldp.c
> line: 607
>             print_unknown_data(tptr+sizeof(sizeof(struct
> ldp_msg_header)),"\n\t  ",
>                                msg_len);
>
> file: argus-clients-3.0.0/radump/print-lmp.c
> line: 878
>             print_unknown_data(tptr+sizeof(sizeof(struct
> lmp_object_header)),"\n\t    ",
>                                lmp_obj_len-sizeof(struct
> lmp_object_header));
>
> Not sure if these are intended to be sizeof(size_t) or sizeof(struct
> lmp_object_header) but I guess the latter.
>
>
>
> POSITIVE
> package:
> http://ftp.us.debian.org/debian/pool/main/b/babel/babel_1.4.0.dfsg.orig.tar.gz
> file: babel-1.4.0.dfsg/regression/output/libC/synch_RegOut_Impl.c
>
> 332:    if ((res < 0) || (res >
> sizeof(s_result_strings)/sizeof(sizeof(char *)))) {
>
> This "sizeof(sizeof(char *))" should be "sizeof(char*)" since
> s_result_strings is a array of char * pointers
>
>
>
> POSITIVE
> package:
> http://ftp.us.debian.org/debian/pool/main/b/bird/bird_1.3.7.orig.tar.gz
> file: bird-1.3.7/lib/mempool.c
>
> 253:
>   return ALLOC_OVERHEAD + sizeof(struct linpool) +
>     cnt * (ALLOC_OVERHEAD + sizeof(sizeof(struct lp_chunk))) +
>     m->total + m->total_large;
>
> Not sure if this is intended to be sizeof(size_t) or sizeof(struct
> lp_chunk) but I guess the latter.
>
>
> POSITIVE:
> package:
> http://ftp.us.debian.org/debian/pool/main/b/blender/blender_2.49.2~dfsg.orig.tar.gz
> file: blender-2.49b.orig/source/blender/blenloader/intern/readfile.c
> 6987:    if(0==strncmp(ima->id.name+2, "Viewer Node", sizeof(ima->id.name+2)))
> {
> 6991:    if(0==strncmp(ima->id.name+2, "Render Result", sizeof(ima->
> id.name+2))) {
>
> These are bad, ima->id.name is an array, it should probably be changed to:
>     if(0==strncmp(ima->id.name+2, "Viewer Node", sizeof(ima->id.name) -
> 2)) {
>
>
> Best regards,
> Daniel Marjamäki
>
>
> ..................................................................................................................
> Daniel Marjamäki Senior Engineer
> Evidente ES East AB  Warfvinges väg 34  SE-112 51 Stockholm  Sweden
>
>
> Mobile:                 +46 (0)709 12 42 62
> E-mail:                 Daniel.Marjamaki <Daniel.Marjamaki at evidente.se>
> @evidente.se <Daniel.Marjamaki at evidente.se>
>
>
> www.evidente.se
> ------------------------------
> *Från:* cfe-commits-bounces at cs.uiuc.edu [cfe-commits-bounces at cs.uiuc.edu]
> för Anna Zaks [ganna at apple.com]
> *Skickat:* den 8 oktober 2013 18:18
> *Till:* Ted Kremenek
> *Cc:* Anders Rönnholm; Richard Smith; cfe-commits at cs.uiuc.edu
> *Ämne:* Re: [PATCH] [StaticAnalyzer] New checker Sizeof on expression
>
>
> On Oct 7, 2013, at 10:05 PM, Ted Kremenek <kremenek at apple.com> wrote:
>
> On Oct 7, 2013, at 6:05 PM, Jordan Rose <jordan_rose at apple.com> wrote:
>
>
> On Oct 7, 2013, at 13:58, Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Mon, Oct 7, 2013 at 9:55 AM, Jordan Rose <jordan_rose at apple.com> wrote:
>
>> I'm fine with this staying in the analyzer for now, unless David,
>> Richard, or Eli feel it should be a warning right away.
>>
>
> Do we have evidence that we want this? Does it catch bugs? If so, what do
> they look like? It seems like this would trigger on legitimate code; how
> does a user suppress the warning in that case, and does that suppression
> make their code clearer?
>
> What is the false/true positive ratio for bug finding here?
>
> sizeof(expression) is a common idiom in SFINAE contexts. Is that covered
> here?
>
> sizeof(sizeof(int)) is a "traditional" way to get sizeof(size_t). Why
> should we warn on that?
>
> And more as a general question than something specific to this patch: Is
> there a region in the space of false positive ratios where we think a
> syntactic warning should go into the static analyzer? If so, why? And what
> is that region? I would have thought that the static analyzer, like the
> clang warnings, would be aimed at (eventually) having a false positive
> ratio of near zero. If so, then should we ever put a warning in the static
> analyzer if it doesn't require the static analyzer's technology (or have a
> high runtime cost)?
>
>
> On this last (and bringing in Ted and Anna):
>
> I think the main difference between compiler warnings and syntactic
> analyzer checks is that we try very hard to turn new compiler warnings on
> by default. A second-order effect of this is that we generally avoid style
> warnings. The analyzer can be a bit looser about this, though: because
> people *know* the analyzer is stricter and more in-depth, I think they
> might also accept that a particular check doesn't fit their project.
>
> On the other hand, we still haven't gotten around to designing a proper
> bug tracking and/or manual suppression system, so that's one advantage of
> compiler warnings. And as you say, checks without a high runtime cost don't
> really have a *technical* reason to be in the analyzer.
>
>
> Richard’s point is correct that we want the static analyzer to also have a
> high signal-to-noise ratio.  Otherwise it is a useless tool.  I’m also not
> a fan of having the analyzer having a bunch of “junk” checkers that aren’t
> on by default, but if a checker, when it is enabled, is HIGHLY useful to
> some set of people (e.g., security experts who are more tolerant of false
> positive rates if they want to do an aggressive code audit)  I think the
> analyzer is a reasonable place to put them, given the current warning
> policy in Clang where we want the warnings there to be generally useful to
> everybody.
>
> To put it in more context, in the beginning the guiding principal of what
> goes in the static analyzer was:
>
> (a) The warning is very expensive to compute.
>
>  OR
>
> (b) The warning is very domain-specific.  For example, an API such as
> CFNumberCreate() on OS X has some interesting API invariants, but we
> generally should not be hacking those API-specific warnings into the
> compiler.  Exceptions exist of course, e.g., printf checking, but often the
> are grounded when such APIs are fairly standard (e.g., in the C standard
> itself) or the checking is based on some annotation like
> __attribute__((format)) where the compiler doesn’t know anything about a
> specific function itself, just the annotation.
>
> Style warnings can sometimes fall into (a) (in which case not putting them
> in the compiler makes obvious sense), but one could argue that they are
> more the flavor of (b) then a traditional compiler warning.  As I mentioned
> earlier, the static analyzer can be home to some highly specialized
> checkers that may not be generally useful for everybody but when enabled
> are very useful to certain people.  Style warnings often fit in this
> category of warnings.
>
> A related problem is that we don’t have an ontology for style warnings in
> clang (the compiler).  Should they really be lumped into the same group of
> all other compiler warnings?  What about their behavior with -Werror?
>  -Weverything?  Style warnings really are about personal style, and users
> can be highly polarized about them.  For example, it would be highly useful
> for Clang to have a -W80-columns or -Wspaces-instead-of-tabs (not a serious
> proposal).  Both would be cheap to compute, would obviously benefit LLVM
> developers, but they wouldn’t work for everybody, or even the majority of
> software developers.  Where should they go?  The compiler?  The static
> analyzer?  Both have discoverability and usability concerns in either
> location.  I’d argue that these two warnings would be better suited in the
> compiler because (when the user wants them) they’d get run all the time,
> but having these warnings counteracts the general guiding principal that
> compiler warnings should generally be useful to most people, or clear
> categories of people (e.g, library authors), but not specific groups of
> people (e.g., LLVM developers).
>
> This is not a thought out proposal, but I would not be opposed to a new
> category of warnings, say -Wstyle, and have all style warnings under
> -Wstyle, perhaps named “-Wstyle-80-columns” or
> “-Wstyle-spaces-instead-of-tabs”.  None of these warnings would be on by
> default.  If we want these kind of warnings in the compiler, I think these
> kind of warnings are worth calling out as being something “different” that
> people shouldn’t get hung up about if they don’t think the warning applies
> to them.  Also, having them in a special category of them own allows
> institutions to set broad warning polices such as “-Werror
> -Wno-error=style” where they want regular warnings to be treated as error,
> but not style warnings, etc.
>
>
> I agree with Ted. Currently, there is no good way of adding style and
> highly specialized warnings to clang; given the behavior of Werror and
> Weverything. For example, we encourage users to learn about new generic
> warnings by turning on Weverything, so the specialized warnings should not
> be included there.
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20131023/bd576ebb/attachment.html>