[cfe-dev] [StaticAnalyzer] C++ related checkers

Gábor Horváth xazax.hun at gmail.com
Fri Mar 20 15:33:12 PDT 2015


Hi Adam!

On 20 March 2015 at 22:34, Adam Romanek <romanek.adam at gmail.com> wrote:

> Hi Gábor!
>
> Thanks for the information. The work you've done might in fact help me to
> push the C++ related checkers further. I'd like to investigate it a bit.
>
> Is there any summary of what has been done and what is still missing? Are
> there any code examples?
>

There is some inline documentation in the code, and there are some tests
that can be used as code examples. The corresponding commit can be found
here:
http://reviews.llvm.org/rL216550
The commit message contains some documentation as well as some notes.


>
> BTW, how would I know which parts of C++ standard library might need to be
> synthesized through BodyFarm?
>


That depends on your needs. Models are usually written for functions whose
bodies are not available in the translation unit. When a function body is
available, the model file for that function is not used. It would not be
too hard to add support for "strong" models that take precedence over the
original implementation.
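
As a rough sketch of the precedence rule described above: the names below
(ModelKind, BodySource, chooseBody) are hypothetical and are not the
analyzer's actual code, and the "strong" case is the possible extension, not
something that exists today.

```cpp
#include <cassert>

// Hypothetical names throughout; this is not the analyzer's actual code.
enum class ModelKind { None, Weak, Strong };
enum class BodySource { Unavailable, Original, Model };

// Decide which body to analyze for a call.
// - A weak model is only a fallback when the translation unit has no body.
// - A strong model (not implemented today) would override an existing body.
BodySource chooseBody(bool hasOriginalBody, ModelKind model) {
  if (model == ModelKind::Strong)
    return BodySource::Model;
  if (hasOriginalBody)
    return BodySource::Original;
  if (model == ModelKind::Weak)
    return BodySource::Model;
  return BodySource::Unavailable;
}

// Small self-check of the rule above.
inline void checkChooseBody() {
  assert(chooseBody(true, ModelKind::Weak) == BodySource::Original);
  assert(chooseBody(false, ModelKind::Weak) == BodySource::Model);
}
```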


>
> I'm just trying to understand the amount of work required to use this
> approach for a basic checker I mentioned at the beginning. Any further
> hints would be useful.
>

This "textual bodyfarm" only works on limited examples right now (C
functions). It would not be too hard to add support for methods and
overloaded functions as well. Supporting templates would be challenging,
but most of the time that would be irrelevant for weak models.
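
To give an idea of the kind of function that fits the current limits, here
is a sketch of a model body for a plain C function. The names notzero and
safeDivide and their signatures are chosen only for illustration; the tests
in the commit are the authoritative examples. A model is just an ordinary
function definition that the analyzer can use when the translation unit only
contains the declaration.

```cpp
#include <cassert>

// In the analyzed code only a declaration would be visible, for example:
//   bool notzero(int value);
//
// A textual model supplies an ordinary definition for it. The function name
// and signature here are illustrative assumptions.
bool notzero(int value) { return value != 0; }

// A caller whose paths become easier to reason about once the body of
// notzero is known: the division is reached only for a nonzero divisor.
int safeDivide(int dividend, int divisor) {
  if (!notzero(divisor))
    return 0;
  return dividend / divisor;
}

// Small self-check of the two functions above.
inline void checkSafeDivide() {
  assert(safeDivide(10, 2) == 5);
  assert(safeDivide(10, 0) == 0);
}
```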

There are also some bugs related to macros that I have not yet had the
chance to sort out.

The handling of multiple model files is also not implemented yet. The plan
was to handle them similarly to how headers are handled (model search paths
instead of header search paths).
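
A minimal sketch of how such a lookup could work, assuming one model file
per function named "<function>.model". The helper findModelFile, the file
suffix, and the injected exists callback are assumptions made for
illustration; none of this is implemented in Clang. Requires C++17.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper, not part of Clang. Walks the model search paths in
// order and returns the first "<dir>/<function>.model" that exists, the same
// way the first matching header on the include path wins.
// The ".model" suffix and the injected `exists` callback are assumptions
// made to keep the sketch self-contained.
std::optional<std::string>
findModelFile(const std::vector<std::string> &searchPaths,
              const std::string &functionName,
              const std::function<bool(const std::string &)> &exists) {
  for (const std::string &dir : searchPaths) {
    std::string candidate = dir + "/" + functionName + ".model";
    if (exists(candidate))
      return candidate;
  }
  return std::nullopt;
}

// Small self-check: the earlier directory wins when several match.
inline void checkFindModelFile() {
  auto exists = [](const std::string &path) {
    return path == "b/foo.model" || path == "c/foo.model";
  };
  std::optional<std::string> found = findModelFile({"a", "b", "c"}, "foo", exists);
  assert(found.has_value());
  assert(*found == "b/foo.model");
}
```

Passing the existence check as a callback keeps the sketch independent of
the file system; a real implementation would presumably go through Clang's
own file management layer instead.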

If you think these "textual models" would be useful for you and you have
some requirements, expectations, or assumptions, feel free to share them
with me. If you plan to improve anything regarding this feature and need
some help or guidance, feel free to ping me.


>
> Thanks!
> Adam Romanek
>
> On 19.03.2015 09:39, Gábor Horváth wrote:
>
>> Hi Jared!
>>
>> You might be interested in this GSoC project from last year:
>> http://www.google-melange.com/gsoc/project/details/google/gsoc2014/xazax/5717271485874176
>>
>> It makes it possible to write C++ code for the bodyfarm instead of
>> assembling the AST manually. It works for simple cases and is available
>> in trunk already. Unfortunately there is a lot of work left to do, which
>> I plan to tackle, but I lack the time for that at the moment.
>>
>> Cheers,
>> Gábor
>>
>>
>>
>> On 19 March 2015 at 06:00, Jared Grubb <jared.grubb at gmail.com> wrote:
>>
>>
>>     On Mar 16, 2015, at 15:18, Adam Romanek <romanek.adam at gmail.com> wrote:
>>>
>>>     Hi!
>>>
>>>     I'm new to this list and to Clang development. Nevertheless I've
>>>     been interested in Clang Static Analyzer for a while. I've been
>>>     using it on a large code base with a lot of success. So let me
>>>     start by saying: thanks for this amazing piece of code!
>>>
>>>     But... Some time ago I realized there are hardly any strictly C++
>>>     related checkers in CSA. I was wondering if there's any movement
>>>     in this area. I was thinking about some checkers for
>>>     use-after-free for STL containers like std::string, for example:
>>>
>>>     const char* x = NULL;
>>>     {
>>>       std::string foo("foo");
>>>       x = foo.c_str();
>>>     }
>>>     printf("%s", x); // boom
>>>
>>>     There are also some other common types of errors in C++ like use
>>>     of iterator after it has been invalidated. FYI this one in
>>>     particular is detected by cppcheck.
>>>
>>>     So I decided to dig a bit to find out whether it is hard to write
>>>     a checker for use-after-free like in the example with std::string.
>>>     It looks like MallocChecker deals with a similar class of issues.
>>>
>>>     I was wondering whether it would be the right approach to try to
>>>     "bend" MallocChecker to my needs (but it's already 2.5k lines of
>>>     code) or to start something new on my own.
>>>
>>>     Honestly it took me some time even to detect a simple std::string
>>>     constructor call so the road looks rather long and bumpy...
>>>
>>>     Any hints, pointers? Any related work?
>>>
>>
>>     I have looked at this in the past, but it was about 18 months ago.
>>     So take my thoughts with that grain of salt. Also note that I’m not
>>     a regular or major contributor here. I’ve done very minor patches,
>>     but always hoping to do more :) So here’s my thoughts, and take them
>>     as you will.
>>
>>     The MallocChecker is fine, but the problem is that libc++ is really
>>     hard to analyze. It is an efficient implementation, but that
>>     cleverness really stresses the analyzer. For example, std::string’s
>>     memory layout is a union of three different types (“long”, “short”,
>>     “raw” buffers). I think the SA gives up on unions immediately.
>>
>>     The best way around this is to simplify what the analyzer sees. Here
>>     are two approaches.
>>
>>     One idea is to use “BodyFarm”, whose role is to synthesize alternate
>>     implementations for functions that should be simple to model. If you
>>     look here, you’ll see a bit about that:
>>     http://clang-analyzer.llvm.org/open_projects.html
>>
>>     Another idea is to actually implement a “simple libc++” and
>>     interpose that for analysis. For example, std::basic_string class
>>     would just be a pointer and two size_t’s, along with simple
>>     implementations of all the member functions and simple iterators. In
>>     the future, you could add other analysis hooks (for example, check
>>     for iterator invalidation).
>>
>>     I did play around a bit on this for Body Farm, and I can forward you
>>     the code I did. I got a couple constructors implemented, as well as
>>     “empty()” and “size()” for some very basic cases (string literal
>>     initialized strings). However, it got a bit tedious and I’m not sure
>>     it would scale. I think the second approach is far more interesting
>>     and maintainable. But a “simple libc++” could be hard for its own
>>     reasons.
>>
>>     Anyway I’m happy to give you my sketches. I’ll email them off-list.
>>     Take them or ignore them however you like.
>>
>>
>>
>>>     Thanks in advance.
>>>
>>>     Best regards,
>>>     Adam Romanek
>>>     _______________________________________________
>>>     cfe-dev mailing list
>>>     cfe-dev at cs.uiuc.edu <mailto:cfe-dev at cs.uiuc.edu>
>>>     http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>>
>>
>>
>>
>>
>>
>
Cheers,
Gábor