[cfe-dev] [analyzer] MallocChecker & checker dependencies

Mon Mar 19 14:37:48 PDT 2018

On 17/03/2018 7:25 AM, Alexey Knyshev wrote:
>
>     This should already be supported by MallocChecker - i.e. it warns
>     when you allocate memory with, say, new() and release it with
>     free() or delete[]().
>
>     If methods of a smart pointer are "inlined" during analysis, these
>     bugs will already be caught automatically.
>
>
> I took a look at your recent work & status report posted there in 
> mailing lists. And I have a question about how it would work in 
> particular cases, e.g.:
> 1. Returning smart_ptr by value from function that has it's 
> implementation in analyzed translation unit but it isn't called from 
> any other function in the same TU.
>
> std::shared_ptr<S> createInstance() {
>    return std::shared_ptr<S>(new S, free);
> }
>
> If I understand correctly, compiler is free to apply copy-elison 
> optimization on shared_ptr which is returned by value. So destructor 
> call won't be modeled properly because it's not inlined during 
> analysis. And bug won't be caught.

Indeed, the tempting - but risky - thing to do here is to declare 
construction of "std::shared_ptr<S>(new S, free);" to be a bug on its 
own. It is risky because it leaves us with a checker that finds 
non-bugs; there is a chance that the destructor will never be called 
while the pointer is still being owned by the shared_ptr. Such code 
would be ridiculous but at the same time we wouldn't like to drive away 
users who had to write this to work around an equally ridiculous 
proprietary API that they have no way of changing.

In this particular case i'd have tried to take the risk, keeping in mind 
that i might end up rewriting the checker back to the conservative 
behavior of only warning at the destructor.

Note that copy elision is irrelevant, because in any case we won't 
free() the pointer at the return site, because reference count will 
never reach zero within the function. That's the whole point of the 
shared pointer.

> 2. As you mentioned before, report would be fired only if constructor 
> and destructor have been inlined (which is not guaranteed to be done). 
> As the result such check won't work in general case.

Yeah, that's pretty much the primary motivation for making an 
API-specific checker. It's not super strong in this case because this 
would likely give a very slight increase over the existing coverage.

In this case, however, the much more important motivation is to avoid 
false positives that occur when we don't know the original reference 
count of a shared pointer that was constructed before analysis has 
started - and we start "assuming..." that any destructor of a shared 
pointer could release memory. This is currently planned to be suppressed 
in a fairly gross manner (https://reviews.llvm.org/D44281). For this 
reason we believe that having explicit modeling for various smart 
pointers is a must.

> Actually my point is about modeling smart pointers behavior 
> specifically and don't inline any calls to constructor / destructor & 
> methods those modeled by checker. And the question immediately arises: 
> is there way to control inline policy?

It is indeed possible to make the checker "take over" function call 
modeling by subscribing on the `eval::Call` callback. In this case both 
inlining and conservative evaluation by the core are avoided, and the 
checker who evaluates the call is responsible for modeling *all* effects 
of the call (because two checkers cannot evalCall the same call, even 
though they can do the usual checkPreCall/checkPostCall thingy as much 
as they want).

There is also the BodyFarm mechanism (which allows you to mock 
simplified ASTs for standard functions) but you DO NOT WANT to use it 
for C++ because it'd take you forever to construct the correct C++ AST 
with templates manually.

> And sorry for any possible misunderstanding from my side as I'm not 
> quite familiar with CSA internals.
>
> Regards, Alexey K
>
> 2018-03-17 5:21 GMT+03:00 Artem Dergachev <noqnoqneo at gmail.com 
> <mailto:noqnoqneo at gmail.com>>:
>
>
>
>     On 16/03/2018 12:30 PM, Alexey Knyshev wrote:
>
>         Sorry, accidentally removed part of message before sending.
>
>         Aleksei,
>
>             Could you share your ideas of checker design? It is
>         possible that
>             problems you met can be solved in different (maybe even
>         better)
>             way if we know the whole picture.
>
>
>         I'm currently have no clear idea how to implement it in the
>         "right manner". But first of all I would aim to track the
>         origin (malloc, new, new [], etc) of SVal and check if Deleter
>         is the expected corresponding way to deallocate such SVal.
>
>
>     This should already be supported by MallocChecker - i.e. it warns
>     when you allocate memory with, say, new() and release it with
>     free() or delete[]().
>
>     If methods of a smart pointer are "inlined" during analysis, these
>     bugs will already be caught automatically.
>
>
>         Regards, Alexey K
>
>         2018-03-16 22:22 GMT+03:00 Alexey Knyshev
>         <alexey.knyshev at gmail.com <mailto:alexey.knyshev at gmail.com>
>         <mailto:alexey.knyshev at gmail.com
>         <mailto:alexey.knyshev at gmail.com>>>:
>
>             Hello guys,
>
>             And thanks for your attention
>
>             Aleksei,
>
>                 Could you share your ideas of checker design? It is
>         possible
>                 that problems you met can be solved in different
>         (maybe even
>                 better) way if we know the whole picture.
>
>
>             As I said, I'm interested in improving current state of
>         dynamic
>             memory modeling. Especially stuff related to smart
>         pointers (e.g.
>             shared, unique) which also mentioned in list of potential
>         checkers
>             as smartptr.SmartPtrInit
>             <https://clang-analyzer.llvm.org/potential_checkers.html
>         <https://clang-analyzer.llvm.org/potential_checkers.html>>*
>
>             *
>
>                 **You can see an example of how state trait is exported in
>                 GenericTaintChecker or MPIChecker. Generally, you just
>         create
>                 a named ProgramStateTrait in the header.
>
>
>             Briefly looked at MPITypes.h. Does it mean we should move
>             RegionState to separate file and register it via Traits in the
>             same manner to make it avaliable from other checkers (not
>         other TU
>             as mentioned in ento::mpi::RequestMap)?
>
>
>     GenericTaintChecker is made in a fairly intrusive manner by
>     putting stuff into the program state class directly, which isn't
>     very sane. I'd definitely prefer something similar to dynamic type
>     propagation (see DynamicTypeMap.cpp, DynamicTypeMap.h). You can
>     use the usual REGISTER_MAP_WITH_PROGRAMSTATE (etc.) macros in the
>     .cpp file as long as you provide accessor methods declared in the
>     .h file that other checkers could include.
>
>
>             Artem,
>
>                 you need to keep all of this in mind when writing a
>         checker.
>
>
>             Sure, but on the other hand I think it's possible to
>         implement and
>             improve modeling of various API calls' effects
>         step-by-step. Let's
>             say, it case of SmartPtrInit checker mentioned before the bare
>             minimum would be modeling of construction (and destruction to
>             avoid leak report from existing related checkers).
>
>                 which is why the few experimental C++ checkers that we
>                 currently have required heavy hacks to become
>         possible. But
>                 for now you'd rather keep an eye on these problems.
>
>
>             Does it mean it's better to wait a bit for Core
>         improvements from
>             main contributors? Does it make sense to make an efforts to
>             implement SmartPtrItnit checker in current state of things?
>
>
>     You should feel free to start working on the checker in an
>     incremental manner (everything is better in an incremental
>     manner!), but in some cases i may prefer improving the checker API
>     or fixing core modeling bugs instead of writing large portions of
>     checker code to work around inconvenient APIs or bugs, and this is
>     something i might not be immediately capable of doing myself (due
>     to being busy with other things), which may potentially delay your
>     progress.
>
>             Thanks in advance and regards,
>             Alexey K
>
>             2018-03-16 21:47 GMT+03:00 Artem Dergachev
>         <noqnoqneo at gmail.com <mailto:noqnoqneo at gmail.com>
>             <mailto:noqnoqneo at gmail.com <mailto:noqnoqneo at gmail.com>>>:
>
>
>                 Dynamic memory management is pretty much done at this
>         point -
>                 we have very good support for malloc()-like stuff and
>                 reasonably good support for operator new() (support for
>                 operator new[]() is currently missing because it requires
>                 calling an indefinite number of constructors which is
>         not yet
>                 implemented). There is also a known issue with custom
>         operator
>                 new(...) returning null pointers - in this case we
>         should not
>                 be calling the constructor but for now we evaluate the
>                 constructor conservatively.
>
>                 Your real problem will be managing smart pointers
>         themselves
>                 because they are C++ objects that have a myriad of
>         methods,
>                 they can be copied, moved, assigned, move-assigned, they
>                 destroyed, they lifetime-extended, you need to keep all of
>                 this in mind when writing a checker. This is slowly
>         getting
>                 easier because i'm currently working on that, but until
>                 recently it wasn't working correctly in the core, let
>         alone
>                 checkers, which is why the few experimental C++
>         checkers that
>                 we currently have required heavy hacks to become
>         possible. But
>                 for now you'd rather keep an eye on these problems.
>
>                 On 16/03/2018 3:57 AM, Aleksei Sidorin via cfe-dev wrote:
>
>                     Hi Alexey!
>
>                     Could you share your ideas of checker design? It is
>                     possible that problems you met can be solved in
>         different
>                     (maybe even better) way if we know the whole picture.
>                     Regarding your questions, you can find some
>         answers below.
>
>                     1. a) The first way for checker communication is
>         to share
>                     their program state trait. You can see an example
>         of how
>                     state trait is exported in GenericTaintChecker or
>                     MPIChecker. Generally, you just create a named
>                     ProgramStateTrait in the header. You can take a
>         look at
>                     TaintManager.h and MPITypes.h and how they are used.
>                     b) To set a dependency from another checker, you
>         can just
>                     register it while registering your checker. An
>         example can
>                     be found in MallocChecker where register$Checker also
>                     calls registerCStringCheckerBasic to register a
>         checker it
>                     depends on.
>                     As you pointed, inter-checker communication can
>         become a
>                     source of some problems. Most of them are discussed in
>                     this conversation:
>         http://clang-developers.42468.n3.nabble.com/analyzer-RFC-Design-idea-separate-modelling-from-checking-td4059122.html
>         <http://clang-developers.42468.n3.nabble.com/analyzer-RFC-Design-idea-separate-modelling-from-checking-td4059122.html>
>                    
>         <http://clang-developers.42468.n3.nabble.com/analyzer-RFC-Design-idea-separate-modelling-from-checking-td4059122.html
>         <http://clang-developers.42468.n3.nabble.com/analyzer-RFC-Design-idea-separate-modelling-from-checking-td4059122.html>>
>
>                     2. I think there is nothing bad in sharing RegionState
>                     across checkers in the way shown in 1a.
>
>                     3. Artem Dergachev has done some excellent work on
>                     improvement of operator 'new' processing in CSA
>         engine.
>                     Regarding checkers, I can see some on
>         https://clang-analyzer.llvm.org/potential_checkers.html
>         <https://clang-analyzer.llvm.org/potential_checkers.html>
>                    
>         <https://clang-analyzer.llvm.org/potential_checkers.html
>         <https://clang-analyzer.llvm.org/potential_checkers.html>>:
>                     for example, undefbehavior.AutoptrsOwnSameObj. You can
>                     search this list to find more.
>
>
>                     15.03.2018 23:54, Alexey Knyshev via cfe-dev пишет:
>
>                         Hi there!
>
>                         While thinking about how it would be possible to
>                         implement various smart ptr related checkers I
>         tried
>                         to review current state of MallocChecker and
>         came up
>                         with that it would be great to have
>         RegionState info
>                         available to other checkers. Could you please
>         share
>                         your points of view and comments on the following
>                         statements / questions:
>
>                         1. Is there any right way for chaining
>         checkers? How
>                         they are expected to communicate between each
>         other
>                         (excluding generation of nodes /
>         ProgramStates). I've
>                         heard that there are couple of problems caused by
>                         inlining functions, constructors / descructors.
>                         2. What do you think about moving RegionState
>         to the
>                         Core of CSA or providing optional extended info in
>                         MemRegion about the source of such region (new
>                         opearator / array new, malloc, alloca, etc). So it
>                         would be available to all checkers.
>                         3. Is there any roadmap for CSA and especially for
>                         dynamic memory management modeling & related
>         checkers?
>
>                         Regards, Alexey K
>
>                         -- linkedin.com/profile
>         <http://linkedin.com/profile> <http://linkedin.com/profile>
>                        
>         <https://www.linkedin.com/profile/view?id=AAMAABn6oKQBDhBteiQnWsYm-S9yxT7wQkfWhSw
>         <https://www.linkedin.com/profile/view?id=AAMAABn6oKQBDhBteiQnWsYm-S9yxT7wQkfWhSw>
>                        
>         <https://www.linkedin.com/profile/view?id=AAMAABn6oKQBDhBteiQnWsYm-S9yxT7wQkfWhSw
>         <https://www.linkedin.com/profile/view?id=AAMAABn6oKQBDhBteiQnWsYm-S9yxT7wQkfWhSw>>>
>
>         github.com/alexeyknyshev <http://github.com/alexeyknyshev>
>                         <http://github.com/alexeyknyshev
>         <http://github.com/alexeyknyshev>>
>                         <http://github.com/alexeyknyshev
>         <http://github.com/alexeyknyshev>
>                         <http://github.com/alexeyknyshev
>         <http://github.com/alexeyknyshev>>>
>         bitbucket.org/alexeyknyshev <http://bitbucket.org/alexeyknyshev>
>                         <http://bitbucket.org/alexeyknyshev
>         <http://bitbucket.org/alexeyknyshev>>
>                         <https://bitbucket.org/alexeyknyshev/
>         <https://bitbucket.org/alexeyknyshev/>
>                         <https://bitbucket.org/alexeyknyshev/
>         <https://bitbucket.org/alexeyknyshev/>>>
>
>
>                         _______________________________________________
>                         cfe-dev mailing list
>         cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>         <mailto:cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>>
>         http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>         <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>                        
>         <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>         <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>>
>
>
>
>                     --             Best regards,
>                     Aleksei Sidorin,
>                     SRR, Samsung Electronics
>
>
>                     _______________________________________________
>                     cfe-dev mailing list
>         cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>         <mailto:cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>>
>         http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>         <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>                    
>         <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>         <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>>
>
>
>
>
>
>             -- linkedin.com/profile <http://linkedin.com/profile>
>            
>         <https://www.linkedin.com/profile/view?id=AAMAABn6oKQBDhBteiQnWsYm-S9yxT7wQkfWhSw
>         <https://www.linkedin.com/profile/view?id=AAMAABn6oKQBDhBteiQnWsYm-S9yxT7wQkfWhSw>>
>
>         github.com/alexeyknyshev <http://github.com/alexeyknyshev>
>         <http://github.com/alexeyknyshev
>         <http://github.com/alexeyknyshev>>
>         bitbucket.org/alexeyknyshev
>         <http://bitbucket.org/alexeyknyshev>
>         <https://bitbucket.org/alexeyknyshev/
>         <https://bitbucket.org/alexeyknyshev/>>
>
>
>
>
>         -- 
>         linkedin.com/profile <http://linkedin.com/profile>
>         <https://www.linkedin.com/profile/view?id=AAMAABn6oKQBDhBteiQnWsYm-S9yxT7wQkfWhSw
>         <https://www.linkedin.com/profile/view?id=AAMAABn6oKQBDhBteiQnWsYm-S9yxT7wQkfWhSw>>
>
>         github.com/alexeyknyshev <http://github.com/alexeyknyshev>
>         <http://github.com/alexeyknyshev
>         <http://github.com/alexeyknyshev>>
>         bitbucket.org/alexeyknyshev
>         <http://bitbucket.org/alexeyknyshev>
>         <https://bitbucket.org/alexeyknyshev/
>         <https://bitbucket.org/alexeyknyshev/>>
>
>
>
>
>
> -- 
> linkedin.com/profile 
> <https://www.linkedin.com/profile/view?id=AAMAABn6oKQBDhBteiQnWsYm-S9yxT7wQkfWhSw>
>
> github.com/alexeyknyshev <http://github.com/alexeyknyshev>
> bitbucket.org/alexeyknyshev <https://bitbucket.org/alexeyknyshev/>