[cfe-dev] [analyzer] MallocChecker & checker dependencies

Alexey Knyshev via cfe-dev cfe-dev at lists.llvm.org
Wed Mar 28 03:57:17 PDT 2018


Hi all,
Any ideas on it?

Regards, Alexey K

2018-03-26 12:01 GMT+03:00 Alexey Knyshev <alexey.knyshev at gmail.com>:

> Hi all,
>
> During last few days I was working on std::shared_ptr constructor modeling
> and trying to understand how path sensitive analysis works. After some time
> I faced unexpected modelling of Call arguments SVals. Here is the portion
> of code that caused analyzer behavior that I don't understand:
>
> auto del = std::default_delete<S>();
> return std::shared_ptr<S>((S*)malloc(sizeof(S)), del);
>
> The case I observed: first argument SVal is modeled properly and I can get
> RegionState info from program state (provided by MallocChecker). The second
> one (del symbol) is unknown (DelSVal.dump() shows me that). On the other
> hand, the following code doesn't cause such problem and prints
> '&code{free}' as expected:
>
> const auto &del = free;
> return std::shared_ptr<S>((S*)malloc(sizeof(S)), del);
>
> So, the question is: what can cause such imprecise modeling of SVal
> (Deleter - second constructor arg)?
>
> P.S. I've changed shared_ptr constructor declaration in
> 'test/Analysis/Inputs/system-header-simulator-cxx-std-suppression.h':
>
> shared_ptr(_Tp* __p, void(*deleter)(void*))
>
> to
>
> template<class Deleter>
> shared_ptr(_Tp* __p, Deleter d)
>
> and I don't understand why it was declared with function pointer argument
> instead template. But looks like it isn't the source of my problem at
> first sight.
>
> Regards, Alexey K
>
> 2018-03-20 0:37 GMT+03:00 Artem Dergachev <noqnoqneo at gmail.com>:
>
>>
>>
>> On 17/03/2018 7:25 AM, Alexey Knyshev wrote:
>>
>>>
>>>     This should already be supported by MallocChecker - i.e. it warns
>>>     when you allocate memory with, say, new() and release it with
>>>     free() or delete[]().
>>>
>>>     If methods of a smart pointer are "inlined" during analysis, these
>>>     bugs will already be caught automatically.
>>>
>>>
>>> I took a look at your recent work & status report posted there in
>>> mailing lists. And I have a question about how it would work in particular
>>> cases, e.g.:
>>> 1. Returning smart_ptr by value from function that has it's
>>> implementation in analyzed translation unit but it isn't called from any
>>> other function in the same TU.
>>>
>>> std::shared_ptr<S> createInstance() {
>>>    return std::shared_ptr<S>(new S, free);
>>> }
>>>
>>> If I understand correctly, compiler is free to apply copy-elison
>>> optimization on shared_ptr which is returned by value. So destructor call
>>> won't be modeled properly because it's not inlined during analysis. And bug
>>> won't be caught.
>>>
>>
>> Indeed, the tempting - but risky - thing to do here is to declare
>> construction of "std::shared_ptr<S>(new S, free);" to be a bug on its own.
>> It is risky because it leaves us with a checker that finds non-bugs; there
>> is a chance that the destructor will never be called while the pointer is
>> still being owned by the shared_ptr. Such code would be ridiculous but at
>> the same time we wouldn't like to drive away users who had to write this to
>> work around an equally ridiculous proprietary API that they have no way of
>> changing.
>>
>> In this particular case i'd have tried to take the risk, keeping in mind
>> that i might end up rewriting the checker back to the conservative behavior
>> of only warning at the destructor.
>>
>> Note that copy elision is irrelevant, because in any case we won't free()
>> the pointer at the return site, because reference count will never reach
>> zero within the function. That's the whole point of the shared pointer.
>>
>> 2. As you mentioned before, report would be fired only if constructor and
>>> destructor have been inlined (which is not guaranteed to be done). As the
>>> result such check won't work in general case.
>>>
>>
>> Yeah, that's pretty much the primary motivation for making an
>> API-specific checker. It's not super strong in this case because this would
>> likely give a very slight increase over the existing coverage.
>>
>> In this case, however, the much more important motivation is to avoid
>> false positives that occur when we don't know the original reference count
>> of a shared pointer that was constructed before analysis has started - and
>> we start "assuming..." that any destructor of a shared pointer could
>> release memory. This is currently planned to be suppressed in a fairly
>> gross manner (https://reviews.llvm.org/D44281). For this reason we
>> believe that having explicit modeling for various smart pointers is a must.
>>
>> Actually my point is about modeling smart pointers behavior specifically
>>> and don't inline any calls to constructor / destructor & methods those
>>> modeled by checker. And the question immediately arises: is there way to
>>> control inline policy?
>>>
>>
>> It is indeed possible to make the checker "take over" function call
>> modeling by subscribing on the `eval::Call` callback. In this case both
>> inlining and conservative evaluation by the core are avoided, and the
>> checker who evaluates the call is responsible for modeling *all* effects of
>> the call (because two checkers cannot evalCall the same call, even though
>> they can do the usual checkPreCall/checkPostCall thingy as much as they
>> want).
>>
>> There is also the BodyFarm mechanism (which allows you to mock simplified
>> ASTs for standard functions) but you DO NOT WANT to use it for C++ because
>> it'd take you forever to construct the correct C++ AST with templates
>> manually.
>>
>> And sorry for any possible misunderstanding from my side as I'm not quite
>>> familiar with CSA internals.
>>>
>>> Regards, Alexey K
>>>
>>> 2018-03-17 5:21 GMT+03:00 Artem Dergachev <noqnoqneo at gmail.com <mailto:
>>> noqnoqneo at gmail.com>>:
>>>
>>>
>>>
>>>     On 16/03/2018 12:30 PM, Alexey Knyshev wrote:
>>>
>>>         Sorry, accidentally removed part of message before sending.
>>>
>>>         Aleksei,
>>>
>>>             Could you share your ideas of checker design? It is
>>>         possible that
>>>             problems you met can be solved in different (maybe even
>>>         better)
>>>             way if we know the whole picture.
>>>
>>>
>>>         I'm currently have no clear idea how to implement it in the
>>>         "right manner". But first of all I would aim to track the
>>>         origin (malloc, new, new [], etc) of SVal and check if Deleter
>>>         is the expected corresponding way to deallocate such SVal.
>>>
>>>
>>>     This should already be supported by MallocChecker - i.e. it warns
>>>     when you allocate memory with, say, new() and release it with
>>>     free() or delete[]().
>>>
>>>     If methods of a smart pointer are "inlined" during analysis, these
>>>     bugs will already be caught automatically.
>>>
>>>
>>>         Regards, Alexey K
>>>
>>>         2018-03-16 22:22 GMT+03:00 Alexey Knyshev
>>>         <alexey.knyshev at gmail.com <mailto:alexey.knyshev at gmail.com>
>>>         <mailto:alexey.knyshev at gmail.com
>>>
>>>         <mailto:alexey.knyshev at gmail.com>>>:
>>>
>>>             Hello guys,
>>>
>>>             And thanks for your attention
>>>
>>>             Aleksei,
>>>
>>>                 Could you share your ideas of checker design? It is
>>>         possible
>>>                 that problems you met can be solved in different
>>>         (maybe even
>>>                 better) way if we know the whole picture.
>>>
>>>
>>>             As I said, I'm interested in improving current state of
>>>         dynamic
>>>             memory modeling. Especially stuff related to smart
>>>         pointers (e.g.
>>>             shared, unique) which also mentioned in list of potential
>>>         checkers
>>>             as smartptr.SmartPtrInit
>>>             <https://clang-analyzer.llvm.org/potential_checkers.html
>>>         <https://clang-analyzer.llvm.org/potential_checkers.html>>*
>>>
>>>             *
>>>
>>>                 **You can see an example of how state trait is exported
>>> in
>>>                 GenericTaintChecker or MPIChecker. Generally, you just
>>>         create
>>>                 a named ProgramStateTrait in the header.
>>>
>>>
>>>             Briefly looked at MPITypes.h. Does it mean we should move
>>>             RegionState to separate file and register it via Traits in
>>> the
>>>             same manner to make it avaliable from other checkers (not
>>>         other TU
>>>             as mentioned in ento::mpi::RequestMap)?
>>>
>>>
>>>     GenericTaintChecker is made in a fairly intrusive manner by
>>>     putting stuff into the program state class directly, which isn't
>>>     very sane. I'd definitely prefer something similar to dynamic type
>>>     propagation (see DynamicTypeMap.cpp, DynamicTypeMap.h). You can
>>>     use the usual REGISTER_MAP_WITH_PROGRAMSTATE (etc.) macros in the
>>>     .cpp file as long as you provide accessor methods declared in the
>>>     .h file that other checkers could include.
>>>
>>>
>>>             Artem,
>>>
>>>                 you need to keep all of this in mind when writing a
>>>         checker.
>>>
>>>
>>>             Sure, but on the other hand I think it's possible to
>>>         implement and
>>>             improve modeling of various API calls' effects
>>>         step-by-step. Let's
>>>             say, it case of SmartPtrInit checker mentioned before the
>>> bare
>>>             minimum would be modeling of construction (and destruction to
>>>             avoid leak report from existing related checkers).
>>>
>>>                 which is why the few experimental C++ checkers that we
>>>                 currently have required heavy hacks to become
>>>         possible. But
>>>                 for now you'd rather keep an eye on these problems.
>>>
>>>
>>>             Does it mean it's better to wait a bit for Core
>>>         improvements from
>>>             main contributors? Does it make sense to make an efforts to
>>>             implement SmartPtrItnit checker in current state of things?
>>>
>>>
>>>     You should feel free to start working on the checker in an
>>>     incremental manner (everything is better in an incremental
>>>     manner!), but in some cases i may prefer improving the checker API
>>>     or fixing core modeling bugs instead of writing large portions of
>>>     checker code to work around inconvenient APIs or bugs, and this is
>>>     something i might not be immediately capable of doing myself (due
>>>     to being busy with other things), which may potentially delay your
>>>     progress.
>>>
>>>             Thanks in advance and regards,
>>>             Alexey K
>>>
>>>             2018-03-16 21:47 GMT+03:00 Artem Dergachev
>>>         <noqnoqneo at gmail.com <mailto:noqnoqneo at gmail.com>
>>>             <mailto:noqnoqneo at gmail.com <mailto:noqnoqneo at gmail.com>>>:
>>>
>>>
>>>
>>>                 Dynamic memory management is pretty much done at this
>>>         point -
>>>                 we have very good support for malloc()-like stuff and
>>>                 reasonably good support for operator new() (support for
>>>                 operator new[]() is currently missing because it requires
>>>                 calling an indefinite number of constructors which is
>>>         not yet
>>>                 implemented). There is also a known issue with custom
>>>         operator
>>>                 new(...) returning null pointers - in this case we
>>>         should not
>>>                 be calling the constructor but for now we evaluate the
>>>                 constructor conservatively.
>>>
>>>                 Your real problem will be managing smart pointers
>>>         themselves
>>>                 because they are C++ objects that have a myriad of
>>>         methods,
>>>                 they can be copied, moved, assigned, move-assigned, they
>>>                 destroyed, they lifetime-extended, you need to keep all
>>> of
>>>                 this in mind when writing a checker. This is slowly
>>>         getting
>>>                 easier because i'm currently working on that, but until
>>>                 recently it wasn't working correctly in the core, let
>>>         alone
>>>                 checkers, which is why the few experimental C++
>>>         checkers that
>>>                 we currently have required heavy hacks to become
>>>         possible. But
>>>                 for now you'd rather keep an eye on these problems.
>>>
>>>                 On 16/03/2018 3:57 AM, Aleksei Sidorin via cfe-dev wrote:
>>>
>>>                     Hi Alexey!
>>>
>>>                     Could you share your ideas of checker design? It is
>>>                     possible that problems you met can be solved in
>>>         different
>>>                     (maybe even better) way if we know the whole picture.
>>>                     Regarding your questions, you can find some
>>>         answers below.
>>>
>>>                     1. a) The first way for checker communication is
>>>         to share
>>>                     their program state trait. You can see an example
>>>         of how
>>>                     state trait is exported in GenericTaintChecker or
>>>                     MPIChecker. Generally, you just create a named
>>>                     ProgramStateTrait in the header. You can take a
>>>         look at
>>>                     TaintManager.h and MPITypes.h and how they are used.
>>>                     b) To set a dependency from another checker, you
>>>         can just
>>>                     register it while registering your checker. An
>>>         example can
>>>                     be found in MallocChecker where register$Checker also
>>>                     calls registerCStringCheckerBasic to register a
>>>         checker it
>>>                     depends on.
>>>                     As you pointed, inter-checker communication can
>>>         become a
>>>                     source of some problems. Most of them are discussed
>>> in
>>>                     this conversation:
>>>         http://clang-developers.42468.n3.nabble.com/analyzer-RFC-Des
>>> ign-idea-separate-modelling-from-checking-td4059122.html
>>>         <http://clang-developers.42468.n3.nabble.com/analyzer-RFC-De
>>> sign-idea-separate-modelling-from-checking-td4059122.html>
>>>
>>>         <http://clang-developers.42468.n3.nabble.com/analyzer-RFC-De
>>> sign-idea-separate-modelling-from-checking-td4059122.html
>>>         <http://clang-developers.42468.n3.nabble.com/analyzer-RFC-De
>>> sign-idea-separate-modelling-from-checking-td4059122.html>>
>>>
>>>                     2. I think there is nothing bad in sharing
>>> RegionState
>>>                     across checkers in the way shown in 1a.
>>>
>>>                     3. Artem Dergachev has done some excellent work on
>>>                     improvement of operator 'new' processing in CSA
>>>         engine.
>>>                     Regarding checkers, I can see some on
>>>         https://clang-analyzer.llvm.org/potential_checkers.html
>>>         <https://clang-analyzer.llvm.org/potential_checkers.html>
>>>
>>>         <https://clang-analyzer.llvm.org/potential_checkers.html
>>>         <https://clang-analyzer.llvm.org/potential_checkers.html>>:
>>>                     for example, undefbehavior.AutoptrsOwnSameObj. You
>>> can
>>>                     search this list to find more.
>>>
>>>
>>>                     15.03.2018 23:54, Alexey Knyshev via cfe-dev пишет:
>>>
>>>                         Hi there!
>>>
>>>                         While thinking about how it would be possible to
>>>                         implement various smart ptr related checkers I
>>>         tried
>>>                         to review current state of MallocChecker and
>>>         came up
>>>                         with that it would be great to have
>>>         RegionState info
>>>                         available to other checkers. Could you please
>>>         share
>>>                         your points of view and comments on the following
>>>                         statements / questions:
>>>
>>>                         1. Is there any right way for chaining
>>>         checkers? How
>>>                         they are expected to communicate between each
>>>         other
>>>                         (excluding generation of nodes /
>>>         ProgramStates). I've
>>>                         heard that there are couple of problems caused by
>>>                         inlining functions, constructors / descructors.
>>>                         2. What do you think about moving RegionState
>>>         to the
>>>                         Core of CSA or providing optional extended info
>>> in
>>>                         MemRegion about the source of such region (new
>>>                         opearator / array new, malloc, alloca, etc). So
>>> it
>>>                         would be available to all checkers.
>>>                         3. Is there any roadmap for CSA and especially
>>> for
>>>                         dynamic memory management modeling & related
>>>         checkers?
>>>
>>>                         Regards, Alexey K
>>>
>>>                         -- linkedin.com/profile
>>>         <http://linkedin.com/profile> <http://linkedin.com/profile>
>>>
>>>         <https://www.linkedin.com/profile/view?id=AAMAABn6oKQBDhBtei
>>> QnWsYm-S9yxT7wQkfWhSw
>>>         <https://www.linkedin.com/profile/view?id=AAMAABn6oKQBDhBtei
>>> QnWsYm-S9yxT7wQkfWhSw>
>>>
>>>         <https://www.linkedin.com/profile/view?id=AAMAABn6oKQBDhBtei
>>> QnWsYm-S9yxT7wQkfWhSw
>>>         <https://www.linkedin.com/profile/view?id=AAMAABn6oKQBDhBtei
>>> QnWsYm-S9yxT7wQkfWhSw>>>
>>>
>>>         github.com/alexeyknyshev <http://github.com/alexeyknyshev>
>>>                         <http://github.com/alexeyknyshev
>>>         <http://github.com/alexeyknyshev>>
>>>                         <http://github.com/alexeyknyshev
>>>         <http://github.com/alexeyknyshev>
>>>                         <http://github.com/alexeyknyshev
>>>         <http://github.com/alexeyknyshev>>>
>>>         bitbucket.org/alexeyknyshev <http://bitbucket.org/alexeyknyshev>
>>>                         <http://bitbucket.org/alexeyknyshev
>>>         <http://bitbucket.org/alexeyknyshev>>
>>>                         <https://bitbucket.org/alexeyknyshev/
>>>         <https://bitbucket.org/alexeyknyshev/>
>>>                         <https://bitbucket.org/alexeyknyshev/
>>>         <https://bitbucket.org/alexeyknyshev/>>>
>>>
>>>
>>>                         _______________________________________________
>>>                         cfe-dev mailing list
>>>         cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>>>         <mailto:cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>>
>>>         http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>         <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>>>
>>>         <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>         <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>>
>>>
>>>
>>>
>>>                     --             Best regards,
>>>                     Aleksei Sidorin,
>>>                     SRR, Samsung Electronics
>>>
>>>
>>>                     _______________________________________________
>>>                     cfe-dev mailing list
>>>         cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>>>         <mailto:cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>>
>>>         http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>         <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>>>
>>>         <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>         <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>>
>>>
>>>
>>>
>>>
>>>
>>>             -- linkedin.com/profile <http://linkedin.com/profile>
>>>
>>>         <https://www.linkedin.com/profile/view?id=AAMAABn6oKQBDhBtei
>>> QnWsYm-S9yxT7wQkfWhSw
>>>         <https://www.linkedin.com/profile/view?id=AAMAABn6oKQBDhBtei
>>> QnWsYm-S9yxT7wQkfWhSw>>
>>>
>>>         github.com/alexeyknyshev <http://github.com/alexeyknyshev>
>>>         <http://github.com/alexeyknyshev
>>>         <http://github.com/alexeyknyshev>>
>>>         bitbucket.org/alexeyknyshev
>>>         <http://bitbucket.org/alexeyknyshev>
>>>         <https://bitbucket.org/alexeyknyshev/
>>>         <https://bitbucket.org/alexeyknyshev/>>
>>>
>>>
>>>
>>>
>>>         --         linkedin.com/profile <http://linkedin.com/profile>
>>>         <https://www.linkedin.com/profile/view?id=AAMAABn6oKQBDhBtei
>>> QnWsYm-S9yxT7wQkfWhSw
>>>         <https://www.linkedin.com/profile/view?id=AAMAABn6oKQBDhBtei
>>> QnWsYm-S9yxT7wQkfWhSw>>
>>>
>>>         github.com/alexeyknyshev <http://github.com/alexeyknyshev>
>>>         <http://github.com/alexeyknyshev
>>>         <http://github.com/alexeyknyshev>>
>>>         bitbucket.org/alexeyknyshev
>>>         <http://bitbucket.org/alexeyknyshev>
>>>         <https://bitbucket.org/alexeyknyshev/
>>>         <https://bitbucket.org/alexeyknyshev/>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> linkedin.com/profile <https://www.linkedin.com/prof
>>> ile/view?id=AAMAABn6oKQBDhBteiQnWsYm-S9yxT7wQkfWhSw>
>>>
>>> github.com/alexeyknyshev <http://github.com/alexeyknyshev>
>>> bitbucket.org/alexeyknyshev <https://bitbucket.org/alexeyknyshev/>
>>>
>>
>>
>
>
> --
> linkedin.com/profile
> <https://www.linkedin.com/profile/view?id=AAMAABn6oKQBDhBteiQnWsYm-S9yxT7wQkfWhSw>
>
> github.com/alexeyknyshev
> bitbucket.org/alexeyknyshev
>



-- 
linkedin.com/profile
<https://www.linkedin.com/profile/view?id=AAMAABn6oKQBDhBteiQnWsYm-S9yxT7wQkfWhSw>

github.com/alexeyknyshev
bitbucket.org/alexeyknyshev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180328/c0343064/attachment.html>


More information about the cfe-dev mailing list