[cfe-dev] Question about checkRegionChanges in static analyzer

Tue Mar 13 11:36:16 PDT 2018

Weird, I was super sure I answered that back then. Probably misclicked 
something :/ Sorries.

Normally you shouldn't be using explicit_regions at all. They are 
tweaked for the use-case of RetainCountChecker which behaves very 
differently from other checkers, because invariants imposed by Apple 
memory management policy for Foundation and CoreFoundation are about 
maintaining a clear ownership of the object, not simply making sure it's 
released at some point - which is not how most checkers work.

For the purposes of invalidation you should not really care if the 
region was explicitly passed into the function or is implicitly 
retrievable from within the call by exploring the heap pointed to by 
other arguments or globals. In either case the object's state is not 
known anymore.

And the non-explicit regions array seems to work just fine: when the 
object is passed by a const pointer or reference, its contents are not 
invalidated, because, well, you cannot (or rather should not) mutate an 
object through a const pointer.

Now, speaking of 't' and 't.d' - this behavior is also by design 
(documented in docs/analyzer/RegionStore.txt), because it is possible to 
mutate t.s through a non-const pointer to t.d by using completely safe 
pointer arithmetic: (this - offsetof(T, s) + offsetof(T, d)) and such. 
This is definitely a language feature that some projects use (but 
definitely not many). I think you should be able to partially suppress 
it on the checker side, probably by looking for sub-regions of the 
region in the explicit_regions list (and see what happens). We've been 
thinking about investigating possibilities to make this feature less 
aggressive but it requires a very careful investigation because false 
positives are more scary than false negatives, and most of the time 
extra escaping causes more false negatives than false positives.

On 28/02/2018 1:28 PM, likan_999.student at sina.com wrote:
> Thanks Artem for clarification. It is very clear and now I understand the situation. Sorry to bother again, but now I have yet another question.
>
> I am now trying to make the checker work for pointer escape. Since I track the status of objects using MemRegion, so I am using checkRegionChanges. I've observed something weird.
>
> Suppose I have free functions whose definitions are not seen by current TU: void f(const S*); void g(S*);
> 1. Suppose I have
> S s;
> f(&s);
> Then in checkRegionChanges, s's region will appear in explicit_regions but not regions.
> 2. Suppose I have
> S s;
> g(&s);
> Then in checkRegionChanges, t's region will appear in both explicit_regions and regions.
>
> Based on the above observations, if I want to remove s from GDM in case 2 but not 1, then I need to iterate over all entries in GDM and remove the ones that are sub-regions of any one in `region` variable, not `explicit_regions` variable. However, this doesn't work for the following case
> 3. If I have struct T { S s; D d; }; and U has some non-const member void D::h(); for the following call:
> T t;
> t.d.h();
> Then in checkRegionChanges, t.d is in explicit_regions and t is in regions. So t.s will be removed from GDM if I check 'regions' variable, which is incorrect.
>
> My question is, is case 1 a bug or a feature that s appears in explicit_regions? If it is by design, how what is the blessed way to distinguish the above cases? Thanks.
>
> BTW: the code that causes the behavior is
> https://clang.llvm.org/doxygen/RegionStore_8cpp_source.html#l01256
>       if (const MemRegion *R = V.getAsRegion()) {
>         if (TopLevelRegions)
>           TopLevelRegions->push_back(R);
>         W.AddToWorkList(R);
>         continue;
>       }
> All regions are added to TopLevelRegions, without checking RegionAndSymbolInvalidationTraits::TK_PreserveContents.
>
> ----- Original Message -----
> From: Artem Dergachev <noqnoqneo at gmail.com>
> To: likan_999.student at sina.com, Aleksei Sidorin <a.sidorin at samsung.com>, cfe-dev <cfe-dev at lists.llvm.org>
> Subject: Re: [cfe-dev] Static analyzer question on how to tell if a Decl is a particular c++ function, or a particular class
> Date: 2018-02-23 09:20
>
>
> You're essentially in an area that's largely under construction at the
> moment, and i'm likely to address your issue in like nearest future, you
> can probably follow my progress on phabricator or in cfe-commits, or i
> can try to ping you when this case is supported.
> Being source-based, the analyzer needs to model every possible language
> construct explicitly, and in C++ it means *a ton of stuff*. We could
> analyze, like, llvm IR instead, and it'd be much easier to implement
> correctly for us, but it'd be much harder to produce user-friendly
> warnings, because the user does think in terms of source code. And it'd
> be harder to make checkers for the same reason. Which is why "symbolic
> execution" we're doing here is to compilation like algebra to
> arithmetic: we need to do the same thing - support all language
> constructs - but with a lot of "concrete" values replaced with algebraic
> unknown values. Luckily, it is also easier because we don't have to
> support everything perfectly: we can always say that we don't know
> something, behave "conservatively", and the analysis would be slightly
> less precise but this is fine if there's not too much of these.
> Modeling temporaries is one of such cases - this is one of the difficult
> parts of the language that we never did properly until now. One does not
> simply construct into a temporary region because in this case we're not
> even sure that it is indeed a temporary region in this part of the
> analyzer, it cannot be understood by looking at the construct-expression
> in the AST, which is something i'm working around now by introducing
> "construction contexts". And even if we were sure it's a temporary, we
> need to not only construct the object, but also know how to properly
> extend its lifetime and destroy it when lifetime ends, otherwise we'd
> encounter massive false positives (because it'd not be "we don't know
> something", it'd be "we know something but it's wrong"), which should
> also be part of the "context" in which construction takes place. So for
> now this part simply behaves conservatively.
> Your particular case will be relatively tricky when f() is inlined
> during analysis. In this case the temporary will be constructed within
> the function, but in order to properly work with this temporary we'd
> need what the caller function is going to do with it - at least
> according to the AST that we currently have this is what we have to do.
> More info in
> http://lists.llvm.org/pipermail/cfe-dev/2018-February/056898.html - "==
> Return by value ==" part. CodeGen does it differently, treating a lot of
> AST rvalues as lvalues under the hood, but i don't think we can afford
> breaking this invariant for now, because, well, we're more complicated
> and fragile.
> On 22/02/2018 4:46 PM, likan_999.student at sina.com wrote:
>> Thanks Artem, Aleksei for those precious information.
>>
>> Yes, I observed exactly the same thing about MaterializeTemporaryExpr
>> and I actually spent quite sometime trace into the clang's codebase
>> and reached to the exactly same place in the link:
>> createTemporaryRegionIfNeeded. My workaround was inspired by one of
>> the example checkers, that uses a checkPostStmt on
>> MaterializeTemporaryExpr. Thanks for pointing out this.
>>
>> Aleksei, thanks for pointing out the getQualifiedNameAsString, I
>> didn't know I could use that  . I personally prefers it because
>> getParent()->getName() doesn't contain the namespace, but only the
>> class name. So to make it safe, I need to go all the way up to
>> traverse all parent decls, and getQualifiedNameAsString already does
>> the thing for me.
>>
>> Thanks for taking your time to answer my question. I appreciate it.
>>
>> Now I observed some other weird thing, and I wonder if anyone has seen
>> this before and knows any solution.
>>
>> As a reminder, the checker I am writing is for a particular class T in
>> my company's code base and the checker is to ensure T.ok() == true
>> before calling T.value().
>>
>> The above is actually a over-simplification to my task. The class T
>> actually has a member, Let's say s_ (in type S) to keep track of the
>> status, so T.ok() actually returns s_.ok(). And type S is used
>> directly by a few other classes and macros which are used to creates
>> T. So to make the check fully working, I also need to model the
>> behaviors of S and a few other classes.
>>
>> Fortunately, T are template class and therefore all method
>> implementations are visible to current TU. I was thinking, it is a
>> whole lot easier if I can just model the behavior of S, and rely on
>> inline evaluation in the static analyzer. In most cases, it works,
>> except I found the following corner case (suppose T has a conversion
>> constructor from X: T(X&&) and default move constructor T(T&&)):
>>
>> X f();
>> T t = f();
>>
>> This is essentially constructs a temporary T t2{f()} and then doing a
>> move T t{std::move(t2)}. It looks fine if everything is inlined,
>> however, the inline doesn't work because t2 is a temporary. I dived
>> into the code base and what happened is
>>
>> 1. At the end of ExprEngine::getRegionForConstructedObject
>> <https://clang.llvm.org/doxygen/ExprEngineCXX_8cpp_source.html#%2100183>,
>> it is marked as improperly modeled target region:
>> CallOpts.IsCtorOrDtorWithImproperlyModeledTargetRegion
>> <https://clang.llvm.org/doxygen/structclang_1_1ento_1_1ExprEngine_1_1EvalCallOptions.html#a339aa5a5b9aecfa8a844647b8d211eb0>=
>> true;
>> 2. In ExprEngine::mayInlineCallKind
>> <https://clang.llvm.org/doxygen/ExprEngineCallAndReturn_8cpp_source.html#%2100677>,
>> it returns false if the flag is set:
>> // If we did not find the correct this-region, it would be pointless
>> // to inline the constructor. Instead we will simply invalidate // the
>> fake temporary target. if (CallOpts
>> <https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm/tools/clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp?l=618&ct=xref_jump_to_def&gsn=CallOpts&rcl=186553626>.IsCtorOrDtorWithImproperlyModeledTargetRegion
>> <https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm/tools/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h?l=62&ct=xref_jump_to_def&gsn=IsCtorOrDtorWithImproperlyModeledTargetRegion&rcl=186553626>)
>> return CIP_DisallowedOnce
>> <https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm/tools/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h?l=623&ct=xref_jump_to_def&gsn=CIP_DisallowedOnce&rcl=186553626>;
>>
>> I am not an expert in C++ compiler. I just wonder why it can't
>> allocates a temporary region for temporary variables? Having this
>> special case means I need to model all T's constructors, which is
>> really annoying. Does anyone knows some workarounds? Thanks.
>>
>>
>> ----- Original Message -----
>> From: Artem Dergachev <noqnoqneo at gmail.com>
>> To: Aleksei Sidorin <a.sidorin at samsung.com>, Li Kan
>> <likan_999.student at sina.com>, cfe-dev at lists.llvm.org
>> Subject: Re: [cfe-dev] Static analyzer question on how to tell if a
>> Decl is a particular c++ function, or a particular class
>> Date: 2018-02-16 02:26
>>
>>
>> Yeah, the traditional way to see if this is a certain method of a
>> certain class is to compare the name of the method
>> (CXXMethodDecl->getName()) and the class
>> (CXXMethodDecl->getParent()->getName()), as strings, to the desired
>> strings. getQualifiedNameAsString() works fine, but it's considered slow
>> for whatever reason (i've never observed it personally). ASTMatchers
>> would also work, even if a bit of an overkill. Don't forget to check the
>> method's argument number and types as well. We have an effort to make
>> this whole common idiom more simple via CallDescription, but it doesn't
>> work for C++ methods yet.
>> With C++, you'll need to be aware of the bug we're currently having with
>> object lifetime extension. If you're identifying the object by its
>> memory region (which is correct), you'd occasionally notice that the
>> region for a C++ temporary object may change along the analysis even if
>> no copy-construction occurs. It has been making our new C++ checkers
>> (most noticeably, the iterator checker) more difficult to write than
>> they needed to be. More details in
>> http://lists.llvm.org/pipermail/cfe-dev/2018-February/056898.html - i'm
>> hoping to do something about it soon.
>> On 15/02/2018 1:35 AM, Aleksei Sidorin via cfe-dev wrote:
>>> Hello Li,
>>>
>>> What you're doing is mostly fine. I can only add some potentially
>>> useful info inline.
>>>
>>> 14.02.2018 23:04, Li Kan via cfe-dev пишет:
>>>> Hi folks,
>>>>
>>>> If this is not the correct mailing list for this question, please let
>>>> me know.
>>> You're in the right place. Welcome :)
>>>
>>>> I am trying to write a static analyzer for code base of my current
>>>> job. The particular static analyzer I am writing involves a
>>>> particular class of my company's code base. Let's say it is T. I want
>>>> to write a checker to ensure T.ok() is called before T.value() is
>> called.
>>>> Static analyzer is perfect for this type of check as it requires path
>>>> sensitive checks. When I trying to write the checker, I basically
>>>> checks for pre-call and post-call. I want to tell if a CallEvent is
>>>> T.ok() and T.value(). Currently what I am doing is:
>>>> 1. From CallEvent, I cast to CXXMemberCall, and getOriginExpr(), then
>>>> call getMethodDecl().
>>> You can also try to do the following chain:
>>> dyn_cast_or_null<CXXMethodDecl>(CallEvent.getDecl()) to obtain
>>> CXXMethodDecl.
>>>
>>>> 2. From CXXMethodDecl, I first call getThisType(), then call
>>>> getAsCXXRecordDecl(), then getQualifiedNameAsString(), and compare
>>>> with qualified name of T, to make sure it is member of T.
>>> To obtain parent class declaration from CXXMethodDecl, you can use
>>> getParent() method. So, the pseudocode will look like
>>> "MethodDecl->getParent()->getQualifiedNameAsString()".
>>>
>>>> 3. From CXXMethodDecl, I call getNameAsString() to get the method
>>>> name, and compare them with "ok" and "value".
>>> This looks OK. But it can be useful to know that
>>> getQualifiedNameAsString() for CXXMethodDecl will also include parent
>>> name. So, you can just check if MethodDecl->getQualifiedNameAsString()
>>> is equal to "T::value" or "T::ok".
>>>
>>>> It works, but it looks complicated, and involves string comparison,
>>>> which I assume is slow. Is there a easier way, blessed by clang
>>>> static analyzer official teams, that tell if a MethodDecl, or
>>>> CXXRecordDecl, is the function or class I am interested in?
>>>>
>>>> The ideal way is, there is one-time call to lookup the MethodDecl for
>>>> T::ok() and T::value(), and CXXRecordDecl for T. Then ever since
>>>> then, I just need to compare the pointers.
>>> This is possible. You can just launch a simple matcher against AST:
>>> cxxMethodDecl(hasName("::T::value"))
>>> and get CXXMethodDecl you need and compare with it later.
>>>
>>>> Another way is, the first time I found it is T, T::ok() or
>>>> T::value(), I save the pointer of CXXRecordDecl or MethodDecl, and
>>>> use the pointers later. Is this the reliable (assume I use the
>>>> pointer canonical decl will not change) and blessed way to do it? If
>>>> it is, is my steps above the correct and simplest way to determine it
>>>> is T::ok or T::value? Is there some better and more reliable way?
>>> I think it is reliable enough: pointers to canonical declarations
>>> don't change in AST.
>>>> Thanks.
>>>>
>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>> --
>>> Best regards,
>>> Aleksei Sidorin,
>>> SRR, Samsung Electronics
>>>
>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev