[cfe-dev] [StaticAnalyzer] Questions about SymbolDerived class and checkDeadSymbol callback

Wed May 31 10:49:18 PDT 2017

Thanks for your detailed explanation. I tried the solution (1) and it works
in my scenario.

On Wed, May 31, 2017 at 5:51 AM, Artem Dergachev <noqnoqneo at gmail.com>
wrote:

> Before anything - my blind guess is that you've found another instance of
> the "zombie symbols" bug, could you try if any of the following solutions
> helps:
>
> (1) could you see if changing "SymReaper.isDead(Sym)" to
> "!SymReaper.isLive(Sym)" helps?
> (2) or, instead of (1), could you implement checkLiveSymbols() callback in
> your checker and blindly say SymReaper.maybeDead(Sym) on all symbols your
> checker tracks?
>
> Long story short, there's currently a counter-intuitive behavior in
> SymbolReaper, which we discovered recently but didn't come up with a good
> fix, due to which dead symbols are a different notion from non-live
> symbols. Generally, dead symbols are symbols that were explicitly released
> during dead symbol collection by one of the program state's entities
> (Environment, Store, Constraints, or one of the checkers through
> checkLiveSymbols() callback) - it's a plain finite set of symbol inside the
> SymbolReaper object, while non-live symbols are symbols that could not be
> proven live anymore by deep-scanning the current program state, regardless
> of whether someone has explicitly released them (and the set of such
> non-live symbols might essentially be infinite; however isLive() is a
> relatively fast operation since scan results are already pre-computed).
>
> When you obtained derived symbols for ha[0] and ha[1], only your checker
> knows that these symbols have been constructed and are being tracked.
> Technically, these symbols were constructed by StoreManager, but he doesn't
> record them in the store, because he doesn't need to do that to provide
> them again. Similarly to SymbolRegionValue, SymbolDerived would be produced
> automatically as long as the default-bound conjured symbol is there, so the
> Store doesn't need to explicitly remember that SymbolDerived is "bound" to
> the array elements. So your checker is the only entity in the program state
> that is aware of these symbols. And therefore there's nobody who could add
> these symbols to the dead set for you to query. So you can either add them
> to the dead set by trick (2) (if they're proven live they won't be added to
> the dead set) or by trick (1) don't rely on the dead set.
>
> SimpleStreamChecker is known to be broken in a similar manner. We noticed
> this before when we figured out that he doesn't find problems of form
>
>   FILE *fp = fopen("foo.txt", "r");
>   fp = 0;
>   return;
>
> where the tracked value is overwritten. The reason is similar: once the
> value is overwritten in the Store, it's no longer visible to any entitiy
> but the checker, and there's nobody to add it to the dead set.
>
> Most of the checkers are unaffected though, because symbols they're
> interested in are also tracked in other parts of the program state. For
> instance, pointers returned by malloc() are known to have a known
> region-extent and default-bound to UndefinedVal, so they're tracked by
> store and constraint manager. Considering that we don't have an awesome
> short-term solution, it seems that we have to advice checker developers to
> take steps (1) or (2) as a workaround.
>
> Additional info:
> - http://lists.llvm.org/pipermail/cfe-dev/2016-March/047922.html
> - https://reviews.llvm.org/D18860
>
> Another affected user:
> - https://reviews.llvm.org/D29419
>
>
>
> 31/05/2017 4:07 AM, Haowei Wu via cfe-dev wrote:
>
>> Hi
>>
>> I am developing a resource leak checker that is very similar to
>> SimpleStreamChecker and I noticed an issue when tracking symbols that is
>> SymbolDerived type.
>>
>> The source code of my checker can be downloaded at this link:
>> https://gist.github.com/zeroomega/2fcabf5979a0e42db1ccb8f30472d071
>>
>> An example of resource leaks that my checker is trying to find is like
>> this:
>>
>> typedef uint32_t handle_t;
>> typedef uint32_t status_t;
>>
>> status_t request(
>>   uint32_t flags,
>>   handle_t* out0,
>>   handle_t* out1);
>>
>> status_t release(handle_t handle);
>>
>> void checkLeak1(int tag) {
>>   handle_t sa, sb;
>>   request(0, &sa, &sb);
>>   if (tag > 0) {
>>     return;
>>   }
>>   release(sa);
>>   release(sb);
>> }
>>
>>
>> Function call "request(0, &sa, &sb)" will allocate 2 resource descriptors
>> (handles) to "sa" and "sb". The resource will be leaked if someone forgot
>> to call "release(sa)" on a resource descriptor (handle).
>>
>> In my checker, I override "checkPostCall" callback to map the SymbolRef
>> of arguments "sa" and "sb" in any "request(0, &sa, &sb)" call to
>> "Allocated" state. And I use "checkPreCall" callback to map the SymbolRef
>> of argument "sa" in any "release(sa)" call to "Released" state. If a
>> tracked SymbolRef is dead when using "SymbolReaper.isDead()" API in
>> "checkDeadSymbol" callback and its state is "Allocated", this SymbolRef
>> will be considered as leaked and will be reported.
>>
>> In the example code "checkLeak1" , there is an early return "if (tag > 0)
>> { return;}" that will not release any handles, which will cause leaks. My
>> checker can successfully detect this leak. However, when I change the leak
>> example to this:
>>
>> void checkLeak2(int tag) {
>>   handle_t ha[2];
>>   request(0, ha, ha+1);
>>   if (tag > 0) {
>>     return;
>>   }
>>   release(ha[0]);
>>   release(ha[1]);
>> }
>>
>>
>> The checker will not report any leaks at all. Further debugging shows
>> that the static analyzer allocates a SymbolConjured symbol for array "ha"
>> and SymbolDerived symbols for item "ha[0]' and "ha[1]". These two
>> SymbolDerived symbols has parent symbol points to SymbolConjured symbol of
>> "ha". In the path of "if (tag > 0) { return;}", the SymbolDerived symbols
>> of "ha[0]" and "ha[1]" are not considered dead when using
>> "SymbolReaper.isDead()". So no leak is reported in this path. These two
>> SymbolDerived symbols are considered dead by "SymbolReaper.isDead()" at the
>> end of path "release(ha[0]);release(ha[1]);".
>>
>>
>> Another two examples can show this behavior more clearly:
>>
>> void checkLeak3(int tag) {
>>   handle_t ha[2];
>>   request(0, ha, ha+1);
>>   return;
>> }
>>
>> The checker will not report any leak on the code of "checkLeak3" example
>> because the SymbolDerived symbols of "ha[0]" and "ha[1]" are not considered
>> dead when "checkDeadSymbol" callback is invoked at "return;" But if I add
>> two fake assignments like this:
>>
>> void checkLeak4(int tag) {
>>   handle_t ha[2];
>>   request(0, ha, ha+1);
>>   ha[0] = *ha;
>>   ha[1] = *(ha+1);
>>   return;
>> }
>>
>> The checker will report leaks on "ha[0]" and "ha[1]" because the
>> SymbolDerived symbols of "ha[0]" and "ha[1]" are considered as dead this
>> time.
>>
>>
>> My question is:
>>
>> Is this the intended behavior of clang static analyzer when processing
>> SymbolDerived symbols?
>> If so, what should I do to reliably detect the resource leaks like the
>> example in "checkLeak2"?
>>
>> I already tried the method that check the liveness of the parent
>> SymbolConjured symbols of each SymbolDerived symbols. If a SymbolDerived
>> symbol's parent symbol is "dead", it will considered as dead as well.
>> However, this method does not work in the "checkLeak2" example, as the
>> SymbolConjured symbol "ha" is dead after analyzing "release(ha[0])" and
>> before "release(ha[1])", the analyzer will report "ha[1]" as leak because
>> "release(ha[1])" is not analyzed yet and parent symbol of "ha[1]" is
>> already dead and "ha[1]" is not in "Release" state.
>>
>> Thanks for any help,
>>
>> Haowei
>>
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20170531/76f22e1e/attachment.html>