[cfe-dev] [StaticAnalyzer] Questions about SymbolDerived class and checkDeadSymbol callback

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Wed May 31 05:51:19 PDT 2017


Before anything - my blind guess is that you've found another instance 
of the "zombie symbols" bug, could you try if any of the following 
solutions helps:

(1) could you see if changing "SymReaper.isDead(Sym)" to 
"!SymReaper.isLive(Sym)" helps?
(2) or, instead of (1), could you implement checkLiveSymbols() callback 
in your checker and blindly say SymReaper.maybeDead(Sym) on all symbols 
your checker tracks?

Long story short, there's currently a counter-intuitive behavior in 
SymbolReaper, which we discovered recently but didn't come up with a 
good fix, due to which dead symbols are a different notion from non-live 
symbols. Generally, dead symbols are symbols that were explicitly 
released during dead symbol collection by one of the program state's 
entities (Environment, Store, Constraints, or one of the checkers 
through checkLiveSymbols() callback) - it's a plain finite set of symbol 
inside the SymbolReaper object, while non-live symbols are symbols that 
could not be proven live anymore by deep-scanning the current program 
state, regardless of whether someone has explicitly released them (and 
the set of such non-live symbols might essentially be infinite; however 
isLive() is a relatively fast operation since scan results are already 
pre-computed).

When you obtained derived symbols for ha[0] and ha[1], only your checker 
knows that these symbols have been constructed and are being tracked. 
Technically, these symbols were constructed by StoreManager, but he 
doesn't record them in the store, because he doesn't need to do that to 
provide them again. Similarly to SymbolRegionValue, SymbolDerived would 
be produced automatically as long as the default-bound conjured symbol 
is there, so the Store doesn't need to explicitly remember that 
SymbolDerived is "bound" to the array elements. So your checker is the 
only entity in the program state that is aware of these symbols. And 
therefore there's nobody who could add these symbols to the dead set for 
you to query. So you can either add them to the dead set by trick (2) 
(if they're proven live they won't be added to the dead set) or by trick 
(1) don't rely on the dead set.

SimpleStreamChecker is known to be broken in a similar manner. We 
noticed this before when we figured out that he doesn't find problems of 
form

   FILE *fp = fopen("foo.txt", "r");
   fp = 0;
   return;

where the tracked value is overwritten. The reason is similar: once the 
value is overwritten in the Store, it's no longer visible to any entitiy 
but the checker, and there's nobody to add it to the dead set.

Most of the checkers are unaffected though, because symbols they're 
interested in are also tracked in other parts of the program state. For 
instance, pointers returned by malloc() are known to have a known 
region-extent and default-bound to UndefinedVal, so they're tracked by 
store and constraint manager. Considering that we don't have an awesome 
short-term solution, it seems that we have to advice checker developers 
to take steps (1) or (2) as a workaround.

Additional info:
- http://lists.llvm.org/pipermail/cfe-dev/2016-March/047922.html
- https://reviews.llvm.org/D18860

Another affected user:
- https://reviews.llvm.org/D29419


31/05/2017 4:07 AM, Haowei Wu via cfe-dev wrote:
> Hi
>
> I am developing a resource leak checker that is very similar to 
> SimpleStreamChecker and I noticed an issue when tracking symbols that 
> is SymbolDerived type.
>
> The source code of my checker can be downloaded at this link: 
> https://gist.github.com/zeroomega/2fcabf5979a0e42db1ccb8f30472d071
>
> An example of resource leaks that my checker is trying to find is like 
> this:
>
> typedef uint32_t handle_t;
> typedef uint32_t status_t;
>
> status_t request(
>   uint32_t flags,
>   handle_t* out0,
>   handle_t* out1);
>
> status_t release(handle_t handle);
>
> void checkLeak1(int tag) {
>   handle_t sa, sb;
>   request(0, &sa, &sb);
>   if (tag > 0) {
>     return;
>   }
>   release(sa);
>   release(sb);
> }
>
>
> Function call "request(0, &sa, &sb)" will allocate 2 resource 
> descriptors (handles) to "sa" and "sb". The resource will be leaked if 
> someone forgot to call "release(sa)" on a resource descriptor (handle).
>
> In my checker, I override "checkPostCall" callback to map the 
> SymbolRef of arguments "sa" and "sb" in any "request(0, &sa, &sb)" 
> call to "Allocated" state. And I use "checkPreCall" callback to map 
> the SymbolRef of argument "sa" in any "release(sa)" call to "Released" 
> state. If a tracked SymbolRef is dead when using 
> "SymbolReaper.isDead()" API in "checkDeadSymbol" callback and its 
> state is "Allocated", this SymbolRef will be considered as leaked and 
> will be reported.
>
> In the example code "checkLeak1" , there is an early return "if (tag > 
> 0) { return;}" that will not release any handles, which will cause 
> leaks. My checker can successfully detect this leak. However, when I 
> change the leak example to this:
>
> void checkLeak2(int tag) {
>   handle_t ha[2];
>   request(0, ha, ha+1);
>   if (tag > 0) {
>     return;
>   }
>   release(ha[0]);
>   release(ha[1]);
> }
>
>
> The checker will not report any leaks at all. Further debugging shows 
> that the static analyzer allocates a SymbolConjured symbol for array 
> "ha" and SymbolDerived symbols for item "ha[0]' and "ha[1]". These two 
> SymbolDerived symbols has parent symbol points to SymbolConjured 
> symbol of "ha". In the path of "if (tag > 0) { return;}", the 
> SymbolDerived symbols of "ha[0]" and "ha[1]" are not considered dead 
> when using "SymbolReaper.isDead()". So no leak is reported in this 
> path. These two SymbolDerived symbols are considered dead by 
> "SymbolReaper.isDead()" at the end of path 
> "release(ha[0]);release(ha[1]);".
>
>
> Another two examples can show this behavior more clearly:
>
> void checkLeak3(int tag) {
>   handle_t ha[2];
>   request(0, ha, ha+1);
>   return;
> }
>
> The checker will not report any leak on the code of "checkLeak3" 
> example because the SymbolDerived symbols of "ha[0]" and "ha[1]" are 
> not considered dead when "checkDeadSymbol" callback is invoked at 
> "return;" But if I add two fake assignments like this:
>
> void checkLeak4(int tag) {
>   handle_t ha[2];
>   request(0, ha, ha+1);
>   ha[0] = *ha;
>   ha[1] = *(ha+1);
>   return;
> }
>
> The checker will report leaks on "ha[0]" and "ha[1]" because the 
> SymbolDerived symbols of "ha[0]" and "ha[1]" are considered as dead 
> this time.
>
>
> My question is:
>
> Is this the intended behavior of clang static analyzer when processing 
> SymbolDerived symbols?
> If so, what should I do to reliably detect the resource leaks like the 
> example in "checkLeak2"?
>
> I already tried the method that check the liveness of the parent 
> SymbolConjured symbols of each SymbolDerived symbols. If a 
> SymbolDerived symbol's parent symbol is "dead", it will considered as 
> dead as well. However, this method does not work in the "checkLeak2" 
> example, as the SymbolConjured symbol "ha" is dead after analyzing 
> "release(ha[0])" and before "release(ha[1])", the analyzer will report 
> "ha[1]" as leak because "release(ha[1])" is not analyzed yet and 
> parent symbol of "ha[1]" is already dead and "ha[1]" is not in 
> "Release" state.
>
> Thanks for any help,
>
> Haowei
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev




More information about the cfe-dev mailing list