[cfe-dev] [analyzer] Zombie symbols.

Anna Zaks via cfe-dev cfe-dev at lists.llvm.org
Mon Mar 28 12:07:04 PDT 2016


Artem,

You might want to store this in the repo by adding a document or extending this one:
./docs/analyzer/RegionStore.txt

Anna.
> On Mar 28, 2016, at 5:00 AM, Artem Dergachev via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> 
> > I really want to spend some time here, but that code is pretty
> > difficult to understand (as you noted)
> 
> Umm, before i forget, i guess i'd dump some quick explanation of what i figured out about it.
> 
> First, we have symbols-or-regions, hereinafter "values" unless it causes confusion. Liveness of some values may be determined regardless of any other values: for example, a VarRegion of a local variable can be proven live via live variables analysis, which is an auxiliary data flow analysis that doesn't depend on our symbolic execution engine. However, values may also be related to other values via a "keeps-alive relation". There are two kind of keep-alive relations:
> 
> I. "first-class" keep-alive relations arise from and rely purely on the SVal hierarchy (for example, SymbolRegionValue is kept alive by its parent region, SymbolicRegion keeps its base symbol alive).
> 
> II. "second-class" keep-alive relations arise when symbols and regions appear in certain parts of the program state (for example, any region would keep alive its contents - all symbols stored inside it).
> 
> The purpose of removeDead() is to understand which values present in the current program state are dead (and, as a useful by-product, which values are live) by exploring the keep-alive relations between values; second-class relations are explored with respect to the current program state. Whenever a certain value X is determined to be live, any values kept alive by X are also known to be live. The liveness information is collected from scratch on every pass and does not depend on previous removeDead() runs. Once the liveness information is collected, a special action is taken on all dead values, which may involve removing them from the state, but does not necessarily boil down to just that.
> 
> SymbolReaper is an object that contains the incomplete understanding of live values on the mark phase, which eventually becomes complete and as such can be used on the sweep phase. SymbolReaper's data consists of two sets of values - the "live set" and the "dead set". The live set contains values already known to be live. The dead set contains the values found in the state, liveness of which is explicitly questioned and was not yet determined; values may move from the dead set to the live set (if they turn out to be live due to a previously unconsidered second-class relation to a live value) but not in the opposite direction. First-class relations are mostly hardcoded directly into the SymbolReaper - see isLive(), isLiveRegion(). Originally, SymbolReaper was simply a pair of sets, but later it was extended to take care of first-class keep-alive relations.
> 
> The removeDead() procedure begins by creating a temporary SymbolReaper object that lives for as long as the procedure runs. Then it proceeds with querying different manager objects responsible for different sections of the program state to explore their section and fill-in the liveness information in the SymbolReaper. Each section needs to classify each value it is responsible for as live or dead.
> 
> Program state sections are queried in the following order:
> 
> (1) GDM traits of particular checkers (through CheckerManager and checkLiveSymbols() callback).
> (2) Environment (through EnvironmentManager).
> (3) Store (through StoreManager).
> (4) Range constraints (through ConstraintManager).
> 
> Section (1) is implemented in each checker in a different manner. The checker needs to be aware of the fact that if it doesn't markLive() a value here, then all information related to it may be removed from other sections of the state (eg. range constraints for symbols or bindings for regions). And if the checker doesn't mark any value maybeDead(), it may never receive the callback when this value actually dies.
> 
> /*
> [Q1]: In http://reviews.llvm.org/D12726 you mentioned that checkers mark liveness after the store, but now you say it's just the opposite, how so?
> [A1]: I was wrong and stay corrected. It seems that checkers do not have access to any liveness information obtained from the rest of the state for during checkLiveSymbols().
> */
> 
> Sections (2), (3), (4) combine(!!) their sweep phase with the mark phase. However, for checker traits, the sweep phase is delayed until the checkDeadSymbols() checker callback, which goes last at the end of removeDead() after (4).
> 
> /*
> [Q2]: Does taint information for dead symbols ever get removed? Cause it's in GDM, but it's neither checkers nor range constraints.
> [A2]: Afaik, no; we don't even have State::removeTaint() yet. This is a bug, which may probably cause performance problems with taint analysis, though maybe not too much, because tainted symbols are rare.

Taint has never been fully implemented. I also want to investigate using flow-sensitive analysis for taint tracking instead of the approach the current half-implementation uses.

> */
> 
> Section (2) scans active expressions, marks values of active expressions as live. It also marks values of inactive expressions that are still found in the state as dead and removes these expressions and their values from the environment. It does not need to rely on any liveness information for values to make such decision, which is why this step is completely independent and isolated. Note, hovewer, that first-class relations play an important role here - any values found inside values that were marked dead or live are also explicitly marked dead or live, which contributes to complexity of the code. The visitor used here to conduct marking of sub-values is not re-used.
> 
> Section (3) is the most complex here. It takes care of the following second-class keep-alive relation: for every binding in the store, the binding key keeps alive the binding value. Both binding keys and binding values are (at least, partially/sometimes) "values" (in the symbols-or-regions sense), hence it may happen that a certain binding value X is also a binding key for another binding value Y, and once liveness of X is established, liveness of Y needs to be re-calculated. Liveness of binding pairs cannot thus be computed in a single pass through the store, and therefore a worklist-like algorithm is implemented, that tries to reach a fixed point in which the liveness information no longer gets updated. Once the worklist algorithm finishes, the following happens: all bindings with dead keys are removed from the store, all sub-values of binding keys and all binding values and their sub-values are marked as maybe-dead. The visiting algorithm for sub-values is not re-used from the Environment.
> 
> The method assumes that after the end of (3) no new live marks are added. Which is why Store is capable of removing dead bindings without requiring a separate pass.
> 
> Section (4) is trivial - the constraint manager does not mark any symbols live. It marks all constraint keys that were not previously proven live as maybe-dead and removes them from the constraint map.
> 
> /*
> [Q3]: So what if we try to fix the issue mentioned in [Q1] by putting (1) after (4)?
> [A3]: It's more complicated than that. Checkers may introduce their own second-class keep-alive relations between values. For example, CStringChecker would keep string length alive for as long as the string region is alive. There may be more such relations introduced by other checkers. It may happen that a certain checker-specific second-class keep-alive relation would cause a store binding key to become live. Hence, there needs to be a single worklist algorithm run for both store-related second-class keep-alive relations and checker-specific second-class keep-alive relations, and (1) needs to be essentially merged with (4). And that would make things yet even more complex.
> */
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20160328/41af3453/attachment.html>


More information about the cfe-dev mailing list