[cfe-dev] [analyzer] An in-depth look at Liveness Analysis in the Static Analyzer

Thu Jul 30 06:41:51 PDT 2020

Kristóf,

This is really useful for ClangSA developers. I am wondering if this should
be placed somewhere in the official documentation?

> ---=== 3. How the static analyzer uses it ===---
> I'll detail this in a followup mail when and if I get to it.

Can't wait to read that! Perhaps that should be part of the above mentioned
docs too :)

Gábor

On Tue, Jul 28, 2020 at 3:17 AM Kristóf Umann <dkszelethus at gmail.com> wrote:

> Hi!
>
> This mail will take a look at LiveVariables, which in reality is a live
> values analysis implemented by clang. Written about 2 months after the
> clang/ directory itself was created, it plays a crucial rule in the Clang
> Static Analyzer to this day. Despite that, I managed to find only 2
> functional changes to it in the last 8 years, so we generally consider it
> to be stable.
>
> As I wanted to merge the implementations of Liveness and
> UninitializedVariables, I ended up struggling a lot to comprehend the
> former, so this serves as a documentation for myself but written for
> anyone, and I welcome anyone to find errors in my understanding.
>
> ---=== 1. What is liveness analysis? ===---
>
> Suppose statement i assigns a value to x . If statement j has x as an
> operand, and control can flow from statement i to j along a path that has
> no intervening assignments to x , then we say statement j uses the value of
> x computed at statement i. We further say that x is live at statement i. [5]
>
> This can be used to spot, for example, to spot dead stores (storing a
> value into a variable, but not using it anywhere).
>
> In practice, similarly to other data flow algorithms, we calculate an
> in-state and out-state for each basic block in a control flow graph. The
> in-state of a block is the set of variables that are live at the start of
> the block. Its out-state is the set of variables that are live at the end
> of it. The out-state is the union of the in-states of the block's
> successors. [8]
>
> We can extend the above described live variable analysis to include
> values, this is the so called live expressions analysis [2]. Expressions
> live only within the containing statement -- for instance, argument
> expressions in a call expressions are live until the end of the call
> expression itself. If the expression isn't immediately read, it does not
> live in any context.
>
> ---=== 2. Examples ===---
>
> a.) The following example shows the LIVEin and LIVEout sets (after, not
> during calculation): [6]
>
> // in: {}
> b1: a = 3;
>     b = 5;
>     d = 4;
>     x = 100; //x is never being used later thus not in the out set {a,b,d}
>     if a > b then
> // out: {a,b,d}    //union of all (in) successors of b1 => b2: {a,b}, and
> b3:{b,d}
>
> // in: {a,b}
> b2: c = a + b;
>     d = 2;
> // out: {b,d}
>
> // in: {b,d}
> b3: endif
>     c = 4;
>     return b * d + c;
> // out:{}
>
> b.) Multiple kills of a variable in a single block:
>
> int a = 5; // 'a' is written, can liveness tell whether 'a' is live?
> a = 6; // 'a' is written
> use(6); // 'a' is used
>
> c.) Statement that spans multiple blocks:
>
> x ? y : z
>
> d.) Expression liveness example:
>
> int getInt();
> int getInt2();
> void foo(int, int);
> void test() {
>   foo(getInt(), getInt2());
> }
>
> Contents of test()'s main CFGBlock:
>
> 1: foo
> 2: [B1.1] (ImplicitCastExpr, FunctionToPointerDecay, void (*)(int, int))
> 3: getInt
> 4: [B1.3] (ImplicitCastExpr, FunctionToPointerDecay, int (*)(void))
> 5: [B1.4]()
> 6: getInt2
> 7: [B1.6] (ImplicitCastExpr, FunctionToPointerDecay, int (*)(void))
> 8: [B1.7]()
> 9: [B1.2]([B1.5], [B1.8])
>
> Note how, as one would expect, the order of evaluation can be clearly seen
> in the elements of the block on a rather fine level.
>
> The relevant (and somewhat simplified) AST:
>
> `-CallExpr 'void'
>   |-ImplicitCastExpr 'void (*)(int, int)'
>   | `-DeclRefExpr 'foo' 'void (int, int)'
>   |-CallExpr 'int'
>   | `-ImplicitCastExpr 'int (*)()'
>   |   `-DeclRefExpr 'getInt' 'int ()'
>   `-CallExpr 'int'
>     `-ImplicitCastExpr 'int (*)()'
>       `-DeclRefExpr 'getInt2' 'int ()'
>
> ---=== 3. How it works in Clang ===---
>
> a.) General
>
> Since liveness analysis is a backwards may analysis [8], each CFGBlock in
> the CFG is visited with a worklist that always enqueues predecessors. It
> runs a fix-point algorithm, calculating the set of in- and outgoing live
> expressions/variables until those sets change no more.
>
> Calculating liveness sets for only blocks isn't granular enough, as a
> single block may contain multiple kills (example b.). This demands that
> these LIVEin and LIVEout sets are calculated for each statement. However,
> since a statement can span across multiple blocks, block-level liveness
> needs to be calculated as well (example c.).
>
> Contrary to the Wikipedia algorithm [8] describes, there are no
> established GEN and KILL sets, nor similar constructs like a symbol table
> as described in the dragon book [5]. Instead, determining which statements
> constitutes as variable/expression kill (write), or GENeration (use/read)
> is done in the same step as the calculation of the LIVEin and LIVEout sets.
>
> For each block in the worklist, we traverse its CFGElements backwards and
> create a TransferFunctions object, which is a statement visitor under the
> hood.  For each element in this block that is a statement (CFGStmt), we
> visit the statement with said TranserFunctions. We always copy the liveness
> set from the previous statement to the next. Wiki reads,
>
> "The transfer function of a statement is applied by making the variables
> that are written dead, then making the variables that are read live."[8]
>
> b.) Calculating expression liveness for a single CFGBlock
>
> Since we also calculate live expressions however, we have to also account
> for them in addition. If the statement is an Expr, it must also be the kill
> of that Expr (it is the only write of that expression, after all), so we
> remove it from the live expressions set. Moving on, we branch out --
> depending on the type of the statement, we mark things like the condition
> of for/if/while/do statements live.
>
> We also employ a heuristic to generalize this task -- we mark all
> child expressions live as well. Demonstrating that on the Example d.):
>
> * [B1.9] (Block 1, Element 9): Remove [B1.9] (as it is the CallExpr to
> foo) from [B1.9]'s liveness set, add its children, the arguments, [B1.5]
> and [B1.8], and the function to be called, [B1.2] (mind that this could be
> a function pointer as well!).
> * [B1.8]: Remove [B1.8] (as it is the CallExpr to getInt2) from [B1.8]'s
> liveness set, add its child, [B1.7].
> * [B1.7]: Remove [B1.7] (as it is also an Expr, an ImplicitCastExpr) from
> [B1.7]'s liveness set, add its child, [B1.6].
> * [B1.6]: Remove [B1.6] (as it is a DeclRefExpr to getInt2) from [B1.6]'s
> liveness set, and since it has no more children, we're done here.
> * [B1.5] -- [B1.3] same as [B1.8] --[B1.6].
> * [B1.2]: Remove [B1.2] (as it is also an ImplicitCastExpr of function
> foo) from [B1.7]'s liveness set, add its child, [B1.1].
> * [B1.1]: Remove [B1.1] (as it is the DeclRefExpr to function foo) from
> [B1.1]'s liveness set, and we're done.
>
> This leaves the following live expression sets (mind that due to the
> linearity of the CFG, this is the final result):
> * [B1.9]: [B1.8], [B1.5], [B1.2]
> * [B1.8]: [B1.5], [B1.2], [B1.7]
> * [B1.7]: [B1.5], [B1.2], [B1.6]
> * [B1.6]: [B1.5], [B1.2]
> * [B1.5]: [B1.2], [B1.4]
> * [B1.4]: [B1.2], [B1.3],
> * [B1.3]: [B1.2]
> * [B1.2]: [B1.1]
> * [B1.1]: {}
>
> Indeed, this accurately represents the liveness of these expressions --
> the function to be called, and both arguments must live until the end of
> the call expression.
>
> c.) Calculating variable liveness for a single CFGBlock
>
> Reads or usages of variables can be found in many contexts, namely when
> the statement visitor encounters a DeclRefExpr, DeclStmt,
> BlockExpr, CFGAutomaticObjDtor, or ObjCMessageExpr. What is different
> compared to expression live set calculation, are kills, or in other words,
> the removal of variables from liveness sets -- writes happen on either
> assignments or their definitions. Mind that not all assignments are kills
> -- operators such as += also read the variable.
>
> In general, understanding live variables calculation follows the
> literature more closely, and in my experience, demands far less time to
> understand.
>
> d.) Transfer in between blocks
>
> The LIVEin set of a CFGBlock will always be the liveness set (both for
> variables and expressions) of the corresponding CFGElement no1. Next, as
> the algorithm states, we enqueue all predecessor nodes to the worklist.
> Processing the next block in the worklist starts with calculating its
> LIVEout set with the union of all liveness sets of successor blocks.
>
> e.) Relaxed liveness analysis
>
> LiveVariables presents the option to conduct a relaxed analysis, no longer
> regarding assignments as kills. This is desirable when we're not interested
> in a liveness of a value held by a variable, but rather the lifetime of a
> variable. As such, the analyzer uses relaxed liveness analysis when
> inquiring about variables for the most part with the exception of
> DeadStoresChecker. Even debug.DumpLiveStms dumps relaxed sets, but
> debug.DumpLiveVars uses the non-relaxed version.
>
> d.) Miscellaneous notes
>
> Live bindings[9] work very similarly to variables, but have their own
> distinct liveness set.
>
> There is currently a hack in the analyzer where we assign value to a
> statement, and thus Liveness calculates live statements, not expressions
> [2], https://reviews.llvm.org/D82598#2171312.
>
> There is a corner case for CFGAutomaticObjDtor, which isn't a CFGStmt, but
> is still handled, as it marks its associated variable live.
>
> A notable bug when we heuristic of adding all child expressions (or, as of
> writing this mail, still statements) didn't work is described in [3].
>
> ---=== 3. How the static analyzer uses it ===---
>
> I'll detail this in a followup mail when and if I get to it. I guess the
> most interesting and non-trivial usage of liveness is found
> in DeadStoresChecker, which is definitely a good learning example of
> dataflow analysis being incorporated into symbolic execution.
>
> If that mail doesn't happen soon enough, [1], [4], [7] provide great
> further reading on a related problem, zombie symbols, which is strongly
> related to liveness.
>
> Have a great one!
> Husi
>
> [1] [cfe-dev] [analyzer] Zombie symbols.
> http://lists.llvm.org/pipermail/cfe-dev/2016-March/048142.html
> [2] [analyzer][Liveness][NFC] Get rid of statement liveness, because such
> a thing doesn't exist https://reviews.llvm.org/D82598
> [3] [analyzer] LiveVariables: Fix a zombie expression problem, add testing
> infrastructure. https://reviews.llvm.org/D55566
> [4] [analyzer] Fix the "Zombie symbols" issue.
> https://reviews.llvm.org/D18860
> [5] Aho, A., Lam, M., Sethi, R., & Ullman, J. (2007). Compilers:
> Principles, Techniques and Tools, 2nd Edition.
> [6] Live variable analysis, Wikipedia, second example
> https://en.wikipedia.org/wiki/Live_variable_analysis#Second_example
> [7] Dergachev, A. (2016). Clang Static Analyzer: A Checker Developer’s
> Guide.(2016).
> [8] Live variable analysis, Wikipedia, second example
> https://en.wikipedia.org/wiki/Live_variable_analysis
> <https://en.wikipedia.org/wiki/Live_variable_analysis#Second_example>
> [9] https://clang.llvm.org/doxygen/classclang_1_1BindingDecl.html
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200730/00a089df/attachment-0001.html>