[cfe-dev] [analyzer] An in-depth look at Liveness Analysis in the Static Analyzer

Kristóf Umann via cfe-dev cfe-dev at lists.llvm.org
Mon Jul 27 18:17:07 PDT 2020


Hi!

This mail will take a look at LiveVariables, which in reality is a live
values analysis implemented by clang. Written about 2 months after the
clang/ directory itself was created, it plays a crucial rule in the Clang
Static Analyzer to this day. Despite that, I managed to find only 2
functional changes to it in the last 8 years, so we generally consider it
to be stable.

As I wanted to merge the implementations of Liveness and
UninitializedVariables, I ended up struggling a lot to comprehend the
former, so this serves as a documentation for myself but written for
anyone, and I welcome anyone to find errors in my understanding.

---=== 1. What is liveness analysis? ===---

Suppose statement i assigns a value to x . If statement j has x as an
operand, and control can flow from statement i to j along a path that has
no intervening assignments to x , then we say statement j uses the value of
x computed at statement i. We further say that x is live at statement i. [5]

This can be used to spot, for example, to spot dead stores (storing a value
into a variable, but not using it anywhere).

In practice, similarly to other data flow algorithms, we calculate an
in-state and out-state for each basic block in a control flow graph. The
in-state of a block is the set of variables that are live at the start of
the block. Its out-state is the set of variables that are live at the end
of it. The out-state is the union of the in-states of the block's
successors. [8]

We can extend the above described live variable analysis to include values,
this is the so called live expressions analysis [2]. Expressions live only
within the containing statement -- for instance, argument expressions in a
call expressions are live until the end of the call expression itself. If
the expression isn't immediately read, it does not live in any context.

---=== 2. Examples ===---

a.) The following example shows the LIVEin and LIVEout sets (after, not
during calculation): [6]

// in: {}
b1: a = 3;
    b = 5;
    d = 4;
    x = 100; //x is never being used later thus not in the out set {a,b,d}
    if a > b then
// out: {a,b,d}    //union of all (in) successors of b1 => b2: {a,b}, and
b3:{b,d}

// in: {a,b}
b2: c = a + b;
    d = 2;
// out: {b,d}

// in: {b,d}
b3: endif
    c = 4;
    return b * d + c;
// out:{}

b.) Multiple kills of a variable in a single block:

int a = 5; // 'a' is written, can liveness tell whether 'a' is live?
a = 6; // 'a' is written
use(6); // 'a' is used

c.) Statement that spans multiple blocks:

x ? y : z

d.) Expression liveness example:

int getInt();
int getInt2();
void foo(int, int);
void test() {
  foo(getInt(), getInt2());
}

Contents of test()'s main CFGBlock:

1: foo
2: [B1.1] (ImplicitCastExpr, FunctionToPointerDecay, void (*)(int, int))
3: getInt
4: [B1.3] (ImplicitCastExpr, FunctionToPointerDecay, int (*)(void))
5: [B1.4]()
6: getInt2
7: [B1.6] (ImplicitCastExpr, FunctionToPointerDecay, int (*)(void))
8: [B1.7]()
9: [B1.2]([B1.5], [B1.8])

Note how, as one would expect, the order of evaluation can be clearly seen
in the elements of the block on a rather fine level.

The relevant (and somewhat simplified) AST:

`-CallExpr 'void'
  |-ImplicitCastExpr 'void (*)(int, int)'
  | `-DeclRefExpr 'foo' 'void (int, int)'
  |-CallExpr 'int'
  | `-ImplicitCastExpr 'int (*)()'
  |   `-DeclRefExpr 'getInt' 'int ()'
  `-CallExpr 'int'
    `-ImplicitCastExpr 'int (*)()'
      `-DeclRefExpr 'getInt2' 'int ()'

---=== 3. How it works in Clang ===---

a.) General

Since liveness analysis is a backwards may analysis [8], each CFGBlock in
the CFG is visited with a worklist that always enqueues predecessors. It
runs a fix-point algorithm, calculating the set of in- and outgoing live
expressions/variables until those sets change no more.

Calculating liveness sets for only blocks isn't granular enough, as a
single block may contain multiple kills (example b.). This demands that
these LIVEin and LIVEout sets are calculated for each statement. However,
since a statement can span across multiple blocks, block-level liveness
needs to be calculated as well (example c.).

Contrary to the Wikipedia algorithm [8] describes, there are no established
GEN and KILL sets, nor similar constructs like a symbol table as described
in the dragon book [5]. Instead, determining which statements constitutes
as variable/expression kill (write), or GENeration (use/read) is done in
the same step as the calculation of the LIVEin and LIVEout sets.

For each block in the worklist, we traverse its CFGElements backwards and
create a TransferFunctions object, which is a statement visitor under the
hood.  For each element in this block that is a statement (CFGStmt), we
visit the statement with said TranserFunctions. We always copy the liveness
set from the previous statement to the next. Wiki reads,

"The transfer function of a statement is applied by making the variables
that are written dead, then making the variables that are read live."[8]

b.) Calculating expression liveness for a single CFGBlock

Since we also calculate live expressions however, we have to also account
for them in addition. If the statement is an Expr, it must also be the kill
of that Expr (it is the only write of that expression, after all), so we
remove it from the live expressions set. Moving on, we branch out --
depending on the type of the statement, we mark things like the condition
of for/if/while/do statements live.

We also employ a heuristic to generalize this task -- we mark all
child expressions live as well. Demonstrating that on the Example d.):

* [B1.9] (Block 1, Element 9): Remove [B1.9] (as it is the CallExpr to foo)
from [B1.9]'s liveness set, add its children, the arguments, [B1.5] and
[B1.8], and the function to be called, [B1.2] (mind that this could be a
function pointer as well!).
* [B1.8]: Remove [B1.8] (as it is the CallExpr to getInt2) from [B1.8]'s
liveness set, add its child, [B1.7].
* [B1.7]: Remove [B1.7] (as it is also an Expr, an ImplicitCastExpr) from
[B1.7]'s liveness set, add its child, [B1.6].
* [B1.6]: Remove [B1.6] (as it is a DeclRefExpr to getInt2) from [B1.6]'s
liveness set, and since it has no more children, we're done here.
* [B1.5] -- [B1.3] same as [B1.8] --[B1.6].
* [B1.2]: Remove [B1.2] (as it is also an ImplicitCastExpr of function foo)
from [B1.7]'s liveness set, add its child, [B1.1].
* [B1.1]: Remove [B1.1] (as it is the DeclRefExpr to function foo) from
[B1.1]'s liveness set, and we're done.

This leaves the following live expression sets (mind that due to the
linearity of the CFG, this is the final result):
* [B1.9]: [B1.8], [B1.5], [B1.2]
* [B1.8]: [B1.5], [B1.2], [B1.7]
* [B1.7]: [B1.5], [B1.2], [B1.6]
* [B1.6]: [B1.5], [B1.2]
* [B1.5]: [B1.2], [B1.4]
* [B1.4]: [B1.2], [B1.3],
* [B1.3]: [B1.2]
* [B1.2]: [B1.1]
* [B1.1]: {}

Indeed, this accurately represents the liveness of these expressions -- the
function to be called, and both arguments must live until the end of the
call expression.

c.) Calculating variable liveness for a single CFGBlock

Reads or usages of variables can be found in many contexts, namely when the
statement visitor encounters a DeclRefExpr, DeclStmt,
BlockExpr, CFGAutomaticObjDtor, or ObjCMessageExpr. What is different
compared to expression live set calculation, are kills, or in other words,
the removal of variables from liveness sets -- writes happen on either
assignments or their definitions. Mind that not all assignments are kills
-- operators such as += also read the variable.

In general, understanding live variables calculation follows the literature
more closely, and in my experience, demands far less time to understand.

d.) Transfer in between blocks

The LIVEin set of a CFGBlock will always be the liveness set (both for
variables and expressions) of the corresponding CFGElement no1. Next, as
the algorithm states, we enqueue all predecessor nodes to the worklist.
Processing the next block in the worklist starts with calculating its
LIVEout set with the union of all liveness sets of successor blocks.

e.) Relaxed liveness analysis

LiveVariables presents the option to conduct a relaxed analysis, no longer
regarding assignments as kills. This is desirable when we're not interested
in a liveness of a value held by a variable, but rather the lifetime of a
variable. As such, the analyzer uses relaxed liveness analysis when
inquiring about variables for the most part with the exception of
DeadStoresChecker. Even debug.DumpLiveStms dumps relaxed sets, but
debug.DumpLiveVars uses the non-relaxed version.

d.) Miscellaneous notes

Live bindings[9] work very similarly to variables, but have their own
distinct liveness set.

There is currently a hack in the analyzer where we assign value to a
statement, and thus Liveness calculates live statements, not expressions
[2], https://reviews.llvm.org/D82598#2171312.

There is a corner case for CFGAutomaticObjDtor, which isn't a CFGStmt, but
is still handled, as it marks its associated variable live.

A notable bug when we heuristic of adding all child expressions (or, as of
writing this mail, still statements) didn't work is described in [3].

---=== 3. How the static analyzer uses it ===---

I'll detail this in a followup mail when and if I get to it. I guess the
most interesting and non-trivial usage of liveness is found
in DeadStoresChecker, which is definitely a good learning example of
dataflow analysis being incorporated into symbolic execution.

If that mail doesn't happen soon enough, [1], [4], [7] provide great
further reading on a related problem, zombie symbols, which is strongly
related to liveness.

Have a great one!
Husi

[1] [cfe-dev] [analyzer] Zombie symbols.
http://lists.llvm.org/pipermail/cfe-dev/2016-March/048142.html
[2] [analyzer][Liveness][NFC] Get rid of statement liveness, because such a
thing doesn't exist https://reviews.llvm.org/D82598
[3] [analyzer] LiveVariables: Fix a zombie expression problem, add testing
infrastructure. https://reviews.llvm.org/D55566
[4] [analyzer] Fix the "Zombie symbols" issue.
https://reviews.llvm.org/D18860
[5] Aho, A., Lam, M., Sethi, R., & Ullman, J. (2007). Compilers:
Principles, Techniques and Tools, 2nd Edition.
[6] Live variable analysis, Wikipedia, second example
https://en.wikipedia.org/wiki/Live_variable_analysis#Second_example
[7] Dergachev, A. (2016). Clang Static Analyzer: A Checker Developer’s
Guide.(2016).
[8] Live variable analysis, Wikipedia, second example
https://en.wikipedia.org/wiki/Live_variable_analysis
<https://en.wikipedia.org/wiki/Live_variable_analysis#Second_example>
[9] https://clang.llvm.org/doxygen/classclang_1_1BindingDecl.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200728/2d1a3eff/attachment-0001.html>


More information about the cfe-dev mailing list