[llvm-dev] RFC: Source-based MC/DC Code Coverage

Tue Nov 9 13:21:08 PST 2021

Last January, I upstreamed support for Source-based Branch Coverage (Revision https://reviews.llvm.org/D84467). I also spoke about this work at the 2020 LLVM Developers' Meeting.  It was a lot of fun to work on!

As I mentioned then, I have had an interest in extending this support to enable Modified Condition/Decision Coverage (MC/DC), and I'd like to start working on that very soon.  Based on what I learned from others, there is a lot of interest in supporting MC/DC with LLVM.  MC/DC is a requirement for functional safety critical development, in particular, in the embedded space.  More about MC/DC: https://en.wikipedia.org/wiki/Modified_condition/decision_coverage

Background

The key thing added by MC/DC (from the DO-178C standard) is to show that, "Each condition in a decision is shown to independently affect the outcome of a decision."   A condition is shown to affect a decision's outcome independently "by varying just that condition while holding fixed all other possible conditions".  An addendum to the DO-178C standard definition clarifies, "or varying just that condition while holding fixed all other possible conditions that could affect the outcome", which allows us to ignore unevaluatable conditions in languages that have short-circuit behavior on logical operations && and || (like C/C++).  This is what I plan to implement.

For a Boolean logical expression consisting of one or more logical operators, the individual condition True/False path of execution constitutes a 'test vector' yielding a decision outcome.  For example, for "a && b", there are three possible test vectors ('-' signifies an unevaluatable 'don't-care' condition due to short-circuit semantics).

   a    b   Result
1: F,   -,  F           {F,-,F}
2: T,   F,  F           {T,F,F}
3: T,   T,  T           {T,T,T}

MC/DC effectively requires N+1 test vectors to verify (where N is the number of conditions). In the example above, all of the test vectors must be executed in order to show that both conditions, a and b, independently affect the outcome.  More precisely, two test vectors are required for each condition to show MC/DC, forming an "independence pair" for each condition (in the above example, 'a' can be shown via test vectors 1 and 3, and 'b' can be shown via test vectors 2 and 3).  The goal for LLVM is to enable llvm-cov to determine the executed test vectors and analyze them for each condition/decision outcome to prove that each condition independently affects the outcome.   The overall metric is simply the number of executed "independence pairs" vs. total expected for each condition.

I'm just sketching out the design. Similar to Branch Coverage, there are three primary areas that I am proposing to extend:

  1.  Extend the notion of "BranchRegion"
     *   When I implemented support for branch coverage, I defined a new BranchRegion region type to encompass a single condition and its True and False counters.  What I'd like to do here is add an ID for the region as well as IDs that correspond to branch regions for each True/False counter (if it exists). What this does is link branch regions together according to an execution path, and this enables llvm-cov to construct the True/False test vectors and the associated execution counts for each path to ascertain whether that test vector was executed. For example:

                                                               i.      Given a Boolean expression like "(a && b && c)", independent branch regions are already created for conditions 'a', 'b', 'c'.    The extension would allow llvm-cov to determine that a true path for 'a' leads to condition 'b', a false path leads to a False decision (short-cutting 'b' and 'c'), etc...

           *   {F, -, -}, {T, T, F}, etc.
     *   I'd like to extend this in a backward-compatible way, so that if a BranchRegion isn't encoded with the IDs, Branch Coverage can still be visualized.

  1.  Counter Instrumentation
     *   For basic cases like 'a && b && c' and "a && (b || c)", no new counter instrumentation is needed.  The existing 'True/False' counts (based on counters or counter expressions) yield test vectors that llvm-cov can check. What's most important are the True/False leaf-node counters in the decision because these indicate whether a full test vector was executed or not. In these basic cases, each condition node has a single parent.

     *   Where things get sticky is with Boolean expressions such as "(a && b) || c" or "(a && b) || (c && d)".  In these cases, the short-circuiting behavior of "a && b" means that two execution paths lead to condition 'c', thereby forming a join point in the control flow.  In these cases, we'll have to introduce a counter for conditions on the right-hand-side of the '||' in order to disambiguate the execution paths that lead to 'c'.

                                                               i.      Note: Unfortunately, the number of new counters here can grow exponentially for really complicated Boolean expressions, "(a && b && c) || (c && d && f)", so this is definitely a downside with this particular design.  However, these complicated expressions are not common in MC/DC code.  The changes I propose are enough to easily target the common cases, but we may be able to mitigate against really complicated cases through different means (e.g. by limiting the forms of conditions/expressions instrumented and issuing a warning).  There is also an effort to reduce the size of counters (we already facilitate this in our downstream compiler). MC/DC will also be disabled by default in clang. I'm open to other suggestions here.

  1.  Visualization using llvm-cov
     *   Given regions (1. above) that identify the execution paths in a Boolean expression and counters that identify what paths were actually executed, llvm-cov will assemble a test vector table and check for "independence pairs" for each condition to prove whether the executed test vectors sufficiently show that "Each condition in a decision is shown to independently affect the outcome of a decision".
     *   It would visualize as something like this in text-mode:

   32|     20|        if (arg1 == 0 || arg2 == 2 || arg2 == 34)
   ------------------
   |  C1 Branch (32:13): [True: 10, False: 10]
   |  C2 Branch (32:26): [True: 5, False: 5]
   |  C3 Branch (32:39): [True: 0, False: 5]
   ------------------
   ------------------
   | Executed Test Vectors:
   |
   |   C1, C2, C3   Res
   |  { T,  -,  -,  = T }
   |  { F,  T,  -,  = T }
   |  { F,  F,  F,  = F }
   |
   |  C1-Pair: covered
   |  C2-Pair: covered
   |  C3-Pair: not covered
   |  MC/DC Coverage for Expression: 67%
   ------------------

     *   Some vendors show all possible test vectors and identify the executed ones.
     *   HTML output can look similar, although I'd still like to improve HTML output for both branch coverage and MC/DC using tooltips.

Alternatives
In looking at what other vendors do, I also looked into some alternative designs that are definitely worth mentioning.

  1.  Log-based trace output
     *   Some vendors track the control-flow through the conditions in a Boolean expression and simply emit log-data during execution.  While this makes the instrumentation very straightforward, particularly for more complicated Boolean expressions, supporting this in LLVM would require llvm-profdata/llvm-cov to understand additional sets of data in addition to counter data and region data.  I'd rather build on what's already supported.

  1.  Counter-based trace output
     *   Another option would be to track the control-flow through the conditions and simply map test-vector execution directly to its own counter.  However, this would require many more additional counters than are required by the design proposed above because each possible test-vector would have a counter.

                                                               i.      We could mitigate this by introducing variable sized counters (since we only need to know whether a test-vector was executed, which only requires one bit), or by mapping test vectors to a bitmap in a single counter, but this undermines the way counters are designed to work in LLVM and would require more extensive work to support in a first-class way (and not a hack). However, if this could be done easily, I'm open to suggestions.

In the end, the design I propose above extends what's already there and targets the common use cases most likely to be encountered for MC/DC.

Please let me know if my design proposal looks reasonable.  I'd like to start implementation and upstream early next year.

Thanks!

Alan Phipps
Texas Instruments, Inc.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20211109/f81272fa/attachment.html>