[PATCH] D20715: [docs] Document the source-based code coverage feature

Wed Jun 1 17:58:41 PDT 2016

> On Jun 1, 2016, at 11:30 AM, Justin Bogner <mail at justinbogner.com> wrote:
> 
> Vedant Kumar <vsk at apple.com> writes:
>> vsk created this revision.
>> vsk added a reviewer: bogner.
>> vsk added subscribers: kcc, cfe-commits, silvas.
>> 
>> It would be helpful to have a user-friendly guide for code
>> coverage. There is some overlap with [1], but this document visits
>> issues which may affect users in more depth.
>> 
>> Prompted by: https://llvm.org/bugs/show_bug.cgi?id=27781
>> 
>> [1] http://llvm.org/docs/CoverageMappingFormat.html
> ...
>> vsk updated this revision to Diff 59078.
>> vsk marked an inline comment as done.
>> vsk added a comment.
>> 
>> - Actually link the new document into index.rst.
> 
> A couple of comments below, but since this is prose it's basically all
> just opinion. Feel free to commit any time - we can always make
> improvements in tree later.

Thanks for all the feedback! I've incorporated all of it with minor
changes.

Let's use r271454 as a starting point, then.

vedant

> 
>> 
>> http://reviews.llvm.org/D20715
>> 
>> Files:
>>  docs/SourceBasedCodeCoverage.rst
>>  docs/index.rst
>> 
>> Index: docs/index.rst
>> ===================================================================
>> --- docs/index.rst
>> +++ docs/index.rst
>> @@ -33,6 +33,7 @@
>>    ControlFlowIntegrity
>>    LTOVisibility
>>    SafeStack
>> +   SourceBasedCodeCoverage
>>    Modules
>>    MSVCCompatibility
>>    CommandGuide/index
>> Index: docs/SourceBasedCodeCoverage.rst
>> ===================================================================
>> --- /dev/null
>> +++ docs/SourceBasedCodeCoverage.rst
>> @@ -0,0 +1,187 @@
>> +==========================
>> +Source-based Code Coverage
>> +==========================
>> +
>> +.. contents::
>> +   :local:
>> +
>> +Introduction
>> +============
>> +
>> +This document explains how to use clang's source-based code coverage feature.
>> +It's called "source-based" because it operates on AST and preprocessor
>> +information directly. This allows it to generate very precise coverage data.
>> +
>> +Clang ships two other code coverage implementations:
>> +
>> +* :doc:`SanitizerCoverage` - A low-overhead tool meant for use alongside the
>> +  various sanitizers. It can provide up to edge-level coverage.
>> +
>> +* gcov - A GCC-compatible coverage implementation which operates on DebugInfo.
>> +
>> +From this point onwards "code coverage" will refer to the source-based kind.
>> +
>> +The code coverage workflow
>> +==========================
>> +
>> +The code coverage workflow consists of three main steps:
>> +
>> +1. Compiling with coverage enabled.
>> +
>> +2. Running the instrumented program.
>> +
>> +3. Creating coverage reports.
>> +
>> +The next few sections work through a complete, copy-'n-paste friendly example
>> +based on this program:
>> +
>> +.. code-block:: console
>> +
>> +    % cat <<EOF > foo.cc
>> +    #define BAR(x) ((x) || (x))
>> +    template <typename T> void foo(T x) {
>> +      for (unsigned I = 0; I < 10; ++I) { BAR(I); }
>> +    }
>> +    int main() {
>> +      foo<int>(0);
>> +      foo<float>(0);
>> +      return 0;
>> +    }
>> +    EOF
>> +
>> +Compiling with coverage enabled
>> +===============================
>> +
>> +To compile code with coverage enabled pass ``-fprofile-instr-generate
>> +-fcoverage-mapping`` to the compiler:
>> +
>> +.. code-block:: console
>> +
>> +    # Step 1: Compile with coverage enabled.
>> +    % clang++ -fprofile-instr-generate -fcoverage-mapping foo.cc -o foo
>> +
>> +Note that linking together code with and without coverage instrumentation is
>> +supported: any uninstrumented code simply won't be accounted for.
>> +
>> +Running the instrumented program
>> +================================
>> +
>> +The next step is to run the instrumented program. When the program exits it
>> +will write a **raw profile** to the path specified by the ``LLVM_PROFILE_FILE``
>> +environment variable. If that variable does not exist the profile is written to
>> +``./default.profraw``.
> 
> Something like "``default.profraw`` in the current directory of the
> program" might be clearer.
> 
>> +
>> +If ``LLVM_PROFILE_FILE`` contains a path to a non-existent directory the
>> +missing directory structure will be created.  Additionally, the following
>> +special **pattern strings** are replaced:
>> +
>> +* "%p" expands out to the PID.
> 
> Maybe spell out process ID?
> 
>> +
>> +* "%h" expands out to the hostname of the machine running the program.
>> +
>> +.. code-block:: console
>> +
>> +    # Step 2: Run the program.
>> +    % LLVM_PROFILE_FILE="foo.profraw" ./foo
>> +
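
For illustration only, the pattern-string substitution described above can
be sketched in a few lines of C++. This is not the profile runtime's actual
code: ``expandProfilePattern`` is a hypothetical name, and the sketch assumes
a POSIX system (``getpid`` and ``gethostname`` from ``<unistd.h>``). It only
handles the two patterns named in the text, "%p" and "%h", and leaves any
other text untouched.

```cpp
#include <cstddef>
#include <string>
#include <unistd.h> // getpid, gethostname (POSIX; an assumption of this sketch)

// Hypothetical helper: expands "%p" to the process ID and "%h" to the
// hostname, mirroring the behaviour the document describes for
// LLVM_PROFILE_FILE. Unknown sequences such as "%q" are copied through as-is.
std::string expandProfilePattern(const std::string &Pattern) {
  std::string Out;
  for (std::size_t I = 0; I < Pattern.size(); ++I) {
    if (Pattern[I] == '%' && I + 1 < Pattern.size()) {
      const char Kind = Pattern[I + 1];
      if (Kind == 'p') {
        Out += std::to_string(static_cast<long>(getpid()));
        ++I; // Skip the 'p'.
        continue;
      }
      if (Kind == 'h') {
        char Host[256] = {0};
        // Leave the last byte untouched so the buffer stays NUL-terminated.
        if (gethostname(Host, sizeof(Host) - 1) == 0)
          Out += Host;
        ++I; // Skip the 'h'.
        continue;
      }
    }
    Out += Pattern[I];
  }
  return Out;
}
```

With this sketch, a value such as "foo-%p.profraw" would give each process
its own raw profile name, which is the usual reason to reach for "%p".
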
>> +Creating coverage reports
>> +=========================
>> +
>> +Raw profiles have to be **indexed** before they can be used to generate
>> +coverage reports:
>> +
>> +.. code-block:: console
>> +
>> +    # Step 3(a): Index the raw profile.
>> +    % llvm-profdata merge -sparse foo.profraw -o foo.profdata
>> +
>> +You may be wondering why raw profiles aren't indexed automatically. In
>> +real-world projects multiple profiles have to be merged together before a
>> +report can be generated. This merge step is inevitable, so it makes sense to
>> +handle the compute-intensive indexing process at that point. A separate
>> +indexing step has the added benefit of keeping the compiler runtime small and
>> +simple.
> 
> This comment feels like it comes out of nowhere, and I'm not really
> convinced it adds much. If you feel strongly that we need to ward off
> "why do I need to do this if I only have one raw profile" questions, I'd
> probably phrase it something like this:
> 
>  This merge step is necessary even if you only have one raw profile file.
>  Separating the indexing step like this allows the compiler runtime to be
>  efficient and simple, whereas most users of coverage will need to merge
>  multiple profiles anyway.
> 
>> +
>> +There are several ways to render coverage reports. One option is to
>> +generate a line-oriented report:
>> +
>> +.. code-block:: console
>> +
>> +    # Step 3(b): Create a line-oriented coverage report.
>> +    % llvm-cov show ./foo -instr-profile=foo.profdata
>> +
>> +This report includes a summary view as well as dedicated sub-views for
>> +templated functions and their instantiations. For our example program, we get
>> +distinct views for ``foo<int>(...)`` and ``foo<float>(...)``.  If
>> +``-show-line-counts-or-regions`` is enabled, ``llvm-cov`` displays sub-line
>> +region counts (even in macro expansions):
>> +
>> +.. code-block:: console
>> +
>> +       20|    1|#define BAR(x) ((x) || (x))
>> +                               ^20     ^2
>> +        2|    2|template <typename T> void foo(T x) {
>> +       22|    3|  for (unsigned I = 0; I < 10; ++I) { BAR(I); }
>> +                                       ^22     ^20  ^20^20
>> +        2|    4|}
>> +    ------------------
>> +    | _Z3fooIiEvT_:
> 
> People always ask about the C++ mangled names. Maybe we should mention
> the option of running the results through c++filt? You'd probably also
> want to mention the -use-color flag, since piping the output to c++filt
> will disable the automatic detection. 
> 
> Personally, I find the following command useful enough to mention
> briefly:
> 
>  llvm-cov show -use-color <...> | c++filt | less -R
> 
>> +    |      1|    2|template <typename T> void foo(T x) {
>> +    |     11|    3|  for (unsigned I = 0; I < 10; ++I) { BAR(I); }
>> +    |                                     ^11     ^10  ^10^10
>> +    |      1|    4|}
>> +    ------------------
>> +    | _Z3fooIfEvT_:
>> +    |      1|    2|template <typename T> void foo(T x) {
>> +    |     11|    3|  for (unsigned I = 0; I < 10; ++I) { BAR(I); }
>> +    |                                     ^11     ^10  ^10^10
>> +    |      1|    4|}
>> +    ------------------
>> +
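
To see where the counts in the report above come from, it may help to count
the same regions by hand. The sketch below is a hand-instrumented variant of
the example program, not what clang emits: the ``Counts`` struct and the
``fooCounted`` / ``runExample`` names are hypothetical, and the counters are
bumped with the comma operator purely for illustration.

```cpp
// Hypothetical manual counters for three of the regions shown in the report.
struct Counts {
  unsigned LoopCond = 0; // Evaluations of "I < 10".
  unsigned LoopBody = 0; // Executions of the loop body.
  unsigned OrRHS = 0;    // Evaluations of the right operand of "||" in BAR.
};

template <typename T> void fooCounted(T, Counts &C) {
  for (unsigned I = 0; (++C.LoopCond, I < 10); ++I) {
    ++C.LoopBody;
    // BAR(I) expands to ((I) || (I)). The right operand is only evaluated
    // when the left one is false, i.e. when I == 0.
    (void)((I) || (++C.OrRHS, I));
  }
}

// Runs both instantiations, as main() does in the example program.
Counts runExample() {
  Counts C;
  fooCounted<int>(0, C);
  fooCounted<float>(0, C);
  return C;
}
```

Each instantiation evaluates the loop condition 11 times (10 passes plus the
final failing check), runs the body 10 times, and evaluates the right-hand
side of ``||`` once. Summed over ``foo<int>`` and ``foo<float>`` that gives
22, 20 and 2, matching the ``^22``, ``^20`` and ``^2`` markers in the summary
view.
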
>> +It's possible to generate a file-level summary of coverage statistics (instead
>> +of a line-oriented report) with:
>> +
>> +.. code-block:: console
>> +
>> +    # Step 3(c): Create a coverage summary.
>> +    % llvm-cov report ./foo -instr-profile=foo.profdata
>> +    Filename                    Regions    Miss   Cover Functions  Executed
>> +    -----------------------------------------------------------------------
>> +    /tmp/foo.cc                      13       0 100.00%         3   100.00%
>> +    -----------------------------------------------------------------------
>> +    TOTAL                            13       0 100.00%         3   100.00%
>> +
>> +A few final notes:
>> +
>> +* The ``-sparse`` flag is optional but can result in dramatically smaller
>> +  indexed profiles. This option should not be used if the indexed profile will
>> +  be reused for PGO.
>> +
>> +* Raw profiles can be discarded after they are indexed. Advanced use of the
>> +  profile runtime library allows an instrumented program to merge profiling
>> +  information directly into an existing raw profile on disk. The details are
>> +  out of scope.
>> +
>> +* The ``llvm-profdata`` tool can be used to merge together multiple raw or
>> +  indexed profiles. To combine profiling data from multiple runs of a program,
>> +  try, e.g.:
>> +
>> +.. code-block:: console
>> +
>> +    % llvm-profdata merge -sparse foo1.profraw foo2.profdata -o foo3.profdata
>> +
>> +Format compatibility guarantees
>> +===============================
>> +
>> +* There are no backwards or forwards compatibility guarantees for the raw
>> +  profile format. Raw profiles may be dependent on the specific compiler
>> +  revision used to generate them. It's inadvisable to store raw profiles for
>> +  long periods of time.
>> +
>> +* Tools must retain **backwards** compatibility with indexed profile formats.
>> +  These formats are not forwards-compatible: i.e., a tool which uses format
>> +  version X will not be able to understand format version (X+k).
>> +
>> +* There is a third format in play: the format of the coverage mappings emitted
>> +  into instrumented binaries. Tools must retain **backwards** compatibility
>> +  with these formats. These formats are not forwards-compatible.
>
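
The backwards-only compatibility rule for indexed profiles and coverage
mappings boils down to a one-line check. The sketch below assumes format
versions are plain increasing integers; ``canReadFormat`` is a hypothetical
name, not an LLVM API.

```cpp
#include <cstdint>

// Hypothetical check: a tool built for format version X can read files
// written with version X or older (backwards compatible), but not files
// written with a newer version X+k (not forwards compatible).
bool canReadFormat(std::uint64_t ToolVersion, std::uint64_t FileVersion) {
  return FileVersion <= ToolVersion;
}
```

No such rule applies to raw profiles, which carry no compatibility
guarantees in either direction.
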