[cfe-dev] Proposal: add instrumentation for PGO and code coverage

Fri Sep 6 19:03:45 PDT 2013

I wonder if there's a way to take advantage of Clang/LLVM's library-based
design to achieve this functionality in a less invasive way than "directly
code in all the necessary logic into various relevant parts of clang's
IRGen".

Off the top of my head, how far could you get with an LLVM pass that
consumes debug info? That would keep the logic nice and separated.

Failing a solution with existing means, maybe it's time for clang to grow
some "pass"-like functionality, so that every new instrumentation that
needs AST information can coexist in peace without incrementally adding
complexity to IRGen. I think a lot of ground could be covered by a two-part
scheme where a "clang pass" can look at the AST and tell IRGen to attach
annotations to the code generated from particular AST nodes, coupled with a
corresponding IR pass that interprets those annotations and transforms the
IR.

Random thing I saw when looking at the patches:

+typedef unsigned int uint32_t;
+typedef unsigned int uint64_t;

This seems super sketchy.

-- Sean Silva

On Fri, Sep 6, 2013 at 6:32 PM, Bob Wilson <bob.wilson at apple.com> wrote:

> I've been investigating better support for both profile-guided
> optimization (PGO) and code coverage in Clang/LLVM.  I've spent some time
> experimenting with several options and have arrived at this proposal.  I'd
> like to get feedback on the overall approach before I start submitting
> patches.  There is a proof-of-concept patch attached here so you can see
> the details, but it needs more testing and cleanup before it is ready to
> commit.
>
>
> Background:
>
> PGO: LLVM already has support for branch weight metadata and can use that
> to compute block frequencies and generate better code.  Those branch
> weights can come from __builtin_expect or from real profile data.  We
> currently have some support for IR-level instrumentation to insert CFG edge
> counts and then read those counts back into the IR as branch weights.  That
> support is not very well integrated into clang and does not work well if
> the IR changes at all after the profile data is collected.  Diego Novillo
> has recently expressed interested in a sampling-based approach to
> collecting profile data for PGO.  I believe that would be complementary to
> this proposal to use instrumentation.  There are pros and cons of either
> approach, so it would be great to offer a choice between them.  Both
> approaches should share the same metadata representation.
>
> Coverage: We currently support code coverage testing with gcov.  We also
> have a very basic llvm-cov tool that is intended to work much like gcov,
> and it currently works from the same gcov data files.  At least with the
> version of gcov that I have access to, it does not provide a great
> experience, especially with regard to showing accurate source ranges.  For
> expressions spread across multiple source lines, typically only one of
> those lines is reported as having any execution counts at all.  The same
> problem exists within one source line that has multiple basic blocks --
> gcov can show that there are multiple blocks but it doesn't show the source
> range for each block.
>
>
> Proposal:
>
> I would like to add instrumentation code in clang to collect data for both
> PGO and coverage.  Since the same data can be used for both, it makes sense
> to combine them, especially if you can then use a coverage tool to examine
> your profile data that is being used for PGO. The same code that inserts
> the instrumentation will be used (with a different build setting) to add
> the branch weight metadata to the IR.  Here are my main design goals:
>
> * PGO should degrade gracefully when the source changes: I started out
> trying to get around this and just require the profile data to be
> regenerated for every source change, but that really didn't work out.  For
> a variety of reasons, it is important to avoid that. I'm not expecting much
> disagreement so I won't go into details. The proposal here is to match the
> profile data to the current code on a function-by-function basis.  If the
> profile data is stale for a particular function, it will be ignored. More
> sophisticated solutions could be considered in the future, but as far as I
> know, that is still an open research problem.
>
> * Profile data must not be invalidated by compiler changes: No one should
> have to regenerate their profile data just because they installed a new
> compiler. This is especially important to Apple, but I expect it applies to
> most people to some extent. Just as an example, while working on this I
> stumbled across some work-in-progress to support switch case-ranges in LLVM
> IR. Clang's current code-gen for case-ranges expands them out to either a
> series of separate cases or conditional branches in the default block,
> depending on the size of the range. I want profile data collected with the
> current code-gen to remain valid if we ever get native case-range support,
> even though the IR-level CFG might look significantly different. The
> proposed solution here is to associate the instrumentation counters with
> clang AST nodes, which are much less likely to change in a way that would
> invalidate the profile data.
>
> * Provide accurate coverage data: As described above, I think it will be
> quite useful to use the same instrumentation for both PGO and coverage, and
> the coverage data we collect should be as accurate as possible. Clang's
> IR-gen does a few optimizations, e.g., to remove dead code, that require
> some care to make sure the coverage data is correct.
>
> * Minimize runtime overhead for instrumentation: If the instrumented code
> is very slow or requires far more memory than normal, that could be a
> barrier to using it. Since the proposal here is to insert the
> instrumentation in the front-end, we can take advantage of the structure of
> the ASTs to reduce the number of counters. (The effect should be similar to
> what you get by using a spanning tree to place counters in an arbitrary
> CFG.) We can also compile the instrumented code with full optimization.
>
>
> Summary of the Instrumentation Phase:
>
> In the attached patch, I added a "-finstrument-regions" option for this.
>  Other name suggestions are welcome.  With that option, clang's IR-gen for
> a function first walks through the AST in source order, assigning
> sequential counter indices for the nodes involved in conditional control
> flow.  While generating the IR, for each AST node that has associated
> counters, it looks up the counter indices and emits code to increment those
> counters at the appropriate places.  I have tried to minimize the number of
> counters.  For an if-statement, I only put a counter on the then-block.  If
> you know the count going into the if-statement, you can infer the count for
> the else-block and the fall-through block.  For a while loop, I use 3
> counters: one for the condition expression, one for the loop body, and one
> for the exit block. Gotos are handled by putting counters on the labels.
> Switches are more interesting: to get the right data for PGO I had to put
> the counters on the CFG edges from the switch to each case, i.e., not
> counting the fall through. For code coverage, and also for keeping track of
> the current count during PGO compilation, the fall through counts need to
> be added to the edge counts.  I've prototyped all of that and it seems to
> work reasonably well.  I haven't added instrumentation for all the
> control-flow constructs yet, since I was just trying to convince myself
> that this approach works, but the rest should be similar.
>
>
> Summary of PGO Compilation:
>
> I added a "-fpgo" option for this.  Again, name suggestions are welcome.
>  The current prototype patch just uses a fixed filename for the profile
> data, but that should be changed to be an argument to the -fpgo option.
>  The profile data is read in when the CodeGenModule is initialized.  The
> data file is currently plain text but should be changed to something more
> efficient. The data for each function is identified by the mangled function
> name.  For static functions, we'll need to add a file name to distinguish
> them uniquely. There is a single data file for the entire executable, so
> the compilation of a single source file may only use a fraction of the
> entries. For each function during IR-gen, the AST is traversed in the same
> way as during the instrumentation phase.  If the number of counters does
> not match what was in the profile data, then the source must have changed
> and the profile data is ignored.  We probably need to make that check more
> specific, e.g., by computing some sort of hash of the AST contents (in a
> way that doesn't depend on internal details of the AST representation).  If
> the profile data doesn't match, we can add a warning diagnostic, and if
> that is too noisy in practice, we can investigate things like warning only
> when a certain proportion of the functions have stale data, or warning only
> for hot functions, etc. Assuming the profile data matches, during IR-gen
> the proposed approach is to keep track of the current execution count,
> updating it from the counter values when visiting code that was
> instrumented. Those counts are used to add the branch weight metadata to
> the generated IR. There are a few cases where the ASTs don't distinguish
> all the edges, e.g., a case-range that gets expanded to a bunch of separate
> cases. This is not a problem for code coverage, because we can only report
> the counts for things that exist in the source code.  For PGO, these
> situations are not common, and it should be good enough to just divide the
> counts evenly across the possibilities, e.g., for the case-range all the
> case values are going to jump to the same code anyway, so there's no real
> advantage is distinguishing the branch weight for each value in the range.
>
>
> Summary of Code Coverage:
>
> I had implemented this with an earlier approach but do not currently have
> anything working. The ASTs have very accurate source locations, so it is no
> problem to accurately report the execution counts. One option here would be
> to have a code coverage system actually run the compiler again to correlate
> the profile data with the ASTs in the same way that it does when compiling
> with -fpgo.  That would have the advantage of letting you see coverage
> counts from a profile data file after changing the source slightly.
>  Actually given the desire to use a coverage tool to view the data for PGO,
> I'm strongly inclined to go that way.  Alternatively, the compiler during
> the instrumentation phase could write out a description of how to calculate
> the counts for each source range.  That would take some extra work and the
> results would be tied to the source locations at the time the
> instrumentation was done, but it would avoid the need to re-run the
> compiler to view the coverage.
>
> Here are the proof-of-concept patches for clang and compiler-rt.  I intend
> to clean them up and add more comments, but hopefully they are useful for
> anyone who wants a more detailed understanding of the proposal.
>
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20130906/a6b10361/attachment.html>