[cfe-dev] Proposal: add instrumentation for PGO and code coverage
Bob Wilson
bob.wilson at apple.com
Fri Sep 6 22:02:35 PDT 2013
On Sep 6, 2013, at 7:03 PM, Sean Silva <silvas at purdue.edu> wrote:
> I wonder if there's a way to take advantage of Clang/LLVM's library-based design to achieve this functionality in a less invasive way than "directly code in all the necessary logic into various relevant parts of clang's IRGen".
>
> Off the top of my head, how far could you get with an LLVM pass that consumes debug info? That would keep the logic nice and separated.
It keeps the logic nicely separated but I don’t see any way to achieve my design goals with that approach.
>
> Failing a solution with existing means, maybe it's time for clang to grow some "pass"-like functionality, so that every new instrumentation that needs AST information can coexist in peace without incrementally adding complexity to IRGen. I think a lot of ground could be covered by a two-part scheme where a "clang pass" can look at the AST and tell IRGen to attach annotations to the code generated from particular AST nodes, coupled with a corresponding IR pass that interprets those annotations and transforms the IR.
I will think about it some more, but I don’t think that works well. Believe me, I’ve tried really hard to make this less invasive, but most of the things I have tried have not worked out (and I’ve already adopted the things that did work out). Fundamentally, we need code-gen to insert the instrumentation and also to add the branch weight metadata, so much of the code needs to be in code-gen regardless.
I’m essentially already doing a separate “pass” over the AST to assign the counter indices. Eventually I expect to have separate code outside code-gen to calculate code coverage statistics. I could possibly reuse that as a separate “pass” before code-gen, but that wouldn’t actually do much to simplify the changes to code-gen.
> Random thing I saw when looking at the patches:
>
> +typedef unsigned int uint32_t;
> +typedef unsigned int uint64_t;
>
> This seems super sketchy.
That is exactly the sort of thing that prompted me to put a disclaimer on this that it is a proof-of-concept patch and not something that I intend to submit in its current state.
>
> -- Sean Silva
>
>
> On Fri, Sep 6, 2013 at 6:32 PM, Bob Wilson <bob.wilson at apple.com> wrote:
> I've been investigating better support for both profile-guided optimization (PGO) and code coverage in Clang/LLVM. I've spent some time experimenting with several options and have arrived at this proposal. I'd like to get feedback on the overall approach before I start submitting patches. There is a proof-of-concept patch attached here so you can see the details, but it needs more testing and cleanup before it is ready to commit.
>
>
> Background:
>
> PGO: LLVM already has support for branch weight metadata and can use that to compute block frequencies and generate better code. Those branch weights can come from __builtin_expect or from real profile data. We currently have some support for IR-level instrumentation to insert CFG edge counts and then read those counts back into the IR as branch weights. That support is not very well integrated into clang and does not work well if the IR changes at all after the profile data is collected. Diego Novillo has recently expressed interested in a sampling-based approach to collecting profile data for PGO. I believe that would be complementary to this proposal to use instrumentation. There are pros and cons of either approach, so it would be great to offer a choice between them. Both approaches should share the same metadata representation.
>
> Coverage: We currently support code coverage testing with gcov. We also have a very basic llvm-cov tool that is intended to work much like gcov, and it currently works from the same gcov data files. At least with the version of gcov that I have access to, it does not provide a great experience, especially with regard to showing accurate source ranges. For expressions spread across multiple source lines, typically only one of those lines is reported as having any execution counts at all. The same problem exists within one source line that has multiple basic blocks -- gcov can show that there are multiple blocks but it doesn't show the source range for each block.
>
>
> Proposal:
>
> I would like to add instrumentation code in clang to collect data for both PGO and coverage. Since the same data can be used for both, it makes sense to combine them, especially if you can then use a coverage tool to examine your profile data that is being used for PGO. The same code that inserts the instrumentation will be used (with a different build setting) to add the branch weight metadata to the IR. Here are my main design goals:
>
> * PGO should degrade gracefully when the source changes: I started out trying to get around this and just require the profile data to be regenerated for every source change, but that really didn't work out. For a variety of reasons, it is important to avoid that. I'm not expecting much disagreement so I won't go into details. The proposal here is to match the profile data to the current code on a function-by-function basis. If the profile data is stale for a particular function, it will be ignored. More sophisticated solutions could be considered in the future, but as far as I know, that is still an open research problem.
>
> * Profile data must not be invalidated by compiler changes: No one should have to regenerate their profile data just because they installed a new compiler. This is especially important to Apple, but I expect it applies to most people to some extent. Just as an example, while working on this I stumbled across some work-in-progress to support switch case-ranges in LLVM IR. Clang's current code-gen for case-ranges expands them out to either a series of separate cases or conditional branches in the default block, depending on the size of the range. I want profile data collected with the current code-gen to remain valid if we ever get native case-range support, even though the IR-level CFG might look significantly different. The proposed solution here is to associate the instrumentation counters with clang AST nodes, which are much less likely to change in a way that would invalidate the profile data.
>
> * Provide accurate coverage data: As described above, I think it will be quite useful to use the same instrumentation for both PGO and coverage, and the coverage data we collect should be as accurate as possible. Clang's IR-gen does a few optimizations, e.g., to remove dead code, that require some care to make sure the coverage data is correct.
>
> * Minimize runtime overhead for instrumentation: If the instrumented code is very slow or requires far more memory than normal, that could be a barrier to using it. Since the proposal here is to insert the instrumentation in the front-end, we can take advantage of the structure of the ASTs to reduce the number of counters. (The effect should be similar to what you get by using a spanning tree to place counters in an arbitrary CFG.) We can also compile the instrumented code with full optimization.
>
>
> Summary of the Instrumentation Phase:
>
> In the attached patch, I added a "-finstrument-regions" option for this. Other name suggestions are welcome. With that option, clang's IR-gen for a function first walks through the AST in source order, assigning sequential counter indices for the nodes involved in conditional control flow. While generating the IR, for each AST node that has associated counters, it looks up the counter indices and emits code to increment those counters at the appropriate places. I have tried to minimize the number of counters. For an if-statement, I only put a counter on the then-block. If you know the count going into the if-statement, you can infer the count for the else-block and the fall-through block. For a while loop, I use 3 counters: one for the condition expression, one for the loop body, and one for the exit block. Gotos are handled by putting counters on the labels. Switches are more interesting: to get the right data for PGO I had to put the counters on the CFG edges from the switch to each case, i.e., not counting the fall through. For code coverage, and also for keeping track of the current count during PGO compilation, the fall through counts need to be added to the edge counts. I've prototyped all of that and it seems to work reasonably well. I haven't added instrumentation for all the control-flow constructs yet, since I was just trying to convince myself that this approach works, but the rest should be similar.
>
>
> Summary of PGO Compilation:
>
> I added a "-fpgo" option for this. Again, name suggestions are welcome. The current prototype patch just uses a fixed filename for the profile data, but that should be changed to be an argument to the -fpgo option. The profile data is read in when the CodeGenModule is initialized. The data file is currently plain text but should be changed to something more efficient. The data for each function is identified by the mangled function name. For static functions, we'll need to add a file name to distinguish them uniquely. There is a single data file for the entire executable, so the compilation of a single source file may only use a fraction of the entries. For each function during IR-gen, the AST is traversed in the same way as during the instrumentation phase. If the number of counters does not match what was in the profile data, then the source must have changed and the profile data is ignored. We probably need to make that check more specific, e.g., by computing some sort of hash of the AST contents (in a way that doesn't depend on internal details of the AST representation). If the profile data doesn't match, we can add a warning diagnostic, and if that is too noisy in practice, we can investigate things like warning only when a certain proportion of the functions have stale data, or warning only for hot functions, etc. Assuming the profile data matches, during IR-gen the proposed approach is to keep track of the current execution count, updating it from the counter values when visiting code that was instrumented. Those counts are used to add the branch weight metadata to the generated IR. There are a few cases where the ASTs don't distinguish all the edges, e.g., a case-range that gets expanded to a bunch of separate cases. This is not a problem for code coverage, because we can only report the counts for things that exist in the source code. For PGO, these situations are not common, and it should be good enough to just divide the counts evenly across the possibilities, e.g., for the case-range all the case values are going to jump to the same code anyway, so there's no real advantage is distinguishing the branch weight for each value in the range.
>
>
> Summary of Code Coverage:
>
> I had implemented this with an earlier approach but do not currently have anything working. The ASTs have very accurate source locations, so it is no problem to accurately report the execution counts. One option here would be to have a code coverage system actually run the compiler again to correlate the profile data with the ASTs in the same way that it does when compiling with -fpgo. That would have the advantage of letting you see coverage counts from a profile data file after changing the source slightly. Actually given the desire to use a coverage tool to view the data for PGO, I'm strongly inclined to go that way. Alternatively, the compiler during the instrumentation phase could write out a description of how to calculate the counts for each source range. That would take some extra work and the results would be tied to the source locations at the time the instrumentation was done, but it would avoid the need to re-run the compiler to view the coverage.
>
> Here are the proof-of-concept patches for clang and compiler-rt. I intend to clean them up and add more comments, but hopefully they are useful for anyone who wants a more detailed understanding of the proposal.
>
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20130906/a77529e0/attachment.html>
More information about the cfe-dev
mailing list