[cfe-dev] Proposal: add instrumentation for PGO and code coverage

Bob Wilson bob.wilson at apple.com
Fri Sep 6 17:36:02 PDT 2013


On Sep 6, 2013, at 5:16 PM, Quentin Colombet <qcolombet at apple.com> wrote:

> Hi Bob,
> 
> Two general comments (detailed inline with your proposal):
> - PGO should be able to deal with several profile data files.
> - we should do something about counters set to zero.
> 

…

> 
> Sorry if I missed it, but I think it is important that we can pass several profile data files to the PGO option (or have a tool that aggregates several files into one).
> The use case is:
> - Gather profiling data for an application with different inputs, each run generating a separate profile.
> - Recompile the application using one aggregated profile.

Yes, definitely.  I am planning to have a separate tool to aggregate profile data files, but if that doesn't work out, we would want the compiler to do the aggregation.  One way or the other, we should definitely allow multiple profiles to be combined.

> 
> For optimization purposes, another consideration is that we may want to bias the numbers we collect slightly, to avoid odd heuristic behavior.
> The idea is to adjust the numbers for counters that are zero.
> Let me illustrate with an example.
> Consider the following code:
> 
> if (something that should -almost- never occur) {
>   /* do the specific job */
> } else {
>   /* do the regular job */
> }
> 
> It is likely that the then-block will have its counter set to zero.
> Therefore, a frequency-based heuristic may make odd decisions in that block, because it may think it is free to put code there.
> 
> Generally speaking, this may be okay w.r.t. performance, since this path is -almost- never supposed to be taken, but it may increase code size and may have other side effects, on the cache for instance. We also may not want to degrade performance on the almost-never-taken path.
> 
> My point is, we should not end up with basic blocks with 0 frequency or branches with 0 probability; otherwise we expose ourselves to odd behavior.
> We could argue that each optimization has to deal with zero frequencies, but I suspect that would complicate the overall LLVM code base.

Yes.  Jakob already pointed me at Laplace's Rule of Succession, which basically says the same thing: http://en.wikipedia.org/wiki/Rule_of_succession

The patch already has code in it to use Count + 1 for the branch weights in the metadata.

> 
> Thanks,
> 
> -Quentin
> 
>> The profile data is read in when the CodeGenModule is initialized. The data file is currently plain text but should be changed to something more efficient. The data for each function is identified by the mangled function name; for static functions, we'll need to add a file name to distinguish them uniquely. There is a single data file for the entire executable, so the compilation of a single source file may use only a fraction of the entries.
>> 
>> For each function during IR-gen, the AST is traversed in the same way as during the instrumentation phase. If the number of counters does not match what was in the profile data, then the source must have changed and the profile data is ignored. We probably need to make that check more specific, e.g., by computing some sort of hash of the AST contents (in a way that doesn't depend on internal details of the AST representation). If the profile data doesn't match, we can add a warning diagnostic, and if that is too noisy in practice, we can investigate things like warning only when a certain proportion of the functions have stale data, or warning only for hot functions, etc.
>> 
>> Assuming the profile data matches, during IR-gen the proposed approach is to keep track of the current execution count, updating it from the counter values when visiting code that was instrumented. Those counts are used to add the branch weight metadata to the generated IR.
>> 
>> There are a few cases where the ASTs don't distinguish all the edges, e.g., a case-range that gets expanded to a bunch of separate cases. This is not a problem for code coverage, because we only report the counts for things that exist in the source code. For PGO, these situations are not common, and it should be good enough to just divide the counts evenly across the possibilities. For the case-range, all the case values are going to jump to the same code anyway, so there's no real advantage in distinguishing the branch weight for each value in the range.
>> 
>> 
>> Summary of Code Coverage:
>> 
>> I had implemented this with an earlier approach but do not currently have anything working. The ASTs have very accurate source locations, so it is no problem to report the execution counts accurately.
>> 
>> One option here would be to have a code coverage system actually run the compiler again to correlate the profile data with the ASTs, in the same way that it does when compiling with -fpgo. That would have the advantage of letting you see coverage counts from a profile data file after changing the source slightly. Given the desire to use a coverage tool to view the data for PGO, I'm strongly inclined to go that way.
>> 
>> Alternatively, during the instrumentation phase the compiler could write out a description of how to calculate the counts for each source range. That would take some extra work, and the results would be tied to the source locations at the time the instrumentation was done, but it would avoid the need to re-run the compiler to view the coverage.
>> 
>> Here are the proof-of-concept patches for clang and compiler-rt.  I intend to clean them up and add more comments, but hopefully they are useful for anyone who wants a more detailed understanding of the proposal.
>> 
>> 
>> <pgo-clang.patch><pgo-compiler-rt.patch>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> 
