[cfe-dev] Proposal: add instrumentation for PGO and code coverage

Diego Novillo dnovillo at google.com
Sat Sep 7 06:54:30 PDT 2013


On 2013-09-06 18:32 , Bob Wilson wrote:
> Background:
>
> PGO: LLVM already has support for branch weight metadata and can use that to compute block frequencies and generate better code.  Those branch weights can come from __builtin_expect or from real profile data.  We currently have some support for IR-level instrumentation to insert CFG edge counts and then read those counts back into the IR as branch weights.  That support is not very well integrated into clang and does not work well if the IR changes at all after the profile data is collected.  Diego Novillo has recently expressed interest in a sampling-based approach to collecting profile data for PGO.  I believe that would be complementary to this proposal to use instrumentation.  There are pros and cons of either approach, so it would be great to offer a choice between them.  Both approaches should share the same metadata representation.

I agree, the auto profiler can and should co-exist with the 
instrumentation profiler.  In a sense, the auto profiler is just a lossy 
version of the instrumentation profiler.
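
As an aside, for anyone less familiar with the branch weight machinery
Bob mentions: a static hint already produces the same "branch_weights"
metadata that real profile data would populate. A minimal example (the
metadata shape in the comment is from memory, and the weights are
illustrative):

    // Clang lowers __builtin_expect into "branch_weights" metadata on
    // the conditional branch, roughly !{"branch_weights", W1, W2},
    // with weights heavily favoring the expected edge.
    int process(int x) {
      if (__builtin_expect(x == 0, 0)) {  // x == 0 expected false
        return -1;                        // cold path
      }
      return 2 * x;                       // hot path
    }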

One thing I'm not so sure about in your approach is the decision to do
PGO instrumentation at the source level. I suspect this may produce
more instrumentation than needed (which is then hard to remove in the
optimizers), and it also prevents other LLVM front ends from being
able to use PGO.

Wouldn't it be easier if the PGO instrumentation were generated by
LLVM itself?

For code coverage, I think being closer to the source helps. For PGO,
I'm not so sure. But I also don't know for certain that my suspicion
of excessive instrumentation is correct. The only thing I'm sure of is
that other front ends will not be able to take advantage of it.
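
For concreteness, what I have in mind is a plain IR pass that bumps a
counter on selected CFG edges, roughly like this (a rough sketch
against the C++ API; the function and its name are mine, and all the
bookkeeping is elided):

    #include "llvm/IR/GlobalVariable.h"
    #include "llvm/IR/IRBuilder.h"

    // Bump a per-edge counter (an i64 global) at the top of the block
    // the edge was split into.  Which edges get counters would follow
    // the usual spanning-tree placement.
    static void instrumentEdge(llvm::BasicBlock *BB,
                               llvm::GlobalVariable *Counter) {
      llvm::IRBuilder<> Builder(BB, BB->getFirstInsertionPt());
      llvm::Value *Old = Builder.CreateLoad(Counter);
      llvm::Value *New = Builder.CreateAdd(Old, Builder.getInt64(1));
      Builder.CreateStore(New, Counter);
    }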

In terms of the metadata representation, what are your thoughts on the 
on-disk format to use? Since we want to support multiple external 
profile sources, I would like to define a canonical on-disk 
representation that every profiling source should convert to.

This way, the profile loader in the compiler only needs to handle
that single file format.
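
To make that concrete, I am imagining each record in the canonical
format carrying something like the following (purely a sketch; the
struct and its field names are invented for illustration):

    #include <cstdint>
    #include <string>
    #include <vector>

    struct ProfileRecord {
      std::string FunctionName;        // mangled name, used to match
      uint64_t FunctionHash;           // detects stale data (see below)
      std::vector<uint64_t> Counters;  // one slot per counted region
    };

Every external source (instrumentation runtime, perf samples, etc.)
would be converted into that shape before the compiler ever sees it.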

> * PGO should degrade gracefully when the source changes: I started out trying to get around this and just require the profile data to be regenerated for every source change, but that really didn't work out.  For a variety of reasons, it is important to avoid that. I'm not expecting much disagreement so I won't go into details. The proposal here is to match the profile data to the current code on a function-by-function basis.  If the profile data is stale for a particular function, it will be ignored. More sophisticated solutions could be considered in the future, but as far as I know, that is still an open research problem.

Absolutely. My current plan wrt auto profiling is to drop on the
floor any samples that I cannot make sense of. Perhaps we could offer
a diagnostic option, but the compiler should certainly give up and
fall back on static heuristics.
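
The per-function match could then be as simple as this (my own
sketch, not code from the patch; it assumes the hypothetical
ProfileRecord above):

    #include <cstdint>
    #include <vector>

    // Accept a function's counters only if the recorded structural
    // hash still matches what the front end computes today; otherwise
    // the record is stale and we fall back on static heuristics.
    static bool loadCounters(const ProfileRecord &R, uint64_t CurHash,
                             std::vector<uint64_t> &Out) {
      if (R.FunctionHash != CurHash)
        return false;                  // stale: ignore this function
      Out = R.Counters;
      return true;
    }
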
>
> * Profile data must not be invalidated by compiler changes: No one should have to regenerate their profile data just because they installed a new compiler. This is especially important to Apple, but I expect it applies to most people to some extent. Just as an example, while working on this I stumbled across some work-in-progress to support switch case-ranges in LLVM IR. Clang's current code-gen for case-ranges expands them out to either a series of separate cases or conditional branches in the default block, depending on the size of the range. I want profile data collected with the current code-gen to remain valid if we ever get native case-range support, even though the IR-level CFG might look significantly different. The proposed solution here is to associate the instrumentation counters with clang AST nodes, which are much less likely to change in a way that would invalidate the profile data.

But ultimately, this is a special case of the code drift problem
above, is it not? Or maybe I just completely missed your point here.
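
For anyone who hasn't looked at this corner of Clang, the situation
is (GNU case-range syntax, which Clang accepts as an extension):

    int classify(int c) {
      switch (c) {
      case 'a' ... 'z':  // today Clang expands this into separate
        return 1;        // cases (or into compares off the default
      default:           // block for large ranges); native IR case
        return 0;        // ranges would give a rather different CFG
      }
    }

An AST-level counter on the range would survive either lowering.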

>
> * Provide accurate coverage data: As described above, I think it will be quite useful to use the same instrumentation for both PGO and coverage, and the coverage data we collect should be as accurate as possible. Clang's IR-gen does a few optimizations, e.g., to remove dead code, that require some care to make sure the coverage data is correct.

And this is why I think coverage instrumentation at the FE level
makes a lot of sense.

>
> * Minimize runtime overhead for instrumentation: If the instrumented code is very slow or requires far more memory than normal, that could be a barrier to using it. Since the proposal here is to insert the instrumentation in the front-end, we can take advantage of the structure of the ASTs to reduce the number of counters. (The effect should be similar to what you get by using a spanning tree to place counters in an arbitrary CFG.) We can also compile the instrumented code with full optimization.

This is the part that I have doubts about.  Perhaps I'm too used to 
generating instrumentation from the compiler's CFG.
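
To check my understanding, the counter savings would look something
like this (my own rendering of the idea, not code from the patch):

    static unsigned long long Counters[2];

    void f(bool cond) {
      ++Counters[0];    // function entry
      if (cond) {
        ++Counters[1];  // "then" region gets the only branch counter
      } else {
        // no counter needed here: the else count is
        // Counters[0] - Counters[1], much like leaving spanning-tree
        // edges uncounted in a CFG
      }
    }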

>
>
> Summary of the Instrumentation Phase:
>
> In the attached patch, I added a "-finstrument-regions" option for this.  Other name suggestions are

-fprofile-generate and -fprofile-use? I'm drawing on my GCC
background here. Sorry ;)


> If the profile data doesn't match, we can add a warning diagnostic, and if that is too noisy in practice, we can investigate things like warning only when a certain proportion of the functions have

Right. Here I was initially thinking of a diagnostic for when the
compiler drops more than N% of the records from the instrumentation
file.
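
Something as trivial as this predicate would do (names invented; N
would come from whatever knob we decide to expose):

    #include <cstdint>

    // Warn once the fraction of dropped (stale) records exceeds N%.
    static bool shouldWarnStaleProfile(uint64_t Dropped,
                                       uint64_t Total, unsigned N) {
      return Total != 0 && Dropped * 100 > Total * N;
    }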

> stale data, or warning only for hot functions, etc. Assuming the profile data matches, during IR-gen the proposed approach is to keep track of the current execution count, updating it from the counter values when visiting code that was instrumented. Those counts are used to add the branch weight metadata to the generated IR. There are a few cases where the ASTs don't distinguish all the edges, e.g., a case-range that gets expanded to a bunch of separate cases. This is not a problem for code coverage, because we can only report the counts for things that exist in the source code.  For PGO, these situations are not common, and it should be good enough to just divide the counts evenly across the possibilities, e.g., for the case-range all the case values are going to jump to the same code anyway, so there's no real advantage in distinguishing the branch weight for each value in the range.

For execution-count profiles, this should be a good start. Other
types of profiling, such as data-based ones, may need other
attributes (perhaps on types or variables). I'm currently not sure
how to handle those.
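
As for the even split across a case-range, the arithmetic itself is
trivial (an illustration, not code from the patch; NumValues is
assumed nonzero, since a range always covers at least one value):

    #include <cstdint>
    #include <vector>

    // One AST-level counter covers the whole case-range; each
    // expanded case value gets an equal share, e.g. a count of 100
    // over "case 1 ... 4" yields {25, 25, 25, 25}.  Harmless for
    // PGO, since every value in the range jumps to the same block.
    static std::vector<uint64_t> splitRangeCount(uint64_t Count,
                                                 unsigned NumValues) {
      return std::vector<uint64_t>(NumValues, Count / NumValues);
    }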


Diego.




