[cfe-dev] Proposal: add instrumentation for PGO and code coverage

Bob Wilson bob.wilson at apple.com
Tue Sep 10 14:53:46 PDT 2013


On Sep 7, 2013, at 6:54 AM, Diego Novillo <dnovillo at google.com> wrote:

> On 2013-09-06 18:32 , Bob Wilson wrote:
>> Background:
>> 
>> PGO: LLVM already has support for branch weight metadata and can use that to compute block frequencies and generate better code.  Those branch weights can come from __builtin_expect or from real profile data.  We currently have some support for IR-level instrumentation to insert CFG edge counts and then read those counts back into the IR as branch weights.  That support is not very well integrated into clang and does not work well if the IR changes at all after the profile data is collected.  Diego Novillo has recently expressed interest in a sampling-based approach to collecting profile data for PGO.  I believe that would be complementary to this proposal to use instrumentation.  There are pros and cons of either approach, so it would be great to offer a choice between them.  Both approaches should share the same metadata representation.
> 
> I agree, the auto profiler can and should co-exist with the instrumentation profiler.  In a sense, the auto profiler is just a lossy version of the instrumentation profiler.
> 
> One thing I'm not so sure about your approach is the decision to do PGO instrumentation at the source level. I would suspect that this may produce more instrumentation than needed (which is then hard to remove in the optimizers) and it also prevents other LLVM front ends from being able to use PGO.

The scheme I'm proposing here doesn't help other front-ends to provide PGO but it doesn't prevent them, either.  IMO, the important point to standardize across various profiling schemes is the metadata format.

> 
> Wouldn't it make it easier if the PGO instrumentation was generated by LLVM itself?

I suppose that depends on how you define "easier".  I don't think it is possible to meet my design goals with that approach.

> 
> For code coverage, I think being closer to the source helps. For PGO, I'm not so sure. But I also don't know for certain that my suspicion of excessive instrumentation is correct. The only thing I'm sure of, is that other front ends will not be able to take advantage of it.

Yes, that's a tradeoff.  All other things being equal I would have liked to do something that applies to all front ends, but not at the expense of providing a great experience with clang.


> 
> In terms of the metadata representation, what are your thoughts on the on-disk format to use? Since we want to support multiple external profile sources, I would like to define a canonical on-disk representation that every profiling source should convert to.
> 
> This way, the profile loader in the compiler needs to handle that single file format.

This scheme would not work with a profile loader like that.  The profile data in this proposal is tied to clang's ASTs.  It must be loaded directly by clang and then translated to metadata in the IR.

> 
>> * PGO should degrade gracefully when the source changes: I started out trying to get around this and just require the profile data to be regenerated for every source change, but that really didn't work out.  For a variety of reasons, it is important to avoid that. I'm not expecting much disagreement so I won't go into details. The proposal here is to match the profile data to the current code on a function-by-function basis.  If the profile data is stale for a particular function, it will be ignored. More sophisticated solutions could be considered in the future, but as far as I know, that is still an open research problem.
> 
> Absolutely. My current plan wrt auto profiling is to drop on the floor any samples that I can't make sense of. Perhaps offering a diagnostic option, but certainly giving up and falling back on static heuristics.
>> 
>> * Profile data must not be invalidated by compiler changes: No one should have to regenerate their profile data just because they installed a new compiler. This is especially important to Apple, but I expect it applies to most people to some extent. Just as an example, while working on this I stumbled across some work-in-progress to support switch case-ranges in LLVM IR. Clang's current code-gen for case-ranges expands them out to either a series of separate cases or conditional branches in the default block, depending on the size of the range. I want profile data collected with the current code-gen to remain valid if we ever get native case-range support, even though the IR-level CFG might look significantly different. The proposed solution here is to associate the instrumentation counters with clang AST nodes, which are much less likely to change in a way that would invalidate the profile data.
> 
> But ultimately, this is a special case of code drifting, is it not? Or maybe I just completely missed your point here.

I think you missed my point.  There are 2 kinds of drift:

1) The source code changes.  This is inevitable but manageable because you can tell the developer to regenerate their profile data whenever they change the code.  Or not.  They can choose.

2) The compiler changes.  See, for example, the case-range issue that I mentioned above, where the compiler may suddenly start producing IR with a drastically different CFG even when the source code did not change at all.  This is not manageable, at least for us.  If the new compiler determines that the old profile data doesn't apply, it should warn about that, which could break the build if the project uses -Werror, and even if it doesn't, there may still be a performance regression.  We need to regularly release new compilers internally and we have no control over other people's profile data files.  We need a solution that avoids the possibility of either broken builds or performance regressions when we release a new compiler.

>> 
>> * Provide accurate coverage data: As described above, I think it will be quite useful to use the same instrumentation for both PGO and coverage, and the coverage data we collect should be as accurate as possible. Clang's IR-gen does a few optimizations, e.g., to remove dead code, that require some care to make sure the coverage data is correct.
> 
> And this is the case I think why coverage instrumentation at the FE level makes a lot of sense.

I'm intentionally pairing coverage and PGO here.  We don't want to require rebuilding with different instrumentation for each of those.  We also want to make it easy for developers to examine their PGO profile data using code coverage tools.

But, there's no reason you would need to pair those if it didn't suit your needs.  You could use this proposed feature for code coverage and still use the sampling-based profile data for PGO.

> 
>> 
>> * Minimize runtime overhead for instrumentation: If the instrumented code is very slow or requires far more memory than normal, that could be a barrier to using it. Since the proposal here is to insert the instrumentation in the front-end, we can take advantage of the structure of the ASTs to reduce the number of counters. (The effect should be similar to what you get by using a spanning tree to place counters in an arbitrary CFG.) We can also compile the instrumented code with full optimization.
> 
> This is the part that I have doubts about.  Perhaps I'm too used to generating instrumentation from the compiler's CFG.

I don't see why front-end instrumentation would have more overhead than instrumentation added at the IR level.  Most of the usual optimizations that you would do to minimize the counters just fall out naturally by following the structured control flow of the ASTs to determine where to place counters.

I'll work on getting some numbers soon.

> 
>> 
>> 
>> Summary of the Instrumentation Phase:
>> 
>> In the attached patch, I added a "-finstrument-regions" option for this.  Other name suggestions are welcome.
> 
> -fprofile-generate and -fprofile-use?  I'm drawing from my GCC background here.  Sorry  ;)

Yeah, I like those names, too.  I was trying to avoid confusion with GCC's implementation.  This proposal will not be compatible with what GCC does, so I'm not sure that using the same names would be a good idea.

> 
> 
>> If the profile data doesn't match, we can add a warning diagnostic, and if that is too noisy in practice, we can investigate things like warning only when a certain proportion of the functions have
> 
> Right. Here I was initially thinking a diagnostic when the compiler drops more than N% of the records from the instrumentation file.
> 
>> stale data, or warning only for hot functions, etc. Assuming the profile data matches, during IR-gen the proposed approach is to keep track of the current execution count, updating it from the counter values when visiting code that was instrumented. Those counts are used to add the branch weight metadata to the generated IR. There are a few cases where the ASTs don't distinguish all the edges, e.g., a case-range that gets expanded to a bunch of separate cases. This is not a problem for code coverage, because we can only report the counts for things that exist in the source code.  For PGO, these situations are not common, and it should be good enough to just divide the counts evenly across the possibilities, e.g., for the case-range all the case values are going to jump to the same code anyway, so there's no real advantage in distinguishing the branch weight for each value in the range.
> 
> For execution-count profiles, this should be a good start.  Other types of profiling, like data-based ones, may need other attributes (perhaps in types, or variables). I'm currently not sure how to handle those.
> 
> 
> Diego.
> 
