[cfe-dev] Proposal: add instrumentation for PGO and code coverage

Tue Sep 10 15:00:39 PDT 2013

On Sep 9, 2013, at 1:53 PM, "Katzfey, Eric" <ekatzfey at qti.qualcomm.com> wrote:

> 
> 
>> -----Original Message-----
>> From: cfe-dev-bounces at cs.uiuc.edu [mailto:cfe-dev-bounces at cs.uiuc.edu]
>> On Behalf Of Diego Novillo
>> Sent: Saturday, September 07, 2013 6:55 AM
>> To: Bob Wilson
>> Cc: clang-dev Developers
>> Subject: Re: [cfe-dev] Proposal: add instrumentation for PGO and code
>> coverage
>> 
>> 
>> In terms of the metadata representation, what are your thoughts on the on-
>> disk format to use? Since we want to support multiple external profile
>> sources, I would like to define a canonical on-disk representation that every
>> profiling source should convert to.
>> 
>> This way, the profile loader in the compiler needs to handle that single file
>> format.
>> 
> [Eric] Yes, I am thinking in terms of embedded targets where instrumentation cannot work. I would like to be able to take trace data generated by the program running on target and then pull out the relevant profile data on branch counts to feed to PGO. How do I create the profile format such that the branch counts taken from the code addresses of the executable match up to the branches being optimized by the compiler?

I'm assuming that we will provide some API that you can use to replace the standard output routines to write the profile data.  For an embedded target, you could just dump the data into a memory buffer or perhaps send it over a wire back to a host machine.
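
A minimal sketch of what such a replaceable output routine could look like, assuming the runtime emits its counters through a caller-supplied write callback. No such API exists in the proposal yet; `profile_write_fn`, `profile_buffer`, `buffer_write`, and `dump_counters` are all hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical hook type: the runtime would call this instead of writing to a file. */
typedef size_t (*profile_write_fn)(const void *data, size_t size, void *ctx);

/* Hypothetical in-memory sink for targets without a file system. */
struct profile_buffer {
    uint8_t *base;    /* start of the caller-provided storage */
    size_t capacity;  /* total bytes available */
    size_t used;      /* bytes written so far */
};

/* Appends to a fixed buffer; returns the bytes written, or 0 if they do not fit. */
static size_t buffer_write(const void *data, size_t size, void *ctx) {
    struct profile_buffer *buf = ctx;
    assert(buf != NULL && buf->used <= buf->capacity);
    if (size > buf->capacity - buf->used)
        return 0;
    memcpy(buf->base + buf->used, data, size);
    buf->used += size;
    return size;
}

/* Stand-in for the runtime's dump routine: sends raw counters through the hook.
   Returns 0 on success and -1 if the hook could not accept all the bytes. */
static int dump_counters(const uint64_t *counters, size_t n,
                         profile_write_fn write, void *ctx) {
    size_t bytes = n * sizeof counters[0];
    return write(counters, bytes, ctx) == bytes ? 0 : -1;
}
```

On an embedded target the buffer could later be read out through a debugger, or `buffer_write` could be swapped for a callback that pushes the bytes over a serial link to the host.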

> 
> Also, to the point of only having one data file per executable, it seems like it would be better to associate a data file per source file. That way I can more easily store my profile data along with my source file and it will get used individually when that source file gets pulled into different builds.

As Eric already commented, that doesn't work very well.  Besides what he wrote, it's really important to allow collecting separate profiles from different machines, different inputs, etc.  If you have 50 different scenarios that you want to include in your profiling, you really don't want 50 separate data files for every source file, especially if they'll be stored alongside the source in the same directories.
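
One way to keep a single data file per executable while still profiling many scenarios is to combine the per-run profiles afterwards. The sketch below assumes every run produces an array of counters with the same layout and merges them by element-wise addition. Neither the layout nor the merge step is specified in this thread, and `merge_counters` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds each counter in src to the matching counter in dst.
   Sums that would overflow are clamped to UINT64_MAX. */
static void merge_counters(uint64_t *dst, const uint64_t *src, size_t n) {
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; ++i) {
        uint64_t sum = dst[i] + src[i];   /* unsigned addition wraps on overflow */
        dst[i] = sum < dst[i] ? UINT64_MAX : sum;
    }
}
```

Calling `merge_counters` once per additional run would fold all 50 scenarios into one counter array before it is written out.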