[LLVMdev] RFC: Binary format for instrumentation based profiling data

Tue Mar 25 11:05:38 PDT 2014

On Mar 25, 2014, at 10:46 AM, Robinson, Paul <Paul_Robinson at playstation.sony.com> wrote:

>> On Mar 24, 2014, at 10:08 AM, Robinson, Paul
>> <Paul_Robinson at playstation.sony.com> wrote:
>> 
>>>> We seem to have some agreement that having two formats for
>>>> instrumentation based profiling is worthwhile. These are the one
>>>> emitted by compiler-rt in the instrumented program at runtime
>>>> (format 1), and the one consumed by clang when compiling the program
>>>> with PGO (format 2).
>>>> 
>>>> Format 1
>>>> --------
>>>> 
>>>> This format should be efficient to write, since the instrumented
>>>> program should run with as little overhead as possible. This also
>>>> doesn't need to be stable, and we can assume the same version of LLVM
>>>> that was used to instrument the program will read the counter data.
>>>> As such, the file format is versioned (so we can easily reject
>>>> versions we don't understand) and consists basically of a memory dump
>>>> of the relevant profiling counters.
>>> 
>>> The "same version" assertion isn't completely true: at a previous job
>>> we had clients who preferred not to regenerate profile data unless
>>> they actually had to (because it was a big pain and took a long time).
>>> But as long as the versioning is based on actual format changes, not
>>> just repurposing the current LLVM version number (making the previous
>>> data unusable for no technical reason), that's okay.
>> 
>> Format 1 (extension .profraw since r204676) should be run immediately
>> through llvm-profdata to generate format 2 (extension .profdata).  The
>> only profiles that should be kept around are format 2.
> 
> Okay, but the version comment still applies to format 2 then.

Right.  Format 2 needs to be auto-upgraded if we have version changes.
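
As a rough sketch of what auto-upgrading could look like: the reader
checks a version field, rejects versions it does not understand, and
upgrades older records in memory as it reads them. The magic bytes,
header layout, and version numbers below are invented for illustration
and are not the actual .profdata format.

```python
import struct

MAGIC = b"PROF"        # hypothetical magic bytes, not the real ones
CURRENT_VERSION = 2    # hypothetical: version 2 added a per-function hash


def write_profile(version, records):
    """Serialize records as: magic, u32 version, u32 record count, records."""
    out = MAGIC + struct.pack("<II", version, len(records))
    for rec in records:
        name = rec["name"].encode()
        out += struct.pack("<I", len(name)) + name
        if version >= 2:
            out += struct.pack("<Q", rec["hash"])
        counts = rec["counts"]
        out += struct.pack("<I%dQ" % len(counts), len(counts), *counts)
    return out


def read_profile(data):
    """Read any supported version and return records in the current layout."""
    if data[:4] != MAGIC:
        raise ValueError("not a profile file")
    version, num_records = struct.unpack_from("<II", data, 4)
    if not 1 <= version <= CURRENT_VERSION:
        # Reject versions we don't understand instead of guessing.
        raise ValueError("unsupported profile version %d" % version)
    offset, records = 12, []
    for _ in range(num_records):
        (name_len,) = struct.unpack_from("<I", data, offset)
        offset += 4
        name = data[offset:offset + name_len].decode()
        offset += name_len
        func_hash = None
        if version >= 2:
            (func_hash,) = struct.unpack_from("<Q", data, offset)
            offset += 8
        (num_counts,) = struct.unpack_from("<I", data, offset)
        offset += 4
        counts = list(struct.unpack_from("<%dQ" % num_counts, data, offset))
        offset += 8 * num_counts
        if func_hash is None:
            # Upgrade step (assumed): version 1 stored no hash, so derive
            # the naive one, i.e. the number of counters.
            func_hash = num_counts
        records.append({"name": name, "hash": func_hash, "counts": counts})
    return records
```

Writing the result of read_profile() back out with CURRENT_VERSION
would then give an upgraded file, so old profiles stay usable across a
format change.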

> 
>> 
>>> As long as I'm bothering to say something, is there some way that the
>>> tools will figure out that you're trying to apply old data to new
>>> files that have changed in ways that make the old data inapplicable?
>>> Sorry if this has been brought up elsewhere and I just missed it.
>>> -paulr
>> 
>> There's a hash for each function based on the layout of the counters
>> assigned to it.  If the hash from the data doesn't match the current
>> frontend, the data is ignored.  Currently, the hash is extremely naive:
>> the number of counters.
> 
> Eww.  Should be CFG-based.  I think merge-similar-functions has a way
> to compute this?

Working on it in "[PATCH] InstrProf: Calculate a better function hash" :).
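
To illustrate why the counter count alone is a weak check, here is a
small sketch. The data structures and function names are invented for
the example, and the "structural" hash is only a stand-in for something
better; it is not what the patch does.

```python
import hashlib


def naive_hash(counter_kinds):
    """The scheme described above: just the number of counters."""
    return len(counter_kinds)


def structural_hash(counter_kinds):
    """Hypothetical alternative: hash what each counter is attached to,
    in order, so reshaping the function changes the hash."""
    digest = hashlib.sha256(",".join(counter_kinds).encode()).digest()
    return int.from_bytes(digest[:8], "little")


def lookup_counts(profile, name, current_hash):
    """Return the counts only if the recorded hash matches the current
    frontend's hash; otherwise the data is ignored."""
    rec = profile.get(name)
    if rec is None or rec["hash"] != current_hash:
        return None
    return rec["counts"]


# The function body changed shape but kept the same number of counters.
old_body = ["entry", "if", "for"]
new_body = ["entry", "while", "if"]

naive_profile = {"f": {"hash": naive_hash(old_body), "counts": [10, 4, 40]}}
better_profile = {"f": {"hash": structural_hash(old_body), "counts": [10, 4, 40]}}

# Naive hash: stale counts are accepted and applied to the wrong statements.
stale = lookup_counts(naive_profile, "f", naive_hash(new_body))

# Structural hash: the mismatch is detected and the data is ignored.
rejected = lookup_counts(better_profile, "f", structural_hash(new_body))
```

With the naive hash the stale counts come back unchanged; with the
structural one the lookup returns None and the profile is dropped for
that function.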