[LLVMdev] RFC: Binary format for instrumentation based profiling data
Duncan P. N. Exon Smith
dexonsmith at apple.com
Tue Mar 25 11:05:38 PDT 2014
On Mar 25, 2014, at 10:46 AM, Robinson, Paul <Paul_Robinson at playstation.sony.com> wrote:
>> On Mar 24, 2014, at 10:08 AM, Robinson, Paul
>> <Paul_Robinson at playstation.sony.com> wrote:
>>
>>>> We seem to have some agreement that two formats for instrumentation
>>>> based profiling is worthwhile. These are that emitted by compiler-rt in
>>>> the instrumented program at runtime (format 1), and that which is
>>>> consumed by clang when compiling the program with PGO (format 2).
>>>>
>>>> Format 1
>>>> --------
>>>>
>>>> This format should be efficient to write, since the instrumented program
>>>> should run with as little overhead as possible. This also doesn't need
>>>> to be stable, and we can assume the same version of LLVM that was used
>>>> to instrument the program will read the counter data. As such, the file
>>>> format is versioned (so we can easily reject versions we don't
>>>> understand) and consists basically of a memory dump of the relevant
>>>> profiling counters.
>>>
>>> The "same version" assertion isn't completely true, at a previous job
>>> we had clients who preferred not to regenerate profile data unless they
>>> actually had to (because it was a big pain and took a long time). But
>>> as long as the versioning is based on actual format changes, not just
>>> repurposing the current LLVM version number (making the previous data
>>> unusable for no technical reason), that's okay.
>>
>> Format 1 (extension .profraw since r204676) should be run immediately
>> through llvm-profdata to generate format 2 (extension .profdata). The
>> only profiles that should be kept around are format 2.
>
> Okay, but the version comment still applies to format 2 then.
Right. Format 2 needs to be auto-upgraded if we have version changes.
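
To make the reject-versus-upgrade distinction concrete, here is a minimal
sketch of how a reader might classify a profile header. Everything in it is
an assumption for illustration: the header layout, the magic value, the
version numbers, and the names ProfileHeader, kMagic, kCurrentVersion and
classifyHeader are hypothetical and do not describe the actual format.

```cpp
#include <cstdint>

// Hypothetical header; the real on-disk layout is not specified here.
struct ProfileHeader {
  uint64_t Magic;
  uint64_t Version;
};

// Hypothetical constants, chosen only for this sketch.
constexpr uint64_t kMagic = 0x70726f6664617461ULL;
constexpr uint64_t kOldestUpgradable = 1;
constexpr uint64_t kCurrentVersion = 2;

enum class VersionAction { Accept, Upgrade, Reject };

// Decide what to do with a file based only on its header.
// The version is bumped on real format changes, not on LLVM releases.
VersionAction classifyHeader(const ProfileHeader &H) {
  if (H.Magic != kMagic)
    return VersionAction::Reject; // Not a profile file at all.
  if (H.Version == kCurrentVersion)
    return VersionAction::Accept;
  if (H.Version >= kOldestUpgradable && H.Version < kCurrentVersion)
    return VersionAction::Upgrade; // Older but known: convert on read.
  return VersionAction::Reject;    // Newer or unknown: refuse to guess.
}
```

In this sketch a format 1 reader would only ever use Accept and Reject,
while a format 2 reader would also take the Upgrade path.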
>
>>
>>> As long as I'm bothering to say something, is there some way that the
>>> tools will figure out that you're trying to apply old data to new files
>>> that have changed in ways that make the old data inapplicable? Sorry
>>> if this has been brought up elsewhere and I just missed it.
>>> -paulr
>>
>> There's a hash for each function based on the layout of the counters
>> assigned to it. If the hash from the data doesn't match the current
>> frontend, the data is ignored. Currently, the hash is extremely naive:
>> the number of counters.
>
> Eww. Should be CFG-based. I think merge-similar-functions has a way
> to compute this?
Working on it in "[PATCH] InstrProf: Calculate a better function hash" :).
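
For reference, the check described above (ignore the data when the stored
hash doesn't match the current frontend) can be sketched roughly as follows.
This is only an illustration under stated assumptions: FunctionRecord,
naiveHash and lookupCounts are hypothetical names, and the in-memory map
stands in for whatever the real reader does. The hash shown is the naive
one, the number of counters.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-function record as read from the profile data.
struct FunctionRecord {
  uint64_t Hash;                // Hash stored when the profile was written.
  std::vector<uint64_t> Counts; // Counter values for the function.
};

// The naive hash: just the number of counters assigned to the function.
uint64_t naiveHash(std::size_t NumCounters) { return NumCounters; }

// Return the counters for Name, or nullptr if the function is missing or
// its stored hash does not match what the current frontend computes.
const std::vector<uint64_t> *
lookupCounts(const std::unordered_map<std::string, FunctionRecord> &Profile,
             const std::string &Name, std::size_t NumCountersInFrontend) {
  auto It = Profile.find(Name);
  if (It == Profile.end())
    return nullptr; // No data for this function.
  if (It->second.Hash != naiveHash(NumCountersInFrontend))
    return nullptr; // Stale data: ignore it rather than misapply it.
  return &It->second.Counts;
}
```

The weakness is visible in the sketch: any edit that keeps the counter count
the same still passes the check, which is why a hash based on the function's
structure is preferable.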