[LLVMdev] RFC - Improvements to PGO profile support

Diego Novillo dnovillo at google.com
Thu May 28 11:52:02 PDT 2015


On Thu, May 28, 2015 at 1:10 PM, Dario Domizioli
<dario.domizioli at gmail.com> wrote:
> Hi Diego,
>
> thanks for clarifying the difference between the two formats. I have noticed
> the new note in the "Sample Profile Format" section of the Clang guide
> clarifying that it is different from the coverage format.
>
> So, my further question is... Am I right in understanding that both formats
> can be used for PGO purposes then?
> I have tried the following, as in the Clang user guide:
>
> $ clang++ -O2 -fprofile-instr-generate code.cc -o code
> $ LLVM_PROFILE_FILE="code-%p.profraw" ./code
> $ llvm-profdata merge -output=code.profdata code-*.profraw
> $ clang++ -O2 -fprofile-instr-use=code.profdata code.cc -o code
>
> This produces a PGOptimized executable which performs differently (in fact,
> better!) than a normal O2 build, so I think the "code.profdata" file
> produced by the commands above is valid.
>
> If I look inside "code.profdata" with a text editor, the file is most
> definitely not the ASCII-based sampling profile file format. Now I know that
> this is to be expected because I have used the infrastructure designed for
> coverage to generate the file.
>
> So, if I understand correctly:
> - If you want to do PGO with a sampling profile file that you have somehow
> generated from data collected by an external profiler, then the format must
> be the ASCII text one described in the Clang guide.

Right.  Note that this ASCII text format is just one of the three formats
accepted by the sampling profiler.  There is also a more compact binary
representation and a (not yet submitted) gcov variant that's used by
GCC's sampling profiler.

However, the fundamental difference is still the same. Regardless of
what file format you use for the sampling profiler, that data is not
suitable for coverage. Only the instrumentation generated with
-fprofile-instr-generate can be used for coverage.
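
For contrast with the instrumentation commands quoted above, here is a
rough sketch of the sampling-side build step. It assumes a sampling
profile has already been produced from an external profiler's data by
some conversion tool. The helper name `build_with_sample_profile`, the
`CXX` override and the file names in the usage note are made up for
illustration.

```shell
# Hypothetical helper (not part of Clang): rebuild a source file using a
# sampling profile, which the compiler reads through -fprofile-sample-use.
#   $1 = sampling profile, $2 = source file, $3 = output binary
build_with_sample_profile() {
  cxx=${CXX:-clang++}   # assumption: clang++ is on PATH unless CXX is set
  # -gline-tables-only keeps the line info that sample profiles are keyed on.
  "$cxx" -O2 -gline-tables-only -fprofile-sample-use="$1" "$2" -o "$3"
}
```

Something like `build_with_sample_profile code.prof code.cc code` would
then play the role of the final -fprofile-instr-use step in the quoted
commands.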

> - However you can also use the infrastructure for coverage, and the file
> produced by such infrastructure, as an input to PGO (without caring too much
> about the format at this point, as you don't need to look inside the file).

Well, you never need to care about the format for inspection. All the
formats are read by llvm-profdata. All you need to care about is that
the data generated by sampling profilers is not really useful for
coverage. Note that it would be ~trivial to use it that way, but the
results would be awful. Sampling is a pretty lossy approach.
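
To make the "read by llvm-profdata" point concrete, here is a small
sketch of inspecting either kind of profile without opening the file in
an editor. The wrapper `show_profile` and the `PROFDATA` override are
hypothetical conveniences. The `show` subcommand with `-instr` and
`-sample` is, as far as I know, how llvm-profdata is told which kind of
profile it is reading.

```shell
# Hypothetical helper (not part of LLVM): print a summary of a profile.
#   $1 = "instr"  for a profile written by `llvm-profdata merge`
#        "sample" for a sampling profile
#   $2 = path to the profile file
show_profile() {
  tool=${PROFDATA:-llvm-profdata}   # assumption: llvm-profdata is on PATH
  case "$1" in
    instr)  "$tool" show -instr "$2" ;;
    sample) "$tool" show -sample "$2" ;;
    *)      echo "usage: show_profile instr|sample FILE" >&2; return 2 ;;
  esac
}
```

For example, `show_profile instr code.profdata` for the file from the
quoted commands, or `show_profile sample code.prof` for a sampling
profile (the name code.prof is only a placeholder).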

> In which case I would recommend adding a note to the "Profiling with
> Instrumentation" section as well, to state that the format produced by
> "llvm-profdata merge" is not the same as the one detailed just above that
> section.
> I now understand the difference, but I believe a reader who is approaching
> this for the first time could misinterpret the guide and assume that the
> instrumentation approach also produces a sampling profile file in the
> ASCII format.

I agree. Thanks for pointing this out. I'll re-work this section.


Diego.


