[LLVMdev] RFC - Improvements to PGO profile support

Dario Domizioli dario.domizioli at gmail.com
Thu May 28 10:10:02 PDT 2015


Hi Diego,

Thanks for clarifying the difference between the two formats. I noticed
the new note in the "Sample Profile Format" section of the Clang guide
explaining that it is different from the coverage format.

So, my follow-up question is: am I right in understanding that both formats
can be used for PGO?
I have tried the following, as in the Clang user guide:

$ clang++ -O2 -fprofile-instr-generate code.cc -o code
$ LLVM_PROFILE_FILE="code-%p.profraw" ./code
$ llvm-profdata merge -output=code.profdata code-*.profraw
$ clang++ -O2 -fprofile-instr-use=code.profdata code.cc -o code

This produces a PGO-built executable that performs differently (in fact,
better!) than a normal -O2 build, so I think the "code.profdata" file
produced by the commands above is valid.

If I look inside "code.profdata" with a text editor, the file is most
definitely not the ASCII-based sampling profile file format. Now I know
that this is to be expected because I have used the infrastructure designed
for coverage to generate the file.
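
The same check can be made without opening the file in an editor. What
follows is only a minimal sketch: the helper name looks_like_text is
hypothetical (it is not part of LLVM), and testing for NUL bytes is an
assumed heuristic for telling a text profile from a binary one, not the way
Clang or llvm-profdata identify a profile.

```shell
#!/bin/sh
# Hypothetical helper (not part of LLVM): succeeds if the file has no NUL bytes.
# Heuristic only: a text profile should contain none, a binary one usually does.
looks_like_text() {
    total=$(wc -c < "$1")
    without_nul=$(tr -d '\000' < "$1" | wc -c)
    # $(( )) normalises any padding that wc adds on some platforms
    [ $((total)) -eq $((without_nul)) ]
}

# Usage, assuming code.profdata from the commands above is in the current directory
if [ -f code.profdata ]; then
    if looks_like_text code.profdata; then
        echo "code.profdata looks like text"
    else
        echo "code.profdata looks binary"
    fi
fi
```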

So, if I understand correctly:
- If you want to do PGO with a sampling profile file that you have somehow
generated from data collected by an external profiler, then the format must
be the ASCII text one described in the Clang guide.
- However you can also use the infrastructure for coverage, and the file
produced by such infrastructure, as an input to PGO (without caring too
much about the format at this point, as you don't need to look inside the
file).
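
The two cases above correspond to two different Clang options. The sketch
below only prints the option for each case and does not run the compiler.
The helper profile_use_flag and the file name code.prof are hypothetical;
-fprofile-sample-use is the option the Clang guide documents for sampling
profiles and does not appear in the commands earlier in this message.

```shell
#!/bin/sh
# Hypothetical helper: print the Clang option that matches a profile kind.
#   $1 = "sample" (text profile derived from an external profiler)
#        or "instr" (file produced by "llvm-profdata merge")
#   $2 = path to the profile file
profile_use_flag() {
    case "$1" in
        sample) printf '%s\n' "-fprofile-sample-use=$2" ;;
        instr)  printf '%s\n' "-fprofile-instr-use=$2" ;;
        *)      echo "unknown profile kind: $1" >&2; return 1 ;;
    esac
}

profile_use_flag sample code.prof      # prints -fprofile-sample-use=code.prof
profile_use_flag instr code.profdata   # prints -fprofile-instr-use=code.profdata
```

The printed option would then go on the compile line, for example
clang++ -O2 $(profile_use_flag instr code.profdata) code.cc -o code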

Is my understanding correct?
If so, I would recommend adding a note to the "Profiling with
Instrumentation" section as well, stating that the format produced by
"llvm-profdata merge" is not the same as the one detailed just above that
section.
I now understand the difference, but I believe a reader approaching this
for the first time could misinterpret the guide and assume that the
instrumentation approach also produces a sampling profile file in the
ASCII format.

Cheers,
    Dario Domizioli
    SN Systems - Sony Computer Entertainment Group




On 22 May 2015 at 16:57, Diego Novillo <dnovillo at google.com> wrote:

> On Fri, May 22, 2015 at 11:16 AM, Dario Domizioli
> <dario.domizioli at gmail.com> wrote:
> > Hi all,
> >
> > I am a bit confused about the documentation of the format of the profile
> > data file.
> >
> > The Clang user guide here describes it as an ASCII text file:
> > http://clang.llvm.org/docs/UsersManual.html#sample-profile-format
> >
> > Whereas the posts above and the referenced link describe it as a
> > stream of bytes containing LEB128s:
> > http://www.llvm.org/docs/CoverageMappingFormat.html
> >
> > From experimenting with the latest trunk I can see the latter is correct
> > (well, at least the file I get is not ASCII text).
> > Should we update the Clang user guide documentation?
> > Or am I just getting confused? Are there two formats, one used for
> > coverage and one used for PGO?
>
> You are looking at two unrelated formats. The first URL describes the
> sampling profiling format. That is not used for coverage, only
> optimization.
>
> There are two main profilers in LLVM. The sampling profiler uses
> external profilers (e.g., Linux Perf) to produce sample information
> that is then matched to the user code.  There is no option to use the
> sampling profiler for coverage (it would be a very poor match).
>
> The instrumentation profiler causes Clang to inject tracking code into
> the user program. This is the one used for coverage. If you are
> interested in coverage, you should read the second URL.
>
> I will clarify the documentation for sampling profiles.
>
>
> Diego.
>