[llvm-dev] [RFC] Profile guided section layout

Michael Spencer via llvm-dev llvm-dev at lists.llvm.org
Tue Aug 1 14:25:19 PDT 2017


On Tue, Aug 1, 2017 at 1:57 PM, Justin Bogner <mail at justinbogner.com> wrote:

> Rafael Avila de Espindola via llvm-dev <llvm-dev at lists.llvm.org> writes:
> > Michael Spencer via llvm-dev <llvm-dev at lists.llvm.org> writes:
> >
> >> I've recently implemented profile guided section layout in llvm + lld using
> >> the Call-Chain Clustering (C³) heuristic from
> >> https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf
> >> . In the programs I've tested it on I've gotten from 0% to 5% performance
> >> improvement over standard PGO with zero cases of slowdowns and up to 15%
> >> reduction in ITLB misses.
> >>
> >>
> >> There are three parts to this implementation.
> >>
> >> The first is a new llvm pass which uses branch frequency info to get counts
> >> for each call instruction and then adds a module flags metadata table of
> >> function -> function edges along with their counts.
> >>
> >> The second takes the module flags metadata and writes it into a
> >> .note.llvm.callgraph section in the object file. This currently just dumps
> >> it as text, but could save space by reusing the string table.
> >>
> >> The last part is in lld. It reads the .note.llvm.callgraph data from each
> >> object file and merges them into a single table. It then builds a call
> >> graph based on the profile data and iteratively merges the hottest call
> >> edges using the C³ heuristic as long as it would not create a cluster
> >> larger than the page size. All clusters are then sorted by a density metric
> >> to further improve locality.
> >
> > Since the branch frequency info is in a llvm specific format, it makes
> > sense for llvm to read it instead of expecting lld to do it again. Since
> > .o files are how the compiler talks to the linker, it also makes sense
> > for llvm to record the required information there.
> >
> > In the same way, since the linker is the first place with global
> > knowledge, it makes sense for it to be the one that implements a section
> > ordering heuristic instead of just being told by some other tool, which
> > would complicate the build.
> >
> > However, do we need to start with instrumentation? The original paper
> > uses sampling with good results and current Intel CPUs can record every
> > branch in a program.
>
> This already works without instrumentation. You can probably try it out
> as is with profiles generated with Linux perf using the create_llvm_prof
> tool from the autofdo work: https://github.com/google/autofdo


I'm pretty sure that by "start with instrumentation" he means starting with
the restrictions it imposes, namely having to traffic the data through the
object file.

- Michael Spencer


>
>
> > I would propose starting with just an lld patch that reads the call
> > graph from a file. The format would be very similar to what you propose,
> > just weight,caller,callee.
> >
> > In another patch we can then look at instrumentation: Why it is more
> > convenient for some uses and what performance advantage it might have.
> >
> > I have written a small tool that uses intel_bts and 'perf script' to
> > construct the callgraph. I am giving it a try with your lld patch and
> > will hopefully post results today.
> >
> > Cheers,
> > Rafael
>
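
For anyone following the thread, here is a rough Python sketch of the two
pieces discussed above: reading a call graph in the proposed
weight,caller,callee text form, and clustering it roughly the way the lld part
is described (merge the hottest edges first, never grow a cluster past the
page size, then sort clusters by density). This is not the code from the
patch. The function names, the per-function `sizes` table, the 4096-byte page
size, and the density definition (incoming call weight divided by cluster
size) are all assumptions made for illustration, and every function named in
an edge is assumed to have a nonzero entry in `sizes`.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size in bytes


def parse_call_graph(text):
    """Parse 'weight,caller,callee' lines, summing duplicate edges."""
    edges = defaultdict(int)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        weight, caller, callee = line.split(",")
        edges[(caller, callee)] += int(weight)
    return dict(edges)


def cluster_layout(edges, sizes, page_size=PAGE_SIZE):
    """Return a function order from greedy edge merging plus a density sort."""
    # Hotness proxy (an assumption): total weight of calls into a function.
    hotness = defaultdict(int)
    for (_, callee), weight in edges.items():
        hotness[callee] += weight

    # Every function starts in its own single-element cluster.
    cluster_of = {name: [name] for name in sizes}

    def size_of(cluster):
        return sum(sizes[f] for f in cluster)

    # Visit edges from hottest to coldest; ties are broken by name.
    ordered = sorted(edges.items(), key=lambda kv: (-kv[1], kv[0]))
    for (caller, callee), _ in ordered:
        a, b = cluster_of[caller], cluster_of[callee]
        if a is b:
            continue  # already in the same cluster
        if size_of(a) + size_of(b) > page_size:
            continue  # merging would exceed one page
        a.extend(b)  # place the callee's cluster after the caller's
        for f in b:
            cluster_of[f] = a

    # Collect each distinct cluster once, in first-seen order.
    seen, clusters = set(), []
    for name in sizes:
        c = cluster_of[name]
        if id(c) not in seen:
            seen.add(id(c))
            clusters.append(c)

    def density(cluster):
        return sum(hotness[f] for f in cluster) / size_of(cluster)

    clusters.sort(key=density, reverse=True)
    return [f for c in clusters for f in c]


graph = parse_call_graph("100,main,foo\n10,main,bar\n90,foo,baz\n100,main,foo\n")
sizes = {"main": 1000, "foo": 2000, "bar": 3000, "baz": 1000}
print(cluster_layout(graph, sizes))
```

With these made-up sizes, main, foo and baz fit in one page and end up
adjacent, while bar stays in its own colder cluster. The actual C³ heuristic
in the paper walks functions by hotness and appends each to its most likely
caller, so treat the edge-ordered loop here as a simplification.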