[PATCH] D36351: [lld][ELF] Add profile guided section layout

Thu Feb 8 11:39:47 PST 2018

On Thu, Feb 8, 2018 at 10:41 AM, Rafael Avila de Espindola <
rafael.espindola at gmail.com> wrote:

> Michael Spencer <bigcheesegs at gmail.com> writes:
>
> > On Tue, Feb 6, 2018 at 6:53 PM, Rafael Avila de Espindola <
> > rafael.espindola at gmail.com> wrote:
> >
> >> I have benchmarked this by timing lld ltoing FileCheck. The working set
> >> is much larger this time. The old callgraph had 4079 calls, this one has
> >> 30616.
> >>
> >> The results are somewhat similar:
> >>
> >>  Performance counter stats for '../default-ld.lld @response.txt' (10 runs):
> >>
> >>            498,771      iTLB-load-misses            ( +-  0.10% )
> >>        224,751,360      L1-icache-load-misses       ( +-  0.00% )
> >>
> >>        2.339864606 seconds time elapsed             ( +-  0.06% )
> >>
> >>  Performance counter stats for '../sorted-ld.lld @response.txt' (10 runs):
> >>
> >>            556,999      iTLB-load-misses            ( +-  0.17% )
> >>        216,788,838      L1-icache-load-misses       ( +-  0.01% )
> >>
> >>        2.326596163 seconds time elapsed             ( +-  0.04% )
> >>
> >> As with the previous test iTLB gets worse and L1 gets better. The net
> >> result is a very small speedup.
> >>
> >> Do you know how big the chromium call graph is?
> >>
> >
> > Not sure, but the call graph for a high-profile internal game I tested is
> > about 10k functions and 17 MiB of .text, and I got a 2%-4% speedup. Given
> > that it's a game, it runs a decent portion of that 17 MiB 60 times a second,
> > while LLVM is heavily pass based, so I don't expect the instruction working
> > set over a small period of time to be that high.
>
> One difference between the paper and the script I am using to create the
> call graph is that the script records every call, so each edge carries an
> exact call count. The script is attached.
>
> With sampling, a call foo->long_running_bar would be recorded multiple
> times and show up as multiple calls.
>
> Exact counting seems better, but I wonder if sampling somehow produces a
> better result.
>
> With instrumentation (which I assume is what you used in the game), you
> also get an exact callgraph, no?
>

You get an exact callgraph minus indirect calls as those currently aren't
captured.
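
To make the exact-count versus sampling distinction concrete, here is a
minimal sketch. It is not the attached script: the event format (caller,
callee, duration), the function names, and the sampling period are all
hypothetical, and calls are assumed to run back to back without nesting.

```python
from collections import Counter


def exact_edges(events):
    """Weight each (caller, callee) edge by the exact number of calls."""
    return Counter((caller, callee) for caller, callee, _duration in events)


def sampled_edges(events, period):
    """Approximate a sampling profiler that fires every `period` time units.

    An edge gets one hit per sample tick that lands inside the call, so a
    long-running callee is recorded several times and a short one may be
    missed entirely.
    """
    edges = Counter()
    now = 0
    for caller, callee, duration in events:
        start, end = now, now + duration
        # First tick at or after `start` (ticks are multiples of `period`).
        first_tick = -(-start // period) * period
        if first_tick < end:
            # Count the ticks in the half-open interval [start, end).
            edges[(caller, callee)] += (end - 1 - first_tick) // period + 1
        now = end
    return edges


# Hypothetical trace: one long call followed by two short ones.
events = [
    ("foo", "long_running_bar", 10),
    ("foo", "short_baz", 1),
    ("foo", "short_baz", 1),
]
exact = exact_edges(events)
sampled = sampled_edges(events, period=4)
```

In this toy trace the exact graph weights foo->short_baz above
foo->long_running_bar, while the sampled graph does the opposite, because
sampling effectively weights edges by time spent rather than by call count.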

- Michael Spencer

>
> >
> > I am however surprised by the 10% increase in iTLB misses.
>
>
>
>
> Cheers,
> Rafael
>
>
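
For reference, the relative changes between the two perf runs quoted above
can be computed with a few lines. This is only a sketch of the arithmetic;
the dictionary layout and the helper name are assumptions made here, and the
numbers are copied from the quoted perf output.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Counters copied from the quoted perf stat output (10 runs each).
default_run = {
    "iTLB-load-misses": 498_771,
    "L1-icache-load-misses": 224_751_360,
    "seconds elapsed": 2.339864606,
}
sorted_run = {
    "iTLB-load-misses": 556_999,
    "L1-icache-load-misses": 216_788_838,
    "seconds elapsed": 2.326596163,
}

for name, before in default_run.items():
    print(f"{name}: {percent_change(before, sorted_run[name]):+.2f}%")
```

With these counters the sorted layout has roughly 11.7% more iTLB misses,
about 3.5% fewer L1 icache misses, and about 0.6% less elapsed time, which
matches the "iTLB gets worse and L1 gets better" summary.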