[PATCH] D36351: [lld][ELF] Add profile guided section layout
Michael Spencer via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 8 16:35:22 PST 2018
On Thu, Feb 8, 2018 at 4:04 PM, Rafael Avila de Espindola <
rafael.espindola at gmail.com> wrote:
> Looking a bit more into why I might not be measuring a performance
> improvement, I noticed that the callgraph I was using was missing
> conditional calls. The attached script fixes that.
>
> I have uploaded a new version of the test with the complete call graph
> to https://s3-us-west-2.amazonaws.com/linker-tests/t2.tar.xz.
>
> I also noticed that we were not considering the case of multiple symbols
> in the same section. The attached patch fixes that.
>
This is already handled in `CallGraphSort::CallGraphSort()` here:

    NodeIndex From = GetOrCreateNode(FromSB);
    NodeIndex To = GetOrCreateNode(ToSB);
    Nodes[To].Weight = SaturatingAdd(Nodes[To].Weight, Weight);
    if (From == To)
      continue;

The `From == To` check is deliberately placed after the node weight
adjustment, so that the density calculation later on still takes calls
within a single section into account.
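
To illustrate the ordering, here is a minimal standalone sketch. It is not lld's code: `Graph`, `addCall`, `density`, and the string-keyed section lookup are hypothetical stand-ins, and it assumes density means node weight divided by section size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not lld's types.
struct Node {
  std::string Section;
  uint64_t Size;   // section size in bytes
  uint64_t Weight; // total weight of calls that land in this section
};

struct Edge {
  size_t From;
  size_t To;
  uint64_t Weight;
};

// Add without wrapping around on overflow.
static uint64_t saturatingAdd(uint64_t A, uint64_t B) {
  const uint64_t Max = std::numeric_limits<uint64_t>::max();
  return A > Max - B ? Max : A + B;
}

struct Graph {
  std::vector<Node> Nodes;
  std::vector<Edge> Edges;
  std::map<std::string, size_t> Index;

  size_t getOrCreateNode(const std::string &Section, uint64_t Size) {
    auto It = Index.find(Section);
    if (It != Index.end())
      return It->second;
    Index[Section] = Nodes.size();
    Nodes.push_back({Section, Size, 0});
    return Nodes.size() - 1;
  }

  void addCall(const std::string &FromSec, uint64_t FromSize,
               const std::string &ToSec, uint64_t ToSize, uint64_t Weight) {
    size_t From = getOrCreateNode(FromSec, FromSize);
    size_t To = getOrCreateNode(ToSec, ToSize);
    // Credit the callee first, so calls within one section still count.
    Nodes[To].Weight = saturatingAdd(Nodes[To].Weight, Weight);
    // Only then drop the self-edge; it cannot affect ordering between nodes.
    if (From == To)
      return;
    Edges.push_back({From, To, Weight});
  }

  // Assumed definition of density: weight per byte of section.
  double density(size_t N) const {
    assert(N < Nodes.size() && Nodes[N].Size != 0);
    return double(Nodes[N].Weight) / double(Nodes[N].Size);
  }
};
```

With calls a->b (weight 3) and b->b (weight 7), the sketch keeps one edge, yet node b carries weight 10. Filtering the self-call before it reaches the graph would leave b at weight 3 and lower its density.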
- Michael Spencer
>
> Even with these changes I still get an iTLB regression.
>
> I am now going to try building hfsort and compare its results.
>
> Please upload a new patch on top of tree and include the attached fixes.
>
>
> diff --git a/ELF/Driver.cpp b/ELF/Driver.cpp
> index 1fc9a0ad5..a8d28de4c 100644
> --- a/ELF/Driver.cpp
> +++ b/ELF/Driver.cpp
> @@ -590,8 +590,12 @@ static void readCallGraph(MemoryBufferRef MB) {
> fatal("parse error");
> InputSectionBase *FromSec = SymbolSection.lookup(Fields[0]);
> InputSectionBase *ToSec = SymbolSection.lookup(Fields[1]);
> - if (FromSec && ToSec)
> - Config->CallGraphProfile[std::make_pair(FromSec, ToSec)] = Count;
> + if (FromSec == ToSec)
> + continue;
> + if (FromSec && ToSec) {
> + uint64_t &V = Config->CallGraphProfile[std::make_pair(FromSec, ToSec)];
> + V += Count;
> + }
> }
> }
>
>
>
> Thanks,
> Rafael
>
> Michael Spencer <bigcheesegs at gmail.com> writes:
>
> > On Thu, Feb 8, 2018 at 10:41 AM, Rafael Avila de Espindola <
> > rafael.espindola at gmail.com> wrote:
> >
> >> Michael Spencer <bigcheesegs at gmail.com> writes:
> >>
> >> > On Tue, Feb 6, 2018 at 6:53 PM, Rafael Avila de Espindola <
> >> > rafael.espindola at gmail.com> wrote:
> >> >
> >> >> I have benchmarked this by timing lld ltoing FileCheck. The working set
> >> >> is much larger this time. The old callgraph had 4079 calls, this one has
> >> >> 30616.
> >> >>
> >> >> The results are somewhat similar:
> >> >>
> >> >> Performance counter stats for '../default-ld.lld @response.txt' (10 runs):
> >> >>
> >> >>         498,771 iTLB-load-misses        ( +- 0.10% )
> >> >>     224,751,360 L1-icache-load-misses   ( +- 0.00% )
> >> >>
> >> >>     2.339864606 seconds time elapsed    ( +- 0.06% )
> >> >>
> >> >> Performance counter stats for '../sorted-ld.lld @response.txt' (10 runs):
> >> >>
> >> >>         556,999 iTLB-load-misses        ( +- 0.17% )
> >> >>     216,788,838 L1-icache-load-misses   ( +- 0.01% )
> >> >>
> >> >>     2.326596163 seconds time elapsed    ( +- 0.04% )
> >> >>
> >> >> As with the previous test iTLB gets worse and L1 gets better. The net
> >> >> result is a very small speedup.
> >> >>
> >> >> Do you know how big the chromium call graph is?
> >> >>
> >> >
> >> > Not sure, but the call graph for a high profile internal game I tested is
> >> > about 10k functions and 17 MiB of .text, and I got a 2%-4% speedup. Given
> >> > that it's a game, it runs a decent portion of that 17 MiB 60 times a second,
> >> > while llvm is heavily pass based, so I don't expect the instruction working
> >> > set over a small period of time to be that high.
> >>
> >> One difference between the paper and the script I am using to create the
> >> call graph is that the script records every call the exact number of
> >> times. The script is attached.
> >>
> >> With sampling, a call foo->long_running_bar would be recorded multiple
> >> times and show up as multiple calls.
> >>
> >> Exact counts seem better, but I wonder if sampling somehow produces a
> >> better result.
> >>
> >> With instrumentation (which I assume is what you used in the game), you
> >> also get an exact callgraph, no?
> >>
> >
> > You get an exact callgraph minus indirect calls as those currently aren't
> > captured.
> >
> > - Michael Spencer
> >
> >
> >>
> >> >
> >> > I am however surprised by the 10% increase in iTLB misses.
> >>
> >>
> >>
> >>
> >> Cheers,
> >> Rafael
> >>
> >>
>
>