[PATCH] D20645: Avoid doing binary search.
Rui Ueyama via llvm-commits
llvm-commits at lists.llvm.org
Wed May 25 16:37:20 PDT 2016
On Wed, May 25, 2016 at 4:07 PM, Sean Silva <chisophugis at gmail.com> wrote:
> silvas added a comment.
> In http://reviews.llvm.org/D20645#440099, @ruiu wrote:
> > I was surprised you said 5% overall improvement isn't worth it. The
> number seems really significant to me. It is not a microbenchmark
> It is. It is basically a microbenchmark of an ad-hoc
> std::upper_bound-based associative container. This patch adds a hash table
> in front and makes this be only 20% of the total link time instead of 25%
> on one data set that you looked at. It's a band-aid and adds complexity
> which makes it more difficult to actually optimize the linker long-term.
It is not. Microbenchmarks are benchmarks that exercise only the code that
we are interested in. I didn't make object files with lots of mergeable
sections to make up the number. This is a real patch that cuts the overall
link time by 5% for a general case by adding 41 lines of simple code.
The cost of calculating offsets cannot be zero. The function is hot because
it is one of the most frequently called function in LLD. Obtaining an
output offset is needed to apply relocations to get locations in output to
know where they need to be applied to. For relocations that refer merged
strings, it needs one more function call for each relocation.
Did you know the number of relocations? It's huge. Anything that are called
for each relocation is very expensive.
Offset computation for mergeable sections cannot be an addition unlike
normal sections. It is inherently more expensive because of the fact that
section contents is not contiguous.
So 5% improvement is a pretty good one. And this patch is not complex nor
tricky. You misunderstood that point.
Also, it punts on threads, so it doesn't stack with any gains we get there
> (hence not a long-term solution).
It doesn't punt on threads -- you misunderstood it. It works fine with and
without multi-threading. The code you are looking at is a minor path which
is not executed for most function calls, and in the minor path, we do
caching "just in case" if there's no other threads. It is a minor
optimization so it doesn't affect the number that much, but it wouldn't
hurt either, so it's still nice to have.
Speaking of threads, this patch opens up an opportunity to utilize more
threads. Now that we compute offsets for section pieces all at once instead
of computing them as-needed basis (they will be needed eventually.)
Previously, we cannot parallelize offset computation, but now we can do it
per-input-section basis just by rewriting the for loop with a parallel_for.
Have you tried just completely eliminating this code? E.g. creating
> separate "input sections" for each of the elements in the mergeable input
> section? Think about it like this: this code is currently 25% of the link
> time. Are there other solutions that are less than 25% of the link time? I
> think it is reasonable to expect to find a solution that only costs us 5%.
> > but is a real result of linking clang. Just +41 lines improves the
> overall link time by 5%? To me, it is between "excellent" and "it's too
> good to be true so I'll apply this and verify it myself."
> > You can see the drop of number of branch mispredictions in the perf
> This kind of thinking will get you stuck in a local optimum. Think of the
> future and how much faster LLD will be than the current LLD. Is this patch
> part of that future? How much faster do you want LLD to be? How much faster
> can it be?
> Note: on windows by default (and also trivially on tmpfs; relevant to
> google's internal build system) LLD does not have to wait on writing to a
> disk. LLD has a long way to go until it is even heavily bottlenecked by
> disk IO; let alone being limited by DRAM bandwidth (e.g. 50GB/s on a
> typical low-end machine).
If you come up with an alternative algorithm to make it faster, please
suggest. But in any case this doesn't hurt any future improvements.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the llvm-commits