[PATCH] D20645: Avoid doing binary search.

Wed May 25 16:07:38 PDT 2016

silvas added a comment.

In http://reviews.llvm.org/D20645#440099, @ruiu wrote:

> I was surprised you said 5% overall improvement isn't worth it. The number seems really significant to me. It is not a microbenchmark

It is. It is basically a microbenchmark of an ad-hoc std::upper_bound-based associative container. This patch adds a hash table in front and makes this be only 20% of the total link time instead of 25% on one data set that you looked at. It's a band-aid and adds complexity which makes it more difficult to actually optimize the linker long-term.

Also, it punts on threads, so it doesn't stack with any gains we get there (hence not a long-term solution).

Have you tried just completely eliminating this code? E.g. creating separate "input sections" for each of the elements in the mergeable input section? Think about it like this: this code is currently 25% of the link time. Are there other solutions that are less than 25% of the link time? I think it is reasonable to expect to find a solution that only costs us 5%.

> but is a real result of linking clang. Just +41 lines improves the overall link time by 5%? To me, it is between "excellent" and "it's too good to be true so I'll apply this and verify it myself."

> 

> You can see the drop of number of branch mispredictions in the perf numbers.

This kind of thinking will get you stuck in a local optimum. Think of the future and how much faster LLD will be than the current LLD. Is this patch part of that future? How much faster do you want LLD to be? How much faster can it be?

Note: on windows by default (and also trivially on tmpfs; relevant to google's internal build system) LLD does not have to wait on writing to a disk. LLD has a long way to go until it is even heavily bottlenecked by disk IO; let alone being limited by DRAM bandwidth (e.g. 50GB/s on a typical low-end machine).

http://reviews.llvm.org/D20645