[llvm-dev] LLD: time to enable --threads by default

Tue Nov 22 23:41:04 PST 2016

On Wed, Nov 16, 2016 at 12:44 PM, Rui Ueyama via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> LLD supports multi-threading, and it seems to be working well as you can
> see in a recent result
> <http://llvm.org/viewvc/llvm-project?view=revision&revision=287140>. In
> short, LLD runs 30% faster with --threads option and more than 50% faster
> if you are using --build-id (your mileage may vary depending on your
> computer). However, I don't think most users even know about that
> because --threads is not a default option.
>
> I'm thinking to enable --threads by default. We now have real users, and
> they'll be happy about the performance boost.
>
> Any concerns?
>
> I can't think of problems with that, but I want to write a few notes about
> that:
>
>  - We still need to focus on single-thread performance rather than
> multi-threaded one because it is hard to make a slow program faster just by
> using more threads.
>
>  - We shouldn't do "too clever" things with threads. Currently, we are
> using multi-threads only at two places where they are highly parallelizable
> by nature (namely, copying and applying relocations for each input section,
> and computing build-id hash). We are using parallel_for_each, and that is
> very simple and easy to understand. I believe this was a right design
> choice, and I don't think we want to have something like workqueues/tasks
> in GNU gold, for example.
>

Sorry for the late response.

Copying and applying relocations is actually not as parallelizable as
you would imagine in current LLD. The reason is that mutating the kernel's
virtual address (VA) map is implicitly serialized, and that happens on
every minor page fault, i.e. the first time you touch a page of an mmap'd
input. Since threads share the same address space, they all contend on
that one serialization point. Separate processes are needed to avoid this
overhead (note that the separate processes would still have the same output
file mapped, so, at least with fixed partitioning, there is no need for
complex IPC).
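
To make the fixed-partitioning idea concrete, here is a minimal Python
sketch. It is an illustration only, not LLD code: the function names
`copy_section` and `write_sections_in_processes` are hypothetical, the
"sections" are plain byte strings assumed to be non-empty in total, and it
assumes a Unix-like host where the "fork" start method is available. It
shows only the structure (offsets fixed up front, one mapping per process,
no communication between workers) and says nothing about performance.

```python
import mmap
import multiprocessing
import os
import tempfile


def copy_section(out_path, offset, payload):
    """Worker process: map the shared output file and copy one section into it."""
    with open(out_path, "r+b") as f:
        # Each process creates its own mapping, so the page faults taken here
        # update this process's address space rather than a sibling's.
        with mmap.mmap(f.fileno(), 0) as out:
            out[offset:offset + len(payload)] = payload
            out.flush()


def write_sections_in_processes(out_path, sections):
    """Lay sections out back to back, then copy each one in its own process."""
    # Fixed partitioning: every offset is decided before any worker starts,
    # so workers never need to coordinate with each other.
    offsets, total = [], 0
    for payload in sections:
        offsets.append(total)
        total += len(payload)

    # Pre-size the output so every worker can map the whole file.
    with open(out_path, "wb") as f:
        f.truncate(total)

    # Assumption: Unix-like host; "fork" avoids re-importing this module.
    ctx = multiprocessing.get_context("fork")
    workers = [
        ctx.Process(target=copy_section, args=(out_path, off, payload))
        for off, payload in zip(offsets, sections)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
        if w.exitcode != 0:
            raise RuntimeError("worker failed with exit code %r" % w.exitcode)
    return offsets


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        print(write_sections_in_processes(path, [b"aaaa", b"bb", b"cccccc"]))
    finally:
        os.remove(path)
```

The only synchronization is the final join, which is what "no need for
complex IPC" amounts to when the partitioning is fixed in advance.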

For `ld.lld -O0` on a Mac host, I measured <1 GB/s copying speed, even
though the machine I was running on had roughly 50 GB/s of DRAM bandwidth.
The VA overhead is therefore on the order of a 50x slowdown for this
copying operation in this extreme case, and Amdahl's law indicates that
adding threads will give practically no speedup for it. I've also DTrace'd
this and saw massive contention on the VA lock. Linux will be better, but
no matter how good it is, this is still a serialization point, and Amdahl's
law will limit your speedup significantly.
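
As a rough worked example of the Amdahl's law argument, here is a small
Python sketch. It rests on an assumption made purely for illustration: that
all of the copy time beyond what DRAM bandwidth alone would need is
serialized on the VA lock. With the figures above (about 1 GB/s measured
against about 50 GB/s available), that puts the serial fraction at about
0.98, which caps the speedup at roughly 1/0.98, or about 1.02x, however many
threads are added. The helper name `amdahl_speedup` is hypothetical.

```python
def amdahl_speedup(serial_fraction, workers):
    """Upper bound on speedup when `serial_fraction` of the work cannot be parallelized."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


# Figures quoted in this mail; treating them as exact is an assumption.
measured_gb_per_s = 1.0
dram_gb_per_s = 50.0

# Assumption: everything beyond the bandwidth-limited time is serialized.
serial_fraction = 1.0 - measured_gb_per_s / dram_gb_per_s

for workers in (1, 2, 4, 8, 64):
    bound = amdahl_speedup(serial_fraction, workers)
    print("%3d workers: at most %.4fx" % (workers, bound))
```

If the serialized share is smaller in practice (as it presumably is on
Linux), the bound loosens, but it never goes away.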

-- Sean Silva

>
>  - Run benchmarks with --no-threads if you are not focusing on
> multi-thread performance.