[llvm-dev] LLD: time to enable --threads by default

Wed Nov 23 06:31:52 PST 2016

Interesting. It might be worth retrying the idea of creating the output
file in anonymous memory and using write() to output it.

Cheers,
Rafael

On 23 November 2016 at 02:41, Sean Silva via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
>
> On Wed, Nov 16, 2016 at 12:44 PM, Rui Ueyama via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>>
>> LLD supports multi-threading, and it seems to be working well, as you can
>> see in a recent result. In short, LLD runs 30% faster with the --threads
>> option and more than 50% faster if you are using --build-id (your mileage
>> may vary depending on your computer). However, I think most users don't
>> even know about that because --threads is not a default option.
>>
>> I'm thinking of enabling --threads by default. We now have real users, and
>> they'll be happy about the performance boost.
>>
>> Any concerns?
>>
>> I can't think of problems with that, but I want to write a few notes about
>> that:
>>
>>  - We still need to focus on single-thread performance rather than
>> multi-threaded one because it is hard to make a slow program faster just by
>> using more threads.
>>
>>  - We shouldn't do "too clever" things with threads. Currently, we use
>> multiple threads in only two places that are highly parallelizable by
>> nature (namely, copying and applying relocations for each input section,
>> and computing the build-id hash). We are using parallel_for_each, which is
>> very simple and easy to understand. I believe this was the right design
>> choice, and I don't think we want something like the workqueues/tasks in
>> GNU gold, for example.
>
>
> Sorry for the late response.
>
> Copying and applying relocations is actually not as parallelizable as
> you would imagine in current LLD. The reason is that there is an implicit
> serialization when mutating the kernel's VA map (which happens any time
> there is a minor page fault, i.e. the first time you touch a page of an
> mmap'd input). Since threads share the same VA, there is an implicit
> serialization across them. Separate processes are needed to avoid this
> overhead (note that the separate processes would still have the same output
> file mapped; so (at least with fixed partitioning) there is no need for
> complex IPC).
>
> For `ld.lld -O0` on a Mac host, I measured <1 GB/s copying speed, even
> though the machine I was running on had about 50 GB/s of DRAM bandwidth, so
> the VA overhead is on the order of a 50x slowdown for this copying
> operation in this extreme case. Amdahl's law therefore indicates that
> adding threads will give practically no speedup for this copy operation.
> I've also DTrace'd this and saw massive contention on the VA lock. Linux
> will be better, but no matter how good it is, it is still a serialization
> point, and Amdahl's law will limit your speedup significantly.
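>
> As a rough illustration of that bound (a back-of-the-envelope sketch,
> assuming the entire ~50x overhead is serialized VA-map work and that the
> copy itself parallelizes perfectly across N threads), the serial fraction
> is s ≈ 49/50 = 0.98:
>
> ```latex
> S(N) = \frac{1}{s + \frac{1 - s}{N}}, \qquad s \approx \frac{49}{50} = 0.98
>
> S(8) = \frac{1}{0.98 + 0.02/8} \approx 1.018, \qquad
> \lim_{N \to \infty} S(N) = \frac{1}{0.98} \approx 1.02
> ```
>
> Under those assumptions, even unlimited threads would buy only about 2%.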
>
> -- Sean Silva
>
>>
>>
>>  - Run benchmarks with --no-threads if you are not focusing on
>> multi-thread performance.
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>