[lld] r288606 - Add comments about the use of threads in LLD.

Rui Ueyama via llvm-commits llvm-commits at lists.llvm.org
Sat Dec 3 18:45:00 PST 2016


On Sat, Dec 3, 2016 at 4:38 PM, Meador Inge <meadori at gmail.com> wrote:

> On Sat, Dec 3, 2016 at 5:35 PM, Rui Ueyama via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>
> Modified: lld/trunk/ELF/Threads.h
>> URL: http://llvm.org/viewvc/llvm-project/lld/trunk/ELF/Threads.h?rev=288606&r1=288605&r2=288606&view=diff
>> ==============================================================================
>> --- lld/trunk/ELF/Threads.h (original)
>> +++ lld/trunk/ELF/Threads.h Sat Dec  3 17:35:22 2016
>> @@ -6,6 +6,54 @@
>>  // License. See LICENSE.TXT for details.
>>  //
>>  //===----------------------------------------------------------------------===//
>>
>
> Great comment...
>
>
>> +//
>> +// LLD supports threads to distribute workloads to multiple cores. Using
>> +// multiple cores is most effective when more than one core is idle. At
>> +// the last step of a build, the linker is often the only active process
>> +// on a computer, so we are naturally interested in using threads wisely
>> +// to reduce the latency of delivering results to users.
>> +//
>> +// That said, we don't want to do "too clever" things with threads.
>> +// The correctness of complex multi-threaded algorithms is sometimes
>> +// extremely hard to justify, and they can easily mess up the entire design.
>> +//
>> +// Fortunately, when a linker links large programs (which is when link
>> +// time matters most), it spends most of its time working on a massive
>> +// number of small pieces of data of the same kind. Here are examples:
>> +//
>> +//  - We have hundreds of thousands of input sections that need to be
>> +//    copied to the output file at the last step of linking. Once the
>> +//    file layout is fixed, each section can be copied to its destination
>> +//    and its relocations can be applied independently.
>> +//
>> +//  - We have tens of millions of small strings when constructing a
>> +//    mergeable string section.
>> +//
>> +// For cases like the former, we can just use parallel_for_each instead
>> +// of std::for_each (or a plain for loop). Because the tasks are completely
>> +// independent of each other, we can run them in parallel without any
>> +// coordination between them. That is very easy to understand and justify.
>> +//
>> +// For cases like the latter, we can use parallel algorithms to deal
>> +// with massive amounts of data. We have to write a tailored algorithm
>> +// for each problem, but the complexity of multi-threading is isolated in
>> +// a single pass and doesn't affect the linker's overall design.
>> +//
>> +// The above approach seems to be working fairly well. As an example, when
>> +// linking Chromium (output size 1.6 GB), using 4 cores reduces latency to
>> +// 75% of the single-core time (from 12.66 seconds to 9.55 seconds) on my
>> +// machine. Using 40 cores reduces it to 63% (from 12.66 seconds to 7.95
>> +// seconds). Because of Amdahl's law, the speedup is not linear, but it
>> +// still gets faster as you add more cores.
>>
>
> ... one very minor nit here: "my machine" doesn't mean much in a shared
> code base :-)
>

That's right :) Addressed in r288609.
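
To make the pattern described in the new comment concrete, here is a
minimal, self-contained sketch of the "independent tasks, no coordination"
case. It is not LLD's actual code: the Section struct, the writeSections
function, and the precomputed per-section offsets are made-up
illustrations, and C++17's std::for_each with std::execution::par stands
in for LLD's parallel_for_each.

#include <algorithm>
#include <cstring>
#include <execution>
#include <vector>

struct Section {
  std::vector<char> data; // contents of one input section
  size_t outOffset;       // precomputed, disjoint offset in the output
};

// Copy every section to its final position in the output buffer.
void writeSections(std::vector<Section> &sections, std::vector<char> &out) {
  // std::execution::par plays the role of parallel_for_each here: each
  // task writes only to its own disjoint slice of `out`, so no locking
  // or ordering between tasks is needed.
  std::for_each(std::execution::par, sections.begin(), sections.end(),
                [&](Section &sec) {
                  std::memcpy(out.data() + sec.outOffset, sec.data.data(),
                              sec.data.size());
                  // Relocations for `sec` could be applied here as well,
                  // since they only touch this section's output bytes.
                });
}

Because every task touches a disjoint region of the output, the
multi-threading stays confined to this one loop and doesn't leak into the
rest of the design. The remaining sequential work is what limits the
speedup: by Amdahl's law, with a parallelizable fraction p, n cores give
at most a 1 / ((1 - p) + p / n) speedup, which matches the observation
that going from 4 to 40 cores still helps, but not linearly.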