[lld] r288606 - Add comments about the use of threads in LLD.

Sean Silva via llvm-commits llvm-commits at lists.llvm.org
Sun Dec 4 14:03:30 PST 2016


On Sun, Dec 4, 2016 at 7:09 AM, Rui Ueyama <ruiu at google.com> wrote:

> On Sun, Dec 4, 2016 at 1:55 AM, Sean Silva <chisophugis at gmail.com> wrote:
>
>>
>>
>> On Sat, Dec 3, 2016 at 3:35 PM, Rui Ueyama via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>>
>>> Author: ruiu
>>> Date: Sat Dec  3 17:35:22 2016
>>> New Revision: 288606
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=288606&view=rev
>>> Log:
>>> Add comments about the use of threads in LLD.
>>>
>>> Modified:
>>>     lld/trunk/ELF/Threads.h
>>>
>>> Modified: lld/trunk/ELF/Threads.h
>>> URL: http://llvm.org/viewvc/llvm-project/lld/trunk/ELF/Threads.h?rev=288606&r1=288605&r2=288606&view=diff
>>> ==============================================================================
>>> --- lld/trunk/ELF/Threads.h (original)
>>> +++ lld/trunk/ELF/Threads.h Sat Dec  3 17:35:22 2016
>>> @@ -6,6 +6,54 @@
>>>  // License. See LICENSE.TXT for details.
>>>  //
>>> //===----------------------------------------------------------------------===//
>>> +//
>>> +// LLD supports threads to distribute workloads to multiple cores. Using
>>> +// multiple cores is most effective when more than one core is idle. At
>>> +// the last step of a build, it is often the case that the linker is the
>>> +// only active process on a computer. So, we are naturally interested in
>>> +// using threads wisely to reduce the latency of delivering results to
>>> +// users.
>>> +//
>>> +// That said, we don't want to do "too clever" things with threads.
>>> +// Complex multi-threaded algorithms are sometimes extremely hard to
>>> +// prove correct and can easily mess up the entire design.
>>> +//
>>> +// Fortunately, when a linker links large programs (when the link time
>>> +// is most critical), it spends most of its time working on a massive
>>> +// number of small pieces of data of the same kind. Here are examples:
>>> +//
>>> +//  - We have hundreds of thousands of input sections that need to be
>>> +//    copied to a result file at the last step of a link. Once we fix
>>> +//    the file layout, each section can be copied to its destination and
>>> +//    its relocations can be applied independently.
>>> +//
>>> +//  - We have tens of millions of small strings when constructing a
>>> +//    mergeable string section.
>>> +//
>>> +// For cases such as the former, we can just use parallel_for_each
>>> +// instead of std::for_each (or a plain for loop). Because the tasks are
>>> +// completely independent of each other, we can run them in parallel
>>> +// without any coordination between them. That's very easy to understand
>>> +// and justify.
>>> +//
>>> +// For cases such as the latter, we can use parallel algorithms to deal
>>> +// with massive amounts of data. We have to write a tailored algorithm
>>> +// for each problem, but the complexity of multi-threading is isolated in
>>> +// a single pass and doesn't affect the linker's overall design.
>>> +//
>>> +// The above approach seems to be working fairly well. As an example,
>>> +// when linking Chromium (output size 1.6 GB) on my machine, using 4
>>> +// cores reduces latency to 75% of the single-core time (from 12.66
>>> +// seconds to 9.55 seconds). Using 40 cores reduces it to 63% (from 12.66
>>> +// seconds to 7.95 seconds). Because of Amdahl's law, the speedup is not
>>> +// linear, but as you add more cores, it gets faster.
>>> +//
>>> +// On a final note, if you are trying to optimize, keep the axiom "don't
>>> +// guess, measure!" in mind. Some important passes of the linker are not
>>> +// that slow. For example, resolving all symbols is not a very heavy
>>> +// pass, although it would be very hard to parallelize. You want to first
>>> +// identify a slow pass and then optimize it.
>>>
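
For illustration, here is a minimal, self-contained sketch of the
parallel_for_each pattern the comment above names. It is not LLD's actual
implementation; the names InputSection, writeSections, and
copySectionAndRelocate are made up, and the strided partition is just one
simple way to split fully independent tasks across hardware threads.

  #include <cstddef>
  #include <thread>
  #include <vector>

  // Run fn on every element of a random-access range, splitting the work
  // across hardware threads. Safe only because each call to fn is fully
  // independent of the others (no shared mutable state).
  template <class It, class Fn>
  void parallelForEach(It begin, It end, Fn fn) {
    size_t numThreads = std::thread::hardware_concurrency();
    if (numThreads == 0)
      numThreads = 1;
    size_t numTasks = static_cast<size_t>(end - begin);
    std::vector<std::thread> workers;
    for (size_t t = 0; t < numThreads; ++t)
      workers.emplace_back([=] {
        // Strided partition: worker t handles elements t, t+N, t+2N, ...
        for (size_t i = t; i < numTasks; i += numThreads)
          fn(*(begin + i));
      });
    for (std::thread &w : workers)
      w.join();
  }

  // Hypothetical usage, mirroring the "copy each input section and apply
  // its relocations" example: once the file layout is fixed, every section
  // can be written to its own, non-overlapping slice of the output.
  struct InputSection { /* data, output offset, relocations, ... */ };

  void writeSections(std::vector<InputSection> &sections) {
    parallelForEach(sections.begin(), sections.end(), [](InputSection &sec) {
      // copySectionAndRelocate(sec); // invented name, illustration only
    });
  }

The point is exactly what the comment says: because the tasks do not share
mutable state, no locking or coordination is needed, so the parallel version
is as easy to reason about as the serial loop.
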
>>
>> Actually, LLD's symbol resolution (the approach with Lazy symbols for
>> archives) is a perfect example of a MapReduce-type problem, so it is
>> quite parallelizable.
>> You basically have a huge number of (SymbolName, SymbolValue) pairs, and
>> you want to coalesce all values with the same SymbolName into pairs
>> (SymbolName, [SymbolValue1, SymbolValue2, ...]); you can then process
>> all the SymbolValueN's to see which is the real definition. This is
>> precisely the problem that MapReduce solves.
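
A rough sketch of that coalescing step, in plain C++ rather than a real
MapReduce framework. The SymbolValue type, the precedence rule, and the
function names are invented for illustration; the point is only that
sharding the pairs by a hash of the name makes each shard independently
reducible.

  #include <functional>
  #include <string>
  #include <unordered_map>
  #include <utility>
  #include <vector>

  // Stand-in for a symbol definition or reference; LLD's real symbol
  // classes carry much more information.
  struct SymbolValue {
    int precedence; // invented: e.g. defined > common > lazy > undefined
  };

  // "Reduce" step: given every value seen for one name, pick the winner.
  SymbolValue resolve(const std::vector<SymbolValue> &candidates) {
    SymbolValue best = candidates.front();
    for (const SymbolValue &v : candidates)
      if (v.precedence > best.precedence)
        best = v;
    return best;
  }

  // Shard (name, value) pairs by hash(name). A given name always lands in
  // exactly one shard, so each shard can be reduced by a separate thread
  // with no synchronization -- this plays the role of MapReduce's shuffle.
  std::vector<std::unordered_map<std::string, SymbolValue>>
  resolveAllSymbols(
      const std::vector<std::pair<std::string, SymbolValue>> &pairs,
      unsigned numShards) {
    std::vector<std::unordered_map<std::string, std::vector<SymbolValue>>>
        shards(numShards);
    for (const auto &p : pairs)
      shards[std::hash<std::string>()(p.first) % numShards][p.first]
          .push_back(p.second);

    std::vector<std::unordered_map<std::string, SymbolValue>>
        resolved(numShards);
    for (unsigned i = 0; i < numShards; ++i) // independent; one thread each
      for (auto &kv : shards[i])
        resolved[i].emplace(kv.first, resolve(kv.second));
    return resolved;
  }
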
>>
>
> How do you handle static archives?
>

LLD's archive semantics insert lazy symbols for all the archive members, so
it isn't a problem.
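
To make that concrete, here is a minimal sketch of what "insert lazy symbols
for all the archive members" means. The names (SymbolTable, addLazy,
addUndefined, fetchMember) are illustrative, not LLD's actual API; the idea
is that a member is only pulled in once an undefined reference and its lazy
symbol meet.

  #include <string>
  #include <unordered_map>

  struct ArchiveMember; // opaque: some member of a static archive

  enum class SymKind { Undefined, Lazy, Defined };

  struct Symbol {
    SymKind kind = SymKind::Undefined;
    ArchiveMember *member = nullptr; // set only while the symbol is Lazy
  };

  struct SymbolTable {
    std::unordered_map<std::string, Symbol> symbols;

    // Called up front for every symbol defined by every archive member,
    // without reading the member bodies.
    void addLazy(const std::string &name, ArchiveMember *m) {
      auto it = symbols.find(name);
      if (it == symbols.end()) {
        Symbol &sym = symbols[name];
        sym.kind = SymKind::Lazy;
        sym.member = m;
      } else if (it->second.kind == SymKind::Undefined) {
        fetchMember(m); // an unresolved reference already exists
      }
      // A Defined or earlier Lazy symbol wins; nothing else to do.
    }

    // Called for every undefined reference seen in an object file.
    void addUndefined(const std::string &name) {
      auto it = symbols.find(name);
      if (it == symbols.end())
        symbols.emplace(name, Symbol{}); // record the reference
      else if (it->second.kind == SymKind::Lazy)
        fetchMember(it->second.member); // materialize the defining member
    }

    void fetchMember(ArchiveMember *m) {
      // Parse the member's object file and add its Defined symbols,
      // overwriting any Lazy or Undefined entries (omitted in this sketch).
      (void)m;
    }
  };
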

-- Sean Silva


>
>
>>
>> (Note: I don't necessarily mean that it needs to be done in a distributed
>> fashion, just that the core problem is really one of coalescing values
>> with the same keys.)
>>
>> MapReduce's core abstraction is also a good tool for deduplicating
>> strings.
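
The same key-coalescing view applies to string deduplication. Here is a
sketch under the same caveats: the sharding and the offset-assignment detail
are invented for illustration, not how LLD's mergeable-section code is
actually written.

  #include <cstdint>
  #include <functional>
  #include <string>
  #include <unordered_map>
  #include <vector>

  // Deduplicate strings destined for a mergeable string section. Each
  // string is keyed by its contents; identical strings collapse to one
  // copy and one output offset. Sharding by hash keeps the shards
  // independent, so each one can be built on its own thread.
  std::vector<std::unordered_map<std::string, uint64_t>>
  buildStringTable(const std::vector<std::string> &strings,
                   unsigned numShards) {
    std::vector<std::vector<const std::string *>> shards(numShards);
    for (const std::string &s : strings)
      shards[std::hash<std::string>()(s) % numShards].push_back(&s);

    std::vector<std::unordered_map<std::string, uint64_t>> offsets(numShards);
    for (unsigned i = 0; i < numShards; ++i) { // independent; parallelizable
      uint64_t offset = 0;
      for (const std::string *s : shards[i])
        if (offsets[i].emplace(*s, offset).second)
          offset += s->size() + 1; // +1 for the NUL terminator
    }
    return offsets; // a final pass would bias each shard by its start offset
  }
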
>>
>>
>> Richard Smith and I were actually brainstorming at the latest LLVM social
>> about whether a distributed linker may be a good fit for the linking
>> problem at Google (but it was just brainstorming; obviously that would be
>> a huge effort, and we would need very serious justification before
>> embarking on it).
>>
>> -- Sean Silva
>>
>>
>>> +//
>>> +//===----------------------------------------------------------------------===//
>>>
>>>  #ifndef LLD_ELF_THREADS_H
>>>  #define LLD_ELF_THREADS_H
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>
>>
>>
>