[LLVMdev] On LLD performance

Thu Mar 12 11:12:28 PDT 2015

On Thu, Mar 12, 2015 at 9:40 AM, David Blaikie <dblaikie at gmail.com> wrote:

>
>
> On Thu, Mar 12, 2015 at 4:52 AM, Stephan Tolksdorf <st at quanttec.com>
> wrote:
>
>>
>> On 11.03.2015 23:02, Rui Ueyama wrote:
>>
>>> An idea to make the resolver faster would be to use a concurrent hash
>>> map to insert new symbols in parallel. Assuming symbols from the same
>>> file don't conflict each other (I think it's a valid assumption), this
>>> can be parallelized. I wanted single-thread performance gain, though.
>>> (Also, concurrent hash maps are not currently available in LLVM.)
>>>
>>
>> Instead of using a concurrent hash map you could fill multiple hash maps
>> in parallel and then merge them afterwards. If the implementation caches
>> the hash values (which it should for string keys), merging maps doesn't
>> require recomputing the hash values and can be a relatively fast operation
>> when there is only a small overlap between the maps.
>>
>
> Yeah - I'd wonder more about the process here - each file has
> non-overlapping items? Are the possibly-overlapping items between files
> know-able? (only certain kinds of symbols) maybe you can use that to your
> advantage in some way? (yeah, I know - super vague)
>

Each file has non-overlapping symbols. We may be able to know
possibly-overlapping symbols between files, but I don't know if we can
exploit that because it's similar to what the linker actually does.

We can model an object file using two sets, U and D, containing names of
undefined symbols and defined symbols, respectively. For each file, U and D
are non-overlapping. Linker has two sets, U' and D', containing name of
undefined symbols need to be resolved, and defined symbols it collected so
far. It reads files until U' is empty. For each file, new undefined symbols
are added or removed by U' = union(U' - D, U - D'). Defined symbols are
added by D' = union(D', D). D' grows as we read more files and U' shrinks.

And there are library files, which gives a linker a choice whether it reads
a member file or not. Linker only reads member files who resolves undefined
symbols.

There are also odd stuffs such as COMDAT groups or
merge-not-by-name-but-by-content sections, that may complicate the model.
(I don't think about that yet.)

It feels to me what linker does is straightforward and it's hard to
improve, but I can't prove. There might be room to improve.

>
>>
>> More generally, if hash map operations make up a large share of the run
>> time, it might be worth looking for situations where hash values are
>> unnecessarily recomputed, e.g. when an item is conditionally inserted or
>> when a hash map is grown or copied. (This is just a general observation, I
>> don't know anything about LLD's implementation.)
>>
>> - Stephan
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150312/a1a53296/attachment.html>