[llvm-dev] [lldb-dev] Trying out lld to link windows binaries (using msvc as a compiler)

Leonardo Santagada via llvm-dev llvm-dev at lists.llvm.org
Mon Jan 29 10:47:48 PST 2018


Yeah, true. Are there any switches to profile the linker?

On 29 Jan 2018 18:43, "Zachary Turner" <zturner at google.com> wrote:

> Part of the reason why lld is so fast is because we map every input file
> into memory up front and rely on the virtual memory manager in the kernel
> to make this fast.  Generally speaking, this is a lot faster than opening a
> file, reading and processing it, and closing it.  The downside, as you
> note, is that it uses a lot of memory.
>
> But there's a catch.  The kernel is smart enough to share the physical
> memory pages when you map the same file multiple times from multiple
> processes.  So it only looks like the memory usage is high because it
> reserves a large amount of address space in each process.  But the total
> amount of physical memory used will not increase when additional instances
> of the same file are mapped.
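To make the mapping point concrete, here is a minimal sketch of mapping an input file read-only instead of reading it into a buffer. It uses POSIX mmap purely for illustration (on Windows the equivalent would go through CreateFileMapping/MapViewOfFile). MappedFile is a made-up helper, and none of this is lld's actual code.

```cpp
// Illustration only: a read-only, file-backed mapping via POSIX mmap.
// MappedFile is a hypothetical helper, not something taken from lld.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

class MappedFile {
public:
  explicit MappedFile(const std::string &path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0)
      return;
    struct stat st;
    if (::fstat(fd, &st) == 0 && st.st_size > 0) {
      // Read-only file-backed pages are served from the kernel page cache,
      // so every process mapping this file shares the same physical pages.
      void *p = ::mmap(nullptr, static_cast<size_t>(st.st_size), PROT_READ,
                       MAP_PRIVATE, fd, 0);
      if (p != MAP_FAILED) {
        data_ = static_cast<const char *>(p);
        size_ = static_cast<size_t>(st.st_size);
      }
    }
    ::close(fd); // the mapping stays valid after the descriptor is closed
  }

  ~MappedFile() {
    if (data_)
      ::munmap(const_cast<char *>(data_), size_);
  }

  MappedFile(const MappedFile &) = delete;
  MappedFile &operator=(const MappedFile &) = delete;

  bool ok() const { return data_ != nullptr; }

  // View of the mapped bytes; only valid when ok() is true.
  std::string_view bytes() const {
    assert(ok());
    return std::string_view(data_, size_);
  }

private:
  const char *data_ = nullptr;
  size_t size_ = 0;
};
```

Mapping the same path twice, in one process or in several, consumes address space for each mapping, but the pages behind them are the same cached file pages.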
>
> On Mon, Jan 29, 2018 at 9:24 AM Leonardo Santagada <santagada at gmail.com>
> wrote:
>
>>
>> I cleaned up my tests and found that the problematic obj file was generated
>> only by msvc 2015, so trying again with msvc 2017 I get:
>>
>> lld-link: 4s
>> lld-link /debug: 1m30s and ~20gb of ram
>> lld-link /debug:ghash: 59s and ~20gb of ram
>> link: 13s
>> link /debug:fastlink: 43s and 1gb of ram
>> link specialpdb: 1m10s and 4gb of ram
>> link /debug: 9m16s and >14gb of ram
>>
>> link incremental: 8s when it works.
>>
>>
>> *specialpdb is created by passing the same pdb to a set of compilation
>> units (e.g. a folder) to be written to, so it dedups the symbols before the
>> final linking, but that does decrease the concurrency as this step can't be
>> done after linking.
>>
>>
>> My question is: in the set of patches you guys haven't upstreamed, is
>> there anything that makes compilation use less memory? Or, asking more
>> directly, when will those patches make it upstream, or can I try them? The
>> memory usage of lld-link is a little worrying, as we have around 6-8
>> binaries that we link for windows and they mostly use the same libraries, so
>> 20gb of ram each means we probably can't link them all together anymore.
>>
>>
>> Tomorrow I will send my tool and changes to lld so more people can try
>> this out and tell if it helps with their msvc-only code.
>>
>>
>> On Sun, Jan 28, 2018 at 11:22 PM, Zachary Turner <zturner at google.com>
>> wrote:
>>
>>> I don’t have pgo numbers. When I build using -flto=thin the link time is
>>> significantly faster than msvc /ltcg and runtime is slightly faster, but I
>>> haven’t tested on a large variety of different workloads, so YMMV. Link
>>> time will definitely be faster though.
>>>
>>> On Sun, Jan 28, 2018 at 2:20 PM Leonardo Santagada <santagada at gmail.com>
>>> wrote:
>>>
>>>> This part is only for objects with /Z7 debug information in them, right?
>>>> I think most of the third parties are either .lib/.obj without debug
>>>> information, or the same with debug information in pdb files. Rewriting
>>>> all .lib/.obj with /Z7 information seems doable with a small python
>>>> script; the pdb one is going to be more work, but I always wanted to know
>>>> how a pdb file is structured, so "fun" times ahead. But yeah, printing it
>>>> out and timing it might be very useful indeed.
>>>>
>>>> Has anyone tried to compile/link lld-link.exe with LTO+PGO to see how
>>>> much faster it can get? I might try that as well, as a 10% speed
>>>> improvement might be handy.
>>>>
>>>> On Sun, Jan 28, 2018 at 11:14 PM, Zachary Turner <zturner at google.com>
>>>> wrote:
>>>>
>>>>> Look for this code in lld/COFF/PDB.cpp
>>>>>
>>>>>
>>>>> if (Config->DebugGHashes) {
>>>>>   ArrayRef<GloballyHashedType> Hashes;
>>>>>   std::vector<GloballyHashedType> OwnedHashes;
>>>>>   if (Optional<ArrayRef<uint8_t>> DebugH = getDebugH(File))
>>>>>     Hashes = getHashesFromDebugH(*DebugH);
>>>>>   else {
>>>>>     OwnedHashes = GloballyHashedType::hashTypes(Types);
>>>>>     Hashes = OwnedHashes;
>>>>>   }
>>>>>
>>>>> In the else block there, add a log message that says "synthesizing
>>>>> .debug$h section for " + Obj->Name
>>>>>
>>>>> See how many of these you get. When I build chrome + all third party
>>>>> libraries this way I get about 100, which is small enough to still see
>>>>> large performance gains.
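As a stand-alone illustration of what that count measures, here is a small mock of the fallback path. InputObj and countSynthesized are hypothetical stand-ins rather than lld types; the real change is just the one log line in the else block quoted above.

```cpp
// Mock of the "no .debug$H, so hash at link time" fallback.
// InputObj and countSynthesized are hypothetical names, not lld's API.
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <optional>
#include <string>
#include <vector>

struct InputObj {
  std::string name;
  // Set when the object already carries a precomputed .debug$H section.
  std::optional<std::vector<uint8_t>> debugH;
};

// Logs every object whose hashes would have to be computed during the link
// and returns how many such objects there were.
size_t countSynthesized(const std::vector<InputObj> &objs, std::ostream &log) {
  size_t count = 0;
  for (const InputObj &obj : objs) {
    if (obj.debugH)
      continue; // precomputed hashes are used as-is
    log << "synthesizing .debug$h section for " << obj.name << '\n';
    ++count;
  }
  return count;
}
```

The higher this count is relative to the number of inputs, the more hashing work lands on the link itself.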
>>>>>
>>>>> If you have many 3rd party libraries, it may be necessary to rewrite
>>>>> the .lib files too, not just the .obj files. Eventually I’ll get around to
>>>>> implementing all of this, as well as better heuristics in lld-link to
>>>>> disable ghash if it’s going to be slow.
>>>>>
>>>>> On Sun, Jan 28, 2018 at 1:51 PM Leonardo Santagada <
>>>>> santagada at gmail.com> wrote:
>>>>>
>>>>>> OK, I went for a kind of middle-ground solution: I patch the obj files,
>>>>>> but as adding a new section didn't seem to work, I add a "shadow"
>>>>>> section by editing the pointer to line numbers and the virtual size on
>>>>>> the .debug$T section. Although technically broken, both link.exe and
>>>>>> lld-link.exe don't seem to mind the alterations, and as the shadow
>>>>>> .debug$H is not really a section anymore (it's just some bytes at the
>>>>>> end of the file) it doesn't change anything else that matters.
>>>>>>
>>>>>> With that I could do my first test with a subset of our code base, and
>>>>>> the results are not good. I found one of our sources that breaks the
>>>>>> ghash computation; I will get more info on this and post a proper bug
>>>>>> report, but I guess it's type information that is generated only by
>>>>>> msvc. The other, more alarming problem is that linking is way slower
>>>>>> with the ghashes... my guess is that we have a bunch of pdb files for
>>>>>> some third party libraries and calculating those ghashes takes more time
>>>>>> than the actual linking of this small part of the source (it links in 4s
>>>>>> in both link.exe and lld-link.exe without ghashes).
>>>>>>
>>>>>> On Fri, Jan 26, 2018 at 8:52 PM, Leonardo Santagada <
>>>>>> santagada at gmail.com> wrote:
>>>>>>
>>>>>>> We don't generate any .lib files, as those don't work well with
>>>>>>> incremental linking (and give zero advantages when linking, AFAIK), and
>>>>>>> it would be pretty easy to have a modern format holding the ghashes for
>>>>>>> multiple files: something simple like a size-prefixed name followed by
>>>>>>> size-prefixed ghash blobs.
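A rough sketch of what such a container could look like, assuming 32-bit little-endian length prefixes and treating each hash blob as opaque bytes. The layout details and every name here (GHashEntry, writeGHashFile, readGHashFile) are assumptions for illustration, not an existing lld format.

```cpp
// Sketch of a "size-prefixed name, then size-prefixed ghash blob" container.
// The record layout and all identifiers are assumptions for illustration.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Blob = std::vector<uint8_t>;

struct GHashEntry {
  std::string name; // object file the hashes belong to
  Blob hashes;      // opaque hash bytes for that object
};

static void putU32(Blob &out, uint32_t v) {
  for (int i = 0; i < 4; ++i)
    out.push_back(static_cast<uint8_t>(v >> (8 * i))); // little-endian
}

static bool getU32(const Blob &in, size_t &pos, uint32_t &v) {
  if (in.size() - pos < 4)
    return false;
  v = 0;
  for (int i = 0; i < 4; ++i)
    v |= static_cast<uint32_t>(in[pos + i]) << (8 * i);
  pos += 4;
  return true;
}

// Reads one length prefix and checks that the payload it announces fits.
static bool getChunk(const Blob &in, size_t &pos, const uint8_t *&p,
                     uint32_t &n) {
  if (!getU32(in, pos, n) || in.size() - pos < n)
    return false;
  p = in.data() + pos;
  pos += n;
  return true;
}

Blob writeGHashFile(const std::vector<GHashEntry> &entries) {
  Blob out;
  for (const GHashEntry &e : entries) {
    assert(e.name.size() <= UINT32_MAX && e.hashes.size() <= UINT32_MAX);
    putU32(out, static_cast<uint32_t>(e.name.size()));
    out.insert(out.end(), e.name.begin(), e.name.end());
    putU32(out, static_cast<uint32_t>(e.hashes.size()));
    out.insert(out.end(), e.hashes.begin(), e.hashes.end());
  }
  return out;
}

// Returns std::nullopt if the input is truncated or otherwise malformed.
std::optional<std::vector<GHashEntry>> readGHashFile(const Blob &in) {
  std::vector<GHashEntry> entries;
  size_t pos = 0;
  while (pos < in.size()) {
    GHashEntry e;
    const uint8_t *p = nullptr;
    uint32_t n = 0;
    if (!getChunk(in, pos, p, n))
      return std::nullopt;
    e.name.assign(p, p + n);
    if (!getChunk(in, pos, p, n))
      return std::nullopt;
    e.hashes.assign(p, p + n);
    entries.push_back(std::move(e));
  }
  return entries;
}
```

Because every record carries its own name, a reader can stream the file once and build a name-to-hashes lookup without a separate index.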
>>>>>>>
>>>>>>> On Fri, Jan 26, 2018 at 8:44 PM, Zachary Turner <zturner at google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> We considered that early on, but most object files actually end up
>>>>>>>> in .lib files so unless there were a way to connect the objects in the .lib
>>>>>>>> to the corresponding .ghash files, this would disable ghash usage for a
>>>>>>>> large number of inputs. Supporting both is an option, but it adds a bit of
>>>>>>>> complexity and I’m not totally convinced it’s worth it.
>>>>>>>>
>>>>>>>> On Fri, Jan 26, 2018 at 11:38 AM Leonardo Santagada <
>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> it does.
>>>>>>>>>
>>>>>>>>> I just had an epiphany: why not just write a .ghash file and have
>>>>>>>>> lld read those if they exist for an .obj file?
>>>>>>>>>
>>>>>>>>> Seems much simpler than trying to wire up a 20 year old file
>>>>>>>>> format. I will try to do this; is something like this acceptable for LLD?
>>>>>>>>> The cool thing is that I can generate .ghash for .lib or any obj lying
>>>>>>>>> around (maybe even for pdb in the future).
>>>>>>>>>
>>>>>>>>> On Fri, Jan 26, 2018 at 8:32 PM, Zachary Turner <
>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>
>>>>>>>>>> In general, LLD should be able to accept any MSVC .obj file.  At the
>>>>>>>>>> very least, we're not aware of any cases that don't work.
>>>>>>>>>>
>>>>>>>>>> Does your MSVC .obj file link fine before you add the .debug$H?
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 26, 2018 at 11:23 AM Leonardo Santagada <
>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Okay, apparently coff2yaml and yaml2coff are not in a great
>>>>>>>>>>> place, as neither deals well with the fact that you can have
>>>>>>>>>>> overlapping sections, which seems to be what clang-cl produces (the .data
>>>>>>>>>>> section points to the same place as a later section). That is not a big
>>>>>>>>>>> problem for me in particular, because msvc doesn't even generate .data
>>>>>>>>>>> sections in .obj.
>>>>>>>>>>>
>>>>>>>>>>> I'm trying to put support for .bss sections in both coff2yaml
>>>>>>>>>>> and yaml2coff... but I can still link my transformed clang-cl generated
>>>>>>>>>>> files just fine... what does give me problems is msvc .obj files.
>>>>>>>>>>> Have you tried to link one of these?
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jan 26, 2018 at 8:05 PM, Leonardo Santagada <
>>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Yeah, apparently .bss has an uninitialized-data flag that is
>>>>>>>>>>>> not being respected in the layout of the coff files (it should skip those
>>>>>>>>>>>> sections), but I dunno what to do with .data as it doesn't have a size.
>>>>>>>>>>>>
>>>>>>>>>>>> (resending as apparently my pastes generated a ton of hidden
>>>>>>>>>>>> html data and this message hit the mailing list limit of 100k)
>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>>> Leonardo Santagada
>>>>>>>>>>>>