[llvm-dev] [lldb-dev] Trying out lld to link windows binaries (using msvc as a compiler)

Sat Jan 20 12:50:22 PST 2018

Generally speaking, a good rule of thumb is that /debug:ghash will be close
to or faster than /debug:fastlink, but with none of the penalties like slow
debug time.

On Sat, Jan 20, 2018 at 12:44 PM Zachary Turner <zturner at google.com> wrote:

> Chrome is actually one of my exact benchmark cases. When building
> blink_core.dll and browser_tests.exe, I get anywhere from a 20-40%
> reduction in link time. We have some other optimizations in the pipeline
> but not upstream yet.
>
> My best time so far (including other optimizations not yet upstream) is
> 28s on blink_core.dll, compared to 110s with /debug.
>
> On Sat, Jan 20, 2018 at 12:28 PM Leonardo Santagada <santagada at gmail.com>
> wrote:
>
>> On Sat, Jan 20, 2018 at 9:05 PM, Zachary Turner <zturner at google.com>
>> wrote:
>>
>>> You probably don't want to go down the same route that clang goes
>>> through to write the object file.  If you think yaml2coff is convoluted,
>>> the way clang does it will just give you a headache.  There are multiple
>>> abstractions involved to account for different object file formats (ELF,
>>> COFF, MachO) and output formats (Assembly, binary file).  At least with
>>> yaml2coff
>>>
>>
>> I think your phrase got cut there, but yeah I just found AsmPrinter.cpp
>> and it is convoluted.
>>
>>
>>
>>> It's true that yaml2coff is using the COFFParser structure, but if you
>>> look at the writeCOFF function in yaml2coff it's pretty bare-metal.
>>> The logic you need will be almost identical, except that instead of
>>> checking the COFFParser for the various fields, you'll check the existing
>>> COFFObjectFile, which should have similar fields.
>>>
>>> The only thing you need to do differently is, when writing the section
>>> table and section contents, to insert a new entry.  Since you're injecting a
>>> section into the middle, you'll also probably need to push back the file
>>> pointer of all subsequent sections so that they don't overlap.  (e.g. if
>>> the original sections are 1, 2, 3, 4, 5 and you insert between 2 and 3,
>>> then the original sections 3, 4, and 5 would need to have their
>>> PointerToRawData offset by the size of the new section).
>>>
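A minimal sketch of the offset adjustment described above, in plain Python
rather than against any LLVM API. The Section class and insert_section
helper are hypothetical names made up for illustration. The sketch only
shifts the raw-data pointers; it assumes nothing else in the file moves, so
it ignores the extra 40-byte section header that the new entry adds to the
section table, as well as relocation pointers and the symbol table pointer,
all of which a real tool would also have to adjust.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Section:
    # Hypothetical stand-in for a COFF section header (only two fields kept).
    name: str
    pointer_to_raw_data: int  # file offset of the section contents
    size_of_raw_data: int     # size of the section contents in bytes


def insert_section(sections: List[Section], index: int,
                   name: str, size: int) -> List[Section]:
    """Insert a new section before sections[index]; shift all that follow."""
    if index < len(sections):
        # The new section takes over the offset of the section it displaces.
        start = sections[index].pointer_to_raw_data
    elif sections:
        # Appending: start right after the last section's contents.
        last = sections[-1]
        start = last.pointer_to_raw_data + last.size_of_raw_data
    else:
        start = 0
    new = Section(name, start, size)
    # Push every subsequent section back by the size of the new one.
    shifted = [replace(s, pointer_to_raw_data=s.pointer_to_raw_data + size)
               for s in sections[index:]]
    return sections[:index] + [new] + shifted


# Sections 1..5, each 100 bytes; insert a 48-byte section between 2 and 3.
original = [Section(str(i), 100 * i, 100) for i in range(1, 6)]
updated = insert_section(original, 2, ".debug$H", 48)
```

With this input, the original sections 3, 4 and 5 move from offsets 300,
400 and 500 to 348, 448 and 548, and the new section sits at 300.
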
>>
>> I have the PE/COFF spec open here and I'm happy that I read a bit of it
>> so I actually know what you are talking about... yeah it doesn't seem too
>> complicated.
>>
>>
>>
>>> If you need to know what values to put for the other fields in a section
>>> header, run `dumpbin /headers foo.obj` on a clang-generated object file
>>> that has a .debug$H section already (e.g. run clang with
>>> -emit-codeview-ghash-section, and look at the properties of the .debug$H
>>> section and use the same values).
>>>
>>
>> Thanks I will do that and then also look at how the CodeView part of the
>> code does it if I can't understand some of it.
>>
>>
>>> The only invariant that needs to be maintained is that
>>> Section[N]->PointerToRawData == Section[N-1]->PointerToRawData +
>>> Section[N-1]->SizeOfRawData.
>>>
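The invariant above can be written as a small check. This is only a sketch:
is_contiguous is a hypothetical helper, and it takes the statement in the
mail literally. Object files may also store relocation records alongside
section contents, so treat a failed check as something to investigate
rather than proof of a broken file.

```python
from typing import List, Tuple


def is_contiguous(sections: List[Tuple[int, int]]) -> bool:
    """Check that each section starts where the previous one ends.

    `sections` holds (PointerToRawData, SizeOfRawData) pairs in
    section-table order.
    """
    return all(
        cur_ptr == prev_ptr + prev_size
        for (prev_ptr, prev_size), (cur_ptr, _) in zip(sections, sections[1:])
    )


good_layout = [(100, 50), (150, 30), (180, 10)]
gap_layout = [(100, 50), (160, 30)]  # 10-byte gap before the second section
```

Here is_contiguous(good_layout) holds, while gap_layout fails because the
second section starts 10 bytes past the end of the first.
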
>>
>> Well, that and all the sections need to be in the final file... But I'm
>> hopeful.
>>
>>
>> Does anyone have timings for linking a big project like Chrome with this,
>> so that I at least know what kind of performance to expect?
>>
>> My numbers are something like:
>>
>> 1 pdb per obj file: link.exe takes ~15 minutes and 16GB of ram,
>> lld-link.exe takes 2:30 minutes and ~8GB of ram
>> around 10 pdbs per folder: link.exe takes 1 minute and 2-3GB of ram,
>> lld-link.exe takes 1:30 minutes and ~6GB of ram
>> fastlink: link.exe takes 40 seconds, but then 20 seconds of loading at the
>> first break point in the debugger and we lost DIA support for listing
>> symbols.
>> incremental: link.exe takes 8 seconds, but it only happens when very
>> minor changes happen.
>>
>> We have a non-negligible number of symbols used on some runtime systems.
>>
>>
>>>
>>> On Sat, Jan 20, 2018 at 11:52 AM Leonardo Santagada <santagada at gmail.com>
>>> wrote:
>>>
>>>> Thanks for the tips, I now have something that reads the obj file,
>>>> finds .debug$T sections and globally hashes them (proof-of-concept kind of
>>>> code). What I can't find is how clang itself writes the COFF files
>>>> with global hashes, as that might help me understand how to create the
>>>> .debug$H section, how to update the file section count and how to properly
>>>> write this back.
>>>>
>>>> The code in yaml2coff expects to work on the yaml COFFParser struct,
>>>> and I'm having quite a bit of a headache turning the COFFObjectFile
>>>> into a COFFParser object or something compatible... Tomorrow I might try
>>>> the very inefficient path of coff2yaml and then yaml2coff with the hashes
>>>> header... but it seems way too inefficient and convoluted.
>>>>
>>>> On Fri, Jan 19, 2018 at 10:38 PM, Zachary Turner <zturner at google.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, Jan 19, 2018 at 1:02 PM Leonardo Santagada <
>>>>> santagada at gmail.com> wrote:
>>>>>
>>>>>> On Fri, Jan 19, 2018 at 9:44 PM, Zachary Turner <zturner at google.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jan 19, 2018 at 12:29 PM Leonardo Santagada <
>>>>>>> santagada at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> No I didn't, I used cl.exe from the visual studio toolchain. What
>>>>>>>> I'm proposing is a tool for processing .obj files in COFF format, reading
>>>>>>>> them and generating the GHASH part.
>>>>>>>>
>>>>>>>> To make our build faster we use hundreds of unity build files
>>>>>>>> (.cpp's with a lot of other .cpp's in them aka munch files) but still have
>>>>>>>> a lot of single .cpp's as well (in total something like 3.4k .obj files).
>>>>>>>>
>>>>>>>> ps: sorry for sending to the wrong list, I was reading about llvm
>>>>>>>> mailing lists and jumped when I saw what I thought was an lld-exclusive list.
>>>>>>>>
>>>>>>>
>>>>>>> A tool like this would be useful, yes.  We've talked about it
>>>>>>> internally as well and agreed it would be useful, we just haven't
>>>>>>> prioritized it.  If you're interested in submitting a patch along those
>>>>>>> lines though, I think it would be a good addition.
>>>>>>>
>>>>>>> I'm not sure what the best place for it would be.  llvm-readobj and
>>>>>>> llvm-objdump seem like obvious choices, but they are intended to be
>>>>>>> read-only, so perhaps they wouldn't be a good fit.
>>>>>>>
>>>>>>> llvm-pdbutil is kind of a hodgepodge of everything else related to
>>>>>>> PDBs and symbols, so I wouldn't be opposed to making a new subcommand there
>>>>>>> called "ghash" or something that could process an object file and output a
>>>>>>> new object file with a .debug$H section.
>>>>>>>
>>>>>>> A third option would be to make a new tool for it.
>>>>>>>
>>>>>>> I don't think it would be that hard to write.  If you're interested
>>>>>>> in trying to make a patch for this, I can offer some guidance on where to
>>>>>>> look in the code.  Otherwise it's something that we'll probably get to, I'm
>>>>>>> just not sure when.
>>>>>>>
>>>>>>>>
>>>>>> I would love to write it and contribute it back, please do tell. I
>>>>>> did find some of the ghash code in lld, but I'm fuzzy on the llvm
>>>>>> codeview part of it and have never seen llvm-readobj/objdump or
>>>>>> llvm-pdbutil, but I'm not afraid to look :)
>>>>>>
>>>>>>
>>>>>  Luckily all of the important code is hidden behind library calls, and
>>>>> it should already just do the right thing, so I suspect you won't need to
>>>>> know much about CodeView to do this.
>>>>>
>>>>> I think Peter has the right idea about putting this in llvm-objcopy.
>>>>>
>>>>> You can look at one of the existing CopyBinary functions there, which
>>>>> currently only work for ELF, but you can just make a new overload that
>>>>> accepts a COFFObjectFile.
>>>>>
>>>>> I would probably start by iterating over each of the sections
>>>>> (getNumberOfSections / getSectionName) looking for .debug$T and .debug$H
>>>>> sections.
>>>>>
>>>>> If you find a .debug$H section then you can just skip that object
>>>>> file.
>>>>>
>>>>> If you find a .debug$T but not a .debug$H, then basically do the same
>>>>> thing that LLD does in PDBLinker::mergeDebugT: create a CVTypeArray and
>>>>> pass it to GloballyHashedType::hashTypes, which will return an array of
>>>>> hash values.  (The format of .debug$H is the header, followed by the hash
>>>>> values.)  Then when you're writing the list of sections, just add in the
>>>>> .debug$H section right after the .debug$T section.
>>>>>
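To make the section scan above concrete, here is a rough sketch that works
on the raw bytes of an object file using only Python's struct module, not
the LLVM COFFObjectFile API. The function names (section_names, needs_ghash,
fake_obj) are hypothetical. It assumes a regular COFF object with the
20-byte file header followed by 40-byte section headers, and does not handle
/bigobj files, which use a different header. Both .debug$T and .debug$H are
exactly 8 characters, so they fit in the fixed Name field without going
through the string table.

```python
import struct

# Little-endian layouts from the PE/COFF specification.
FILE_HEADER = struct.Struct("<HHIIIHH")         # 20 bytes
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")  # 40 bytes


def section_names(obj: bytes) -> list:
    """Return the short (8-byte) section names found in the section table."""
    _, count, _, _, _, opt_size, _ = FILE_HEADER.unpack_from(obj, 0)
    table = FILE_HEADER.size + opt_size  # section table follows the headers
    names = []
    for i in range(count):
        offset = table + i * SECTION_HEADER.size
        raw = SECTION_HEADER.unpack_from(obj, offset)[0]
        # Names shorter than 8 bytes are NUL-padded; 8-byte names are not.
        names.append(raw.rstrip(b"\0").decode("ascii", "replace"))
    return names


def needs_ghash(obj: bytes) -> bool:
    """True if the object has type records but no global hashes yet."""
    names = section_names(obj)
    return ".debug$T" in names and ".debug$H" not in names


def fake_obj(names) -> bytes:
    """Build a header-only fake object file, just for trying the scan out."""
    header = FILE_HEADER.pack(0x8664, len(names), 0, 0, 0, 0, 0)
    table = b"".join(
        SECTION_HEADER.pack(n.encode("ascii"), 0, 0, 0, 0, 0, 0, 0, 0, 0)
        for n in names
    )
    return header + table
```

An object whose section table lists .debug$T but no .debug$H is the one
that would get hashes added; anything else is left alone.
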
>>>>> Currently llvm-objcopy only writes ELF files, so it would need to be
>>>>> taught to write COFF files.  We have code to do this in the yaml2obj
>>>>> utility (specifically, in yaml2coff.cpp in the function writeCOFF).  There
>>>>> may be a way to move this code to somewhere else (llvm/Object/COFF.h?) so
>>>>> that it can be re-used by both yaml2coff and llvm-objcopy, but in the worst
>>>>> case scenario you could copy the code and re-write it to work with these
>>>>> new structures.
>>>>>
>>>>> Lastly, you'll probably want to put all of this behind an option in
>>>>> llvm-objcopy such as -add-codeview-ghash-section
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Leonardo Santagada
>>>>
>>>
>>
>>
>> --
>>
>> Leonardo Santagada
>>
>