[llvm-dev] [lldb-dev] Trying out lld to link windows binaries (using msvc as a compiler)

Leonardo Santagada via llvm-dev llvm-dev at lists.llvm.org
Sat Jan 20 12:28:53 PST 2018


On Sat, Jan 20, 2018 at 9:05 PM, Zachary Turner <zturner at google.com> wrote:

> You probably don't want to go down the same route that clang goes through
> to write the object file.  If you think yaml2coff is convoluted, the way
> clang does it will just give you a headache.  There are multiple
> abstractions involved to account for different object file formats (ELF,
> COFF, MachO) and output formats (Assembly, binary file).  At least with
> yaml2coff
>

I think your sentence got cut off there, but yeah, I just found
AsmPrinter.cpp and it is convoluted.



> It's true that yaml2coff is using the COFFParser structure, but if you
> look at the writeCOFF function in yaml2coff it's pretty bare-metal.  The
> logic you need will be almost identical, except that instead of checking
> the COFFParser for the various fields, you'll check the existing
> COFFObjectFile, which should have similar fields.
>
> The only thing you need to do differently is, when writing the section
> table and section contents, insert a new entry.  Since you're injecting a
> section into the middle, you'll also probably need to push back the file
> pointer of all subsequent sections so that they don't overlap.  (e.g. if
> the original sections are 1, 2, 3, 4, 5 and you insert between 2 and 3,
> then the original sections 3, 4, and 5 would need to have their
> FilePointerToRawData offset by the size of the new section).
>

I have the PE/COFF spec open here, and I'm glad I read a bit of it, so I
actually know what you're talking about... yeah, it doesn't seem too
complicated.
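
Just to check my understanding, the fix-up when writing the new section
table would be roughly the following. This is only a sketch, not tested:
NewHeaders is meant to be the mutable copy of the section headers I'm about
to write out, I the index where .debug$H was inserted and DebugHSize the
size of its raw data; the field names come from coff_section in
llvm/Object/COFF.h, where the spec's FilePointerToRawData is called
PointerToRawData.

    // After inserting the .debug$H header at index I, push every later
    // section's raw data (and its relocation/line-number tables) back by
    // the size of the injected payload so nothing overlaps on disk.
    // Needs llvm/Object/COFF.h.
    uint32_t Shift = DebugHSize;
    for (size_t N = I + 1; N < NewHeaders.size(); ++N) {
      llvm::object::coff_section &Sec = NewHeaders[N];
      if (Sec.PointerToRawData != 0)   // sections with no raw data stay at 0
        Sec.PointerToRawData += Shift;
      if (Sec.PointerToRelocations != 0)
        Sec.PointerToRelocations += Shift;
      if (Sec.PointerToLinenumbers != 0)
        Sec.PointerToLinenumbers += Shift;
    }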



> If you need to know what values to put for the other fields in a section
> header, run `dumpbin /headers foo.obj` on a clang-generated object file
> that has a .debug$H section already (e.g. run clang with
> -emit-codeview-ghash-section, and look at the properties of the .debug$H
> section and use the same values).
>

Thanks, I will do that, and then also look at how the CodeView part of the
code does it if I can't understand some of it.
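
Concretely, I expect the new header to end up looking more or less like
this; the Characteristics are just what I believe clang uses for the other
.debug$ sections, so treat them as a guess until I've compared against the
dumpbin output (DebugT here is the existing .debug$T header and HashesSize
the size of the blob I'm going to write):

    // Hypothetical header for the injected .debug$H section.
    // Needs llvm/Object/COFF.h, llvm/BinaryFormat/COFF.h and <cstring>.
    llvm::object::coff_section DebugH = {};
    std::memcpy(DebugH.Name, ".debug$H", 8);  // exactly 8 bytes, fills Name
    DebugH.SizeOfRawData = HashesSize;        // .debug$H header + hash values
    DebugH.PointerToRawData =
        DebugT.PointerToRawData + DebugT.SizeOfRawData;  // right after .debug$T
    DebugH.Characteristics = llvm::COFF::IMAGE_SCN_CNT_INITIALIZED_DATA |
                             llvm::COFF::IMAGE_SCN_MEM_DISCARDABLE |
                             llvm::COFF::IMAGE_SCN_MEM_READ;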


> The only invariant that needs to be maintained is that
>   Section[N]->FilePointerOfRawData ==
>     Section[N-1]->FilePointerOfRawData + Section[N-1]->SizeOfRawData
>

Well, that, and all the sections need to end up in the final file... but
I'm hopeful.
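
(Something I can assert while writing the section table, skipping sections
that have no raw data:)

    // Sanity check for the invariant above. Needs llvm/Object/COFF.h and
    // <cassert>; NewHeaders is the section header table being written.
    for (size_t N = 1; N < NewHeaders.size(); ++N) {
      const llvm::object::coff_section &Prev = NewHeaders[N - 1];
      const llvm::object::coff_section &Cur = NewHeaders[N];
      if (Prev.PointerToRawData == 0 || Cur.PointerToRawData == 0)
        continue;  // e.g. uninitialized-data sections have no raw data
      assert(Cur.PointerToRawData ==
                 Prev.PointerToRawData + Prev.SizeOfRawData &&
             "sections overlap or leave a gap in the file");
    }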


Does anyone have timings for linking a big project like Chrome with this,
so that I at least know what kind of performance to expect?

My numbers are something like:

one PDB per obj file: link.exe takes ~15 minutes and 16 GB of RAM;
lld-link.exe takes 2 minutes 30 seconds and ~8 GB of RAM.
around 10 PDBs per folder: link.exe takes 1 minute and 2-3 GB of RAM;
lld-link.exe takes 1 minute 30 seconds and ~6 GB of RAM.
fastlink: link.exe takes 40 seconds, but then there are 20 seconds of
loading at the first breakpoint in the debugger, and we lose DIA support
for listing symbols.
incremental: link.exe takes 8 seconds, but it only kicks in when the
changes are very minor.

We have a non-negligible number of symbols used by some runtime systems.


>
> On Sat, Jan 20, 2018 at 11:52 AM Leonardo Santagada <santagada at gmail.com>
> wrote:
>
>> Thanks for the tips, I now have something that reads the obj file, finds
>> .debug$T sections and global hashes it (proof of concept kind of code).
>> What I can't find is how clang itself writes the COFF files with
>> global hashes, as that might help me understand how to create the .debug$H
>> section, how to update the file section count and how to properly write
>> this back.
>>
>> The code in yaml2coff expects to work on the yaml COFFParser struct,
>> and I'm having quite a headache turning the COFFObjectFile into a
>> COFFParser object or something compatible... Tomorrow I might try the very
>> inefficient path of coff2yaml and then yaml2coff with the hashes header...
>> but it seems way too inefficient and convoluted.
>>
>> On Fri, Jan 19, 2018 at 10:38 PM, Zachary Turner <zturner at google.com>
>> wrote:
>>
>>>
>>>
>>> On Fri, Jan 19, 2018 at 1:02 PM Leonardo Santagada <santagada at gmail.com>
>>> wrote:
>>>
>>>> On Fri, Jan 19, 2018 at 9:44 PM, Zachary Turner <zturner at google.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Fri, Jan 19, 2018 at 12:29 PM Leonardo Santagada <
>>>>> santagada at gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> No I didn't, I used cl.exe from the visual studio toolchain. What I'm
>>>>>> proposing is a tool for processing .obj files in COFF format, reading them
>>>>>> and generating the GHASH part.
>>>>>>
>>>>>> To make our build faster we use hundreds of unity build files (.cpp's
>>>>>> with a lot of other .cpp's in them aka munch files) but still have a lot of
>>>>>> single .cpp's as well (in total something like 3.4k .obj files).
>>>>>>
>>>>>> ps: sorry for sending to the wrong list, I was reading about llvm
>>>>>> mailing lists and jumped when I saw what I thought was a lld exclusive list.
>>>>>>
>>>>>
>>>>> A tool like this would be useful, yes.  We've talked about it
>>>>> internally as well and agreed it would be useful; we just haven't
>>>>> prioritized it.  If you're interested in submitting a patch along those
>>>>> lines though, I think it would be a good addition.
>>>>>
>>>>> I'm not sure what the best place for it would be.  llvm-readobj and
>>>>> llvm-objdump seem like obvious choices, but they are intended to be
>>>>> read-only, so perhaps they wouldn't be a good fit.
>>>>>
>>>>> llvm-pdbutil is kind of a hodgepodge of everything else related to
>>>>> PDBs and symbols, so I wouldn't be opposed to making a new subcommand there
>>>>> called "ghash" or something that could process an object file and output a
>>>>> new object file with a .debug$H section.
>>>>>
>>>>> A third option would be to make a new tool for it.
>>>>>
>>>>> I don't think it would be that hard to write.  If you're interested in
>>>>> trying to make a patch for this, I can offer some guidance on where to look
>>>>> in the code.  Otherwise it's something that we'll probably get to, I'm just
>>>>> not sure when.
>>>>>
>>>>>>
>>>> I would love to write it and contribute it back, please do tell. I did
>>>> find some of the ghash code in lld, but I'm fuzzy on the llvm codeview
>>>> part of it and have never seen llvm-readobj/objdump or llvm-pdbutil, but
>>>> I'm not afraid to look :)
>>>>
>>>>
>>>  Luckily all of the important code is hidden behind library calls, and
>>> it should already just do the right thing, so I suspect you won't need to
>>> know much about CodeView to do this.
>>>
>>> I think Peter has the right idea about putting this in llvm-objcopy.
>>>
>>> You can look at one of the existing CopyBinary functions there, which
>>> currently only work for ELF, but you can just make a new overload that
>>> accepts a COFFObjectFile.
>>>
>>> I would probably start by iterating over each of the sections
>>> (getNumberOfSections / getSectionName) looking for .debug$T and .debug$H
>>> sections.
>>>
>>> If you find a .debug$H section then you can just skip that object file.
>>>
>>> If you find a .debug$T but not a .debug$H, then basically do the same
>>> thing that LLD does in PDBLinker::mergeDebugT (create a CVTypeArray and
>>> pass it to GloballyHashedType::hashTypes; that will return an array of
>>> hash values).  The format of .debug$H is the header, followed by the hash
>>> values.  Then when you're writing the list of sections, just add the
>>> .debug$H section right after the .debug$T section.
>>>
>>> Currently llvm-objcopy only writes ELF files, so it would need to be
>>> taught to write COFF files.  We have code to do this in the yaml2obj
>>> utility (specifically, in yaml2coff.cpp in the function writeCOFF).  There
>>> may be a way to move this code to somewhere else (llvm/Object/COFF.h?) so
>>> that it can be re-used by both yaml2coff and llvm-objcopy, but in the worst
>>> case scenario you could copy the code and re-write it to work with these
>>> new structures.
>>>
>>> Lastly, you'll probably want to put all of this behind an option in
>>> llvm-objcopy such as -add-codeview-ghash-section
>>>
>>>
>>
>>
>> --
>>
>> Leonardo Santagada
>>
>
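
PS: to make sure I read the ghash step above correctly, this is roughly
what my proof of concept does right now, modelled on what LLD's
PDBLinker::mergeDebugT does. It's only a sketch, with error handling
mostly elided; GloballyHashedType and CVTypeArray come from
llvm/DebugInfo/CodeView:

    #include "llvm/ADT/ArrayRef.h"
    #include "llvm/DebugInfo/CodeView/CVRecord.h"
    #include "llvm/DebugInfo/CodeView/TypeHashing.h"
    #include "llvm/Support/BinaryStreamReader.h"
    #include "llvm/Support/Error.h"
    #include <vector>

    // DebugT is the raw contents of a .debug$T section read out of the
    // COFFObjectFile. Returns one global hash per type record, in order;
    // that array is the payload that goes after the .debug$H header.
    static std::vector<llvm::codeview::GloballyHashedType>
    hashDebugT(llvm::ArrayRef<uint8_t> DebugT) {
      using namespace llvm;
      using namespace llvm::codeview;

      BinaryStreamReader Reader(DebugT, support::little);
      uint32_t Magic = 0;
      // Skip the 4-byte CodeView signature at the start of .debug$T.
      if (Error E = Reader.readInteger(Magic)) {
        consumeError(std::move(E));
        return {};
      }

      CVTypeArray Types;
      if (Error E = Reader.readArray(Types, Reader.bytesRemaining())) {
        consumeError(std::move(E));
        return {};
      }

      // The same hashing call LLD uses when an object has no precomputed
      // .debug$H.
      return GloballyHashedType::hashTypes(Types);
    }

The remaining piece is then writing the .debug$H header plus these hashes
as a new section right after .debug$T, with the file pointer fix-up from
earlier in the thread.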


-- 

Leonardo Santagada

