[llvm-dev] [lldb-dev] Trying out lld to link windows binaries (using msvc as a compiler)

Leonardo Santagada via llvm-dev llvm-dev at lists.llvm.org
Thu Jan 25 09:49:05 PST 2018


I reordered my sections so that .debug$H is in the correct place, but
now I get duplicate symbol errors. I created a folder with
examples:

https://www.dropbox.com/sh/nmvzi44pi0boe76/AAA0f47O5PCJ9JiUc6wVuwBra?dl=0

t.obj is generated by VS 2015 and links fine with lld-link.exe, but
tout.obj gives these errors:

lld-link.exe /DEBUG:GHASH tout.obj
LLD-LINK.EXE: error: duplicate symbol: __local_stdio_printf_options in
tout.obj and in LIBCMT.lib(default_local_stdio_options.obj)
LLD-LINK.EXE: error: duplicate symbol: __local_stdio_printf_options in
tout.obj and in libvcruntime.lib(undname.obj)

I'm using PEView from http://wjradburn.com/software/ to look at the files
and can't see anything wrong, except some valid differences in the offsets
being used for the data (so pointer to data is different between them).

I will look into yaml2obj now to see if I see anything else weird going on.


On Thu, Jan 25, 2018 at 6:41 PM, Zachary Turner <zturner at google.com> wrote:

> I'm pretty confident that cl is not putting anything strange in the
> .debug$T sections.  We've done a lot of testing and never seen anything
> except CodeView type records in a .debug$T.  My hunch is that your objcopy
> patch is probably not doing the right thing in one or more of the section
> headers, and this is confusing the linker.
>
> One idea might be to build a simple object file with clang-cl but without
> the magic -mllvm -emit-codeview-ghash-section, then run your llvm-objcopy
> on it.  Then build the same object file passing -mllvm
> -emit-codeview-ghash-section.  Then run obj2yaml on both and diff the
> results.  They should be byte-for-byte identical.  That should give you a
> clue about if objcopy is doing something wrong.
>
> On Thu, Jan 25, 2018 at 2:21 AM Leonardo Santagada <santagada at gmail.com>
> wrote:
>
>> Don't worry, I definitely want to perfect this to generate legal obj
>> files, this is just to speed up testing.
>>
>> Now after patching all the obj files I get these errors when linking a
>> small part of our code base (msvc 2017 15.5.3, lld and llvm-objcopy 7.0.0):
>> lld-link.exe : error : relocation against symbol in discarded section:
>> $LN8
>> lld-link.exe : error : relocation against symbol in discarded section:
>> $LN43
>> lld-link.exe : error : relocation against symbol in discarded section:
>> $LN37
>>
>> I'm starting to guess that cl.exe might be putting some random comdat or
>> other discardable symbols in the .debug$T and clang doesn't? I will try to
>> debug this and see what more I can uncover.
>>
>> Linking works perfectly without my llvm-objcopy pass to add .debug$H.
>>
>>
>> On Thu, Jan 25, 2018 at 1:53 AM, Zachary Turner <zturner at google.com>
>> wrote:
>>
>>> It might not influence LLD, but at the same time we don't want to
>>> upstream something that is producing technically illegal COFF files.  Also
>>> good to hear about the planned changes to your header files.  Looking
>>> forward to hearing about your experiences with clang-cl.
>>>
>>> On Wed, Jan 24, 2018 at 10:41 AM Leonardo Santagada <santagada at gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I finally got my first .obj file patched with .debug$H to look somewhat
>>>> right. I added the new section at the end of the file so I don't have to
>>>> recalculate all sections (although now I probably could position it in the
>>>> middle, knowing that each section is: SizeOfRawData + (last.Header.NumberOfRelocations
>>>> * (4+4+2)) and the $H needs to come right after $T in the file). Although
>>>> that is illegal according to the COFF spec, it doesn't seem like it's
>>>> going to influence lld.
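
(A minimal sketch of the size formula above, assuming a section's relocation
records are stored directly after its raw data. SectionInfo and
sectionFootprint are hypothetical names used only for illustration, not LLVM
types; 4+4+2 is the size of one COFF relocation record.)

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, trimmed-down view of a COFF section header: only the two
// fields the formula needs are modelled here.
struct SectionInfo {
    std::uint32_t SizeOfRawData;
    std::uint16_t NumberOfRelocations;
};

// One COFF relocation record: 4-byte address, 4-byte symbol index, 2-byte type.
constexpr std::uint32_t kRelocationSize = 4 + 4 + 2;

// Bytes a section occupies in the file under the assumption stated above.
constexpr std::uint32_t sectionFootprint(const SectionInfo &s) {
    return s.SizeOfRawData + s.NumberOfRelocations * kRelocationSize;
}

// Small usage example: 100 bytes of data plus 3 relocations.
inline void footprintDemo() {
    const SectionInfo s{100, 3};
    assert(sectionFootprint(s) == 130);
}
```

Summing this footprint over all sections that precede .debug$T would give the
offset at which a .debug$H placed in the middle of the file has to start.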
>>>>
>>>> Also we talked and we are probably going to do something similar to a
>>>> bunch of windows defines and a check for our own define (to guarantee that
>>>> no one imported windows.h before win32.h) and drop the namespace and the
>>>> conflicting names.
>>>>
>>>>
>>>> On Tue, Jan 23, 2018 at 12:46 AM, Zachary Turner <zturner at google.com>
>>>> wrote:
>>>>
>>>>> It's very possible that a 3rd party indirect header include is
>>>>> involved.  One idea might be like I suggested where you #define _WINDOWS_
>>>>> in win32.h and guarantee that it's always included first.  Then those other
>>>>> headers won't be able to #include <windows.h>.  But it will probably
>>>>> greatly expand the amount of stuff you have to add to win32.h, as you will
>>>>> probably find some callers of functions that aren't yet in your win32.h
>>>>> that you'd have to add.
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Jan 22, 2018 at 3:28 PM Leonardo Santagada <
>>>>> santagada at gmail.com> wrote:
>>>>>
>>>>>> Ok, some information was lost in getting this example to you, I'm
>>>>>> sorry for not being clear.
>>>>>>
>>>>>> We have a huge code base: let's say 90% of it doesn't include either
>>>>>> header, 9% includes win32.h and 1% includes both. I will try to discover
>>>>>> why, but my guess is that those files include both a third-party header
>>>>>> that includes windows.h and some of our libs that use win32.h.
>>>>>>
>>>>>> I will try to fully understand this tomorrow.
>>>>>>
>>>>>> I guess clang will not implement this ever so finishing the object
>>>>>> copier is the best solution until all code is ported to clang.
>>>>>>
>>>>>> On 23 Jan 2018 00:02, "Zachary Turner" <zturner at google.com> wrote:
>>>>>>
>>>>>>> You said win32.h doesn't include windows.h, but main.cpp does.  So
>>>>>>> what's the disadvantage of just including it in win32.h anyway, since it's
>>>>>>> already going to be in every translation unit?  (Unless you didn't mean to
>>>>>>> #include it in main.cpp)
>>>>>>>
>>>>>>>
>>>>>>> I guess all I can do is warn you how bad of an idea this is.  For
>>>>>>> starters, I already found a bug in your code ;-)
>>>>>>>
>>>>>>> // stdint.h
>>>>>>> typedef int                int32_t;
>>>>>>>
>>>>>>> // winnt.h
>>>>>>> typedef long LONG;
>>>>>>>
>>>>>>> // windef.h
>>>>>>> typedef struct tagPOINT
>>>>>>> {
>>>>>>>     LONG  x;   // long x
>>>>>>>     LONG  y;   // long y
>>>>>>> } POINT, *PPOINT, NEAR *NPPOINT, FAR *LPPOINT;
>>>>>>>
>>>>>>> // win32.h
>>>>>>> typedef int32_t LONG;
>>>>>>>
>>>>>>> struct POINT
>>>>>>> {
>>>>>>> LONG x;   // int x
>>>>>>> LONG y;   // int y
>>>>>>> };
>>>>>>>
>>>>>>> So POINT is defined two different ways.  In your minimal interface,
>>>>>>> it's declared as 2 int32's, which are int.  In the actual Windows header
>>>>>>> files, it's declared as 2 longs.
>>>>>>>
>>>>>>> This might seem like an unimportant bug since int and long are the
>>>>>>> same size, but int and long also mangle differently and affect overload
>>>>>>> resolution, so you could have weird linker errors or call the wrong
>>>>>>> function overload.
>>>>>>>
>>>>>>> Plus, it illustrates the fact that this struct *actually is* a
>>>>>>> different type from the one in the windows header.
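
(A minimal sketch of the overload-resolution point above. The names which,
Win32Long and WindowsLong are hypothetical and exist only for illustration;
Win32Long stands in for the win32.h typedef, which resolves to int through
the stdint.h line quoted above.)

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Two overloads that differ only in int versus long.
inline std::string which(int)  { return "int"; }
inline std::string which(long) { return "long"; }

// int and long are distinct types even when they have the same size.
static_assert(!std::is_same<int, long>::value,
              "int and long are distinct types");

// Hypothetical stand-ins for the two LONG typedefs discussed above.
using Win32Long   = int;   // win32.h flavour (int32_t, i.e. int)
using WindowsLong = long;  // winnt.h flavour

// The same-looking call picks a different overload for each typedef.
inline void overloadDemo() {
    assert(which(Win32Long{0}) == "int");
    assert(which(WindowsLong{0}) == "long");
}
```

Because the two overloads also get different mangled names, a declaration
seen with one LONG and a definition compiled with the other would not link.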
>>>>>>>
>>>>>>> You said at the end that you never intentionally import win32.h and
>>>>>>> windows.h from the same translation unit.  But then in this example you
>>>>>>> did.  I wonder if you could enforce that by doing this:
>>>>>>>
>>>>>>> // win32.h
>>>>>>> #pragma once
>>>>>>>
>>>>>>> // Error if windows.h was included before us.
>>>>>>> #if defined(_WINDOWS_)
>>>>>>> #error "You're including win32.h after having already included windows.h.  Don't do this!"
>>>>>>> #endif
>>>>>>>
>>>>>>> // And also make sure windows.h can't get included after us
>>>>>>> #define _WINDOWS_
>>>>>>>
>>>>>>> For the record, I tried the test case you linked when windows.h is
>>>>>>> not included in main.cpp and it works (but still has the bug about int and
>>>>>>> long).
>>>>>>>
>>>>>>> On Mon, Jan 22, 2018 at 2:23 PM Leonardo Santagada <
>>>>>>> santagada at gmail.com> wrote:
>>>>>>>
>>>>>>>> It is super gross, but we copy parts of windows.h because having
>>>>>>>> all of it is both gigantic and very very messy. So our win32.h has a couple
>>>>>>>> thousand lines and not 30k+ for windows.h and we try to have zero
>>>>>>>> macros. Win32.h doesn't include windows.h so using ::BOOL wouldn't work. We
>>>>>>>> don't want to create a namespace, we just want a cleaner interface to
>>>>>>>> windows api. The namespace with c linkage is the way to trick cl into
>>>>>>>> allowing us to in some files have both windows.h and Win32.h. I really
>>>>>>>> don't see any way for us to have this Win32.h without this cl support, so
>>>>>>>> maybe we should either put windows.h in a compiled header somewhere and not
>>>>>>>> care that it is infecting everything or just have one place we can call to
>>>>>>>> clean up after including windows.h (a massive set of undefs).
>>>>>>>>
>>>>>>>> So using declarations can't work, because we never intentionally include
>>>>>>>> windows.h and win32.h in the same translation unit.
>>>>>>>>
>>>>>>>> On Mon, Jan 22, 2018 at 7:08 PM, Zachary Turner <zturner at google.com
>>>>>>>> > wrote:
>>>>>>>>
>>>>>>>>> This is pretty gross, honestly :)
>>>>>>>>>
>>>>>>>>> Can't you just use using declarations?
>>>>>>>>>
>>>>>>>>> namespace Win32 {
>>>>>>>>> extern "C" {
>>>>>>>>>
>>>>>>>>> using ::BOOL;
>>>>>>>>> using ::LONG;
>>>>>>>>> using ::POINT;
>>>>>>>>> using ::LPPOINT;
>>>>>>>>>
>>>>>>>>> using ::GetCursorPos;
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> This works with clang-cl.
>>>>>>>>>
>>>>>>>>> On Mon, Jan 22, 2018 at 5:39 AM Leonardo Santagada <
>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Here is a minimal example, we do this so we don't have to
>>>>>>>>>> import the whole windows api everywhere.
>>>>>>>>>>
>>>>>>>>>> https://gist.github.com/santagada/7977e929d31c629c4bf18ebb987f6be3
>>>>>>>>>>
>>>>>>>>>> On Sun, Jan 21, 2018 at 2:31 AM, Zachary Turner <
>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Clang-cl maintains compatibility with msvc even in cases where
>>>>>>>>>>> it’s non standards compliant (eg 2 phase name lookup), but we try to keep
>>>>>>>>>>> these cases few and far between.
>>>>>>>>>>>
>>>>>>>>>>> To help me understand your case, do you mean you copy windows.h
>>>>>>>>>>> and modify it? How does this lead to the same struct being defined twice?
>>>>>>>>>>> If I were to write this:
>>>>>>>>>>>
>>>>>>>>>>> struct Foo {};
>>>>>>>>>>> struct Foo {};
>>>>>>>>>>>
>>>>>>>>>>> Is this a small repro of the issue you’re talking about?
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Jan 20, 2018 at 3:44 PM Leonardo Santagada <
>>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I can totally see something like incremental linking with a
>>>>>>>>>>>> simple padding between obj and a mapping file (which can also help with
>>>>>>>>>>>> edit and continue, something we also would love to have).
>>>>>>>>>>>>
>>>>>>>>>>>> We have another developer doing the port to support clang-cl,
>>>>>>>>>>>> but although most of our code also goes through a version of clang,
>>>>>>>>>>>> migrating the rest to clang-cl has been a fight. From what I heard the main
>>>>>>>>>>>> problem is that we have a copy of parts of windows.h (so not to bring the
>>>>>>>>>>>> awful parts of it like lower case macros) and that totally works on cl, but
>>>>>>>>>>>> clang (at least 6.0) complains about two struct/vars with the same name,
>>>>>>>>>>>> even though they are exactly the same. Making clang-cl as broken as cl.exe
>>>>>>>>>>>> is not an option I suppose? I would love to turn on a flag
>>>>>>>>>>>> --accept-that-cl-made-bad-decisions-and-live-with-it and have
>>>>>>>>>>>> this at least until this is completely fixed in our code base.
>>>>>>>>>>>>
>>>>>>>>>>>> The biggest win with moving to clang-cl would be a better, more
>>>>>>>>>>>> standards compliant compiler, no 1 minute compiles on heavily templated
>>>>>>>>>>>> files and maybe the holy grail of ThinLTO.
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, Jan 20, 2018 at 10:56 PM, Zachary Turner <
>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> 10-15s will be hard without true incremental linking.
>>>>>>>>>>>>>
>>>>>>>>>>>>> At some point that's going to be the only way to get any
>>>>>>>>>>>>> faster, but incremental linking is hard (putting it lightly), and since our
>>>>>>>>>>>>> full links are already really fast we think we can get reasonably close to
>>>>>>>>>>>>> link.exe incremental speeds with full links.  But it's never enough and I
>>>>>>>>>>>>> will always want it to be faster, so you may see incremental linking in the
>>>>>>>>>>>>> future after we hit a performance wall with full link speed :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> In any case, I'm definitely interested in seeing what kind of
>>>>>>>>>>>>> numbers you get with /debug:ghash after you get this llvm-objcopy feature
>>>>>>>>>>>>> implemented.  So keep me updated :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> As an aside, have you tried building with clang instead of
>>>>>>>>>>>>> cl?  If you build with clang you wouldn't even have to do this llvm-objcopy
>>>>>>>>>>>>> work, because it would "just work".  If you've tried but ran into issues
>>>>>>>>>>>>> I'm interested in hearing about those too.  On the other hand, it's also
>>>>>>>>>>>>> reasonable to only switch one thing at a time.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Jan 20, 2018 at 1:34 PM Leonardo Santagada <
>>>>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> If we get to < 30s I think most users would prefer it to
>>>>>>>>>>>>>> link.exe, just hoping there are still some more optimizations to get closer
>>>>>>>>>>>>>> to ELF linking times (around 10-15s here).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Jan 20, 2018 at 9:50 PM, Zachary Turner <
>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Generally speaking a good rule of thumb is that /debug:ghash
>>>>>>>>>>>>>>> will be close to or faster than /debug:fastlink, but with none of the
>>>>>>>>>>>>>>> penalties like slow debug time
>>>>>>>>>>>>>>> On Sat, Jan 20, 2018 at 12:44 PM Zachary Turner <
>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Chrome is actually one of my exact benchmark cases. When
>>>>>>>>>>>>>>>> building blink_core.dll and browser_tests.exe, I get anywhere from a 20-40%
>>>>>>>>>>>>>>>> reduction in link time. We have some other optimizations in the pipeline
>>>>>>>>>>>>>>>> but not upstream yet.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My best time so far (including other optimizations not yet
>>>>>>>>>>>>>>>> upstream) is 28s on blink_core.dll, compared to 110s with /debug
>>>>>>>>>>>>>>>> On Sat, Jan 20, 2018 at 12:28 PM Leonardo Santagada <
>>>>>>>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sat, Jan 20, 2018 at 9:05 PM, Zachary Turner <
>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> You probably don't want to go down the same route that
>>>>>>>>>>>>>>>>>> clang goes through to write the object file.  If you think yaml2coff is
>>>>>>>>>>>>>>>>>> convoluted, the way clang does it will just give you a headache.  There are
>>>>>>>>>>>>>>>>>> multiple abstractions involved to account for different object file formats
>>>>>>>>>>>>>>>>>> (ELF, COFF, MachO) and output formats (Assembly, binary file).  At least
>>>>>>>>>>>>>>>>>> with yaml2coff
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think your phrase got cut there, but yeah I just found
>>>>>>>>>>>>>>>>> AsmPrinter.cpp and it is convoluted.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It's true that yaml2coff is using the COFFParser
>>>>>>>>>>>>>>>>>> structure, but if you look at the writeCOFF function in
>>>>>>>>>>>>>>>>>> yaml2coff it's pretty bare-metal.  The logic you need will be almost
>>>>>>>>>>>>>>>>>> identical, except that instead of checking the COFFParser for the various
>>>>>>>>>>>>>>>>>> fields, you'll check the existing COFFObjectFile, which should have similar
>>>>>>>>>>>>>>>>>> fields.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The only thing you need to do differently is when writing the
>>>>>>>>>>>>>>>>>> section table and section contents, to insert a new entry.  Since
>>>>>>>>>>>>>>>>>> you're injecting a section into the middle, you'll also probably need to
>>>>>>>>>>>>>>>>>> push back the file pointer of all subsequent sections so that they don't
>>>>>>>>>>>>>>>>>> overlap.  (e.g. if the original sections are 1, 2, 3, 4, 5 and you insert
>>>>>>>>>>>>>>>>>> between 2 and 3, then the original sections 3, 4, and 5 would need to have
>>>>>>>>>>>>>>>>>> their FilePointerToRawData offset by the size of the new section).
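
(A minimal sketch of the "push back the file pointers" step described above.
Section and insertSectionAfter are hypothetical names, not LLVM APIs; the
field is called PointerToRawData here, as in the PE/COFF section header. The
sketch assumes sections are sorted by file offset and ignores relocations,
alignment and the extra section-table entry that a real writer would also
have to account for.)

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal section record; real headers carry more fields.
struct Section {
    std::string Name;
    std::uint32_t PointerToRawData;
    std::uint32_t SizeOfRawData;
};

// Insert `added` right after index `after` and shift every later section
// back by the size of the new section so that nothing overlaps.
inline void insertSectionAfter(std::vector<Section> &secs, std::size_t after,
                               Section added) {
    assert(after < secs.size());
    const Section &prev = secs[after];
    // The new section starts where the previous one ends.
    added.PointerToRawData = prev.PointerToRawData + prev.SizeOfRawData;
    const std::uint32_t shift = added.SizeOfRawData;
    for (std::size_t i = after + 1; i < secs.size(); ++i)
        secs[i].PointerToRawData += shift;
    secs.insert(secs.begin() + static_cast<std::ptrdiff_t>(after + 1),
                std::move(added));
}
```

With sections at offsets 100, 110 and 130, inserting an 8-byte section after
the second one places it at 130 and moves the old third section to 138.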
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have the PE/COFF spec open here and I'm happy that I
>>>>>>>>>>>>>>>>> read a bit of it so I actually know what you are talking about... yeah it
>>>>>>>>>>>>>>>>> doesn't seem too complicated.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If you need to know what values to put for the other
>>>>>>>>>>>>>>>>>> fields in a section header, run `dumpbin /headers foo.obj` on a
>>>>>>>>>>>>>>>>>> clang-generated object file that has a .debug$H section already (e.g. run
>>>>>>>>>>>>>>>>>> clang with -emit-codeview-ghash-section, and look at the properties of the
>>>>>>>>>>>>>>>>>> .debug$H section and use the same values).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks I will do that and then also look at how the
>>>>>>>>>>>>>>>>> CodeView part of the code does it if I can't understand some of it.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The only invariant that needs to be maintained is that
>>>>>>>>>>>>>>>>>> Section[N]->FilePointerOfRawData == Section[N-1]->FilePointerOfRawData +
>>>>>>>>>>>>>>>>>> Section[N-1]->SizeOfRawData
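
(A minimal sketch of a check for the invariant exactly as stated above.
RawRange and isContiguous are hypothetical names for illustration only; the
field is called PointerToRawData here, as in the PE/COFF section header. The
check deliberately models nothing beyond the stated equation, so data stored
between sections, such as relocation records, would make it fail.)

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pair of fields taken from each section header, in table order.
struct RawRange {
    std::uint32_t PointerToRawData;
    std::uint32_t SizeOfRawData;
};

// True when every section starts exactly where the previous one ends.
inline bool isContiguous(const std::vector<RawRange> &secs) {
    for (std::size_t n = 1; n < secs.size(); ++n) {
        const RawRange &prev = secs[n - 1];
        if (secs[n].PointerToRawData !=
            prev.PointerToRawData + prev.SizeOfRawData)
            return false;
    }
    return true;
}

// Usage example: a gap of two bytes before the last section breaks the rule.
inline void contiguityDemo() {
    assert(isContiguous({{100, 10}, {110, 20}, {130, 5}}));
    assert(!isContiguous({{100, 10}, {110, 20}, {132, 5}}));
}
```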
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Well, that and all the sections need to be on the final
>>>>>>>>>>>>>>>>> file... But I'm hopeful.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Does anyone have timings for linking a big project like chrome with
>>>>>>>>>>>>>>>>> this so that at least I know what kind of performance to expect?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My numbers are something like:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1 pdb per obj file: link.exe takes ~15 minutes and 16GB of
>>>>>>>>>>>>>>>>> ram, lld-link.exe takes 2:30 minutes and ~8GB of ram
>>>>>>>>>>>>>>>>> around 10 pdbs per folder: link.exe takes 1 minute and
>>>>>>>>>>>>>>>>> 2-3GB of ram, lld-link.exe takes 1:30 minutes and ~6GB of ram
>>>>>>>>>>>>>>>>> fastlink: link.exe takes 40 seconds, but then 20 seconds of
>>>>>>>>>>>>>>>>> loading at the first break point in the debugger and we lost DIA support
>>>>>>>>>>>>>>>>> for listing symbols.
>>>>>>>>>>>>>>>>> incremental: link.exe takes 8 seconds, but it only happens
>>>>>>>>>>>>>>>>> when very minor changes happen.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We have a non-negligible number of symbols used on some
>>>>>>>>>>>>>>>>> runtime systems.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sat, Jan 20, 2018 at 11:52 AM Leonardo Santagada <
>>>>>>>>>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for the tips, I now have something that reads the
>>>>>>>>>>>>>>>>>>> obj file, finds .debug$T sections and global hashes it (proof of concept
>>>>>>>>>>>>>>>>>>> kind of code). What I can't find is how clang itself writes the coff
>>>>>>>>>>>>>>>>>>> files with global hashes, as that might help me understand how to create
>>>>>>>>>>>>>>>>>>> the .debug$H section, how to update the file section count and how to
>>>>>>>>>>>>>>>>>>> properly write this back.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The code on yaml2coff is expecting to be working on the
>>>>>>>>>>>>>>>>>>> yaml COFFParser struct and I'm having quite a bit of a headache turning the
>>>>>>>>>>>>>>>>>>> COFFObjectFile into a COFFParser object or compatible... Tomorrow I might
>>>>>>>>>>>>>>>>>>> try the very inefficient path of coff2yaml and then yaml2coff with the
>>>>>>>>>>>>>>>>>>> hashes header... but it seems way too inefficient and convoluted.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Jan 19, 2018 at 10:38 PM, Zachary Turner <
>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Fri, Jan 19, 2018 at 1:02 PM Leonardo Santagada <
>>>>>>>>>>>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Fri, Jan 19, 2018 at 9:44 PM, Zachary Turner <
>>>>>>>>>>>>>>>>>>>>> zturner at google.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Fri, Jan 19, 2018 at 12:29 PM Leonardo Santagada <
>>>>>>>>>>>>>>>>>>>>>> santagada at gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> No I didn't, I used cl.exe from the visual studio
>>>>>>>>>>>>>>>>>>>>>>> toolchain. What I'm proposing is a tool for processing .obj files in COFF
>>>>>>>>>>>>>>>>>>>>>>> format, reading them and generating the GHASH part.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> To make our build faster we use hundreds of unity
>>>>>>>>>>>>>>>>>>>>>>> build files (.cpp's with a lot of other .cpp's in them aka munch files) but
>>>>>>>>>>>>>>>>>>>>>>> still have a lot of single .cpp's as well (in total something like 3.4k
>>>>>>>>>>>>>>>>>>>>>>> .obj files).
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> ps: sorry for sending to the wrong list, I was
>>>>>>>>>>>>>>>>>>>>>>> reading about llvm mailing lists and jumped when I saw what I thought was a
>>>>>>>>>>>>>>>>>>>>>>> lld exclusive list.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> A tool like this would be useful, yes.  We've talked
>>>>>>>>>>>>>>>>>>>>>> about it internally as well and agreed it would be useful, we just haven't
>>>>>>>>>>>>>>>>>>>>>> prioritized it.  If you're interested in submitting a patch along those
>>>>>>>>>>>>>>>>>>>>>> lines though, I think it would be a good addition.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I'm not sure what the best place for it would be.
>>>>>>>>>>>>>>>>>>>>>> llvm-readobj and llvm-objdump seem like obvious choices, but they are
>>>>>>>>>>>>>>>>>>>>>> intended to be read-only, so perhaps they wouldn't be a good fit.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> llvm-pdbutil is kind of a hodgepodge of everything
>>>>>>>>>>>>>>>>>>>>>> else related to PDBs and symbols, so I wouldn't be opposed to making a new
>>>>>>>>>>>>>>>>>>>>>> subcommand there called "ghash" or something that could process an object
>>>>>>>>>>>>>>>>>>>>>> file and output a new object file with a .debug$H section.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> A third option would be to make a new tool for it.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I don't think it would be that hard to write.  If
>>>>>>>>>>>>>>>>>>>>>> you're interested in trying to make a patch for this, I can offer some
>>>>>>>>>>>>>>>>>>>>>> guidance on where to look in the code.  Otherwise it's something that we'll
>>>>>>>>>>>>>>>>>>>>>> probably get to, I'm just not sure when.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I would love to write it and contribute it back,
>>>>>>>>>>>>>>>>>>>>> please do tell. I did find some of the code of ghash in lld, but I'm fuzzy
>>>>>>>>>>>>>>>>>>>>> on the llvm codeview part of it and have never seen llvm-readobj/objdump or
>>>>>>>>>>>>>>>>>>>>> llvm-pdbutil, but I'm not afraid to look :)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>  Luckily all of the important code is hidden behind
>>>>>>>>>>>>>>>>>>>> library calls, and it should already just do the right thing, so I suspect
>>>>>>>>>>>>>>>>>>>> you won't need to know much about CodeView to do this.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I think Peter has the right idea about putting this in
>>>>>>>>>>>>>>>>>>>> llvm-objcopy.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> You can look at one of the existing CopyBinary
>>>>>>>>>>>>>>>>>>>> functions there, which currently only work for ELF, but you can just make a
>>>>>>>>>>>>>>>>>>>> new overload that accepts a COFFObjectFile.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I would probably start by iterating over each of the
>>>>>>>>>>>>>>>>>>>> sections (getNumberOfSections / getSectionName) looking for .debug$T and
>>>>>>>>>>>>>>>>>>>> .debug$H sections.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> If you find a .debug$H section then you can just skip
>>>>>>>>>>>>>>>>>>>> that object file.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> If you find a .debug$T but not a .debug$H, then
>>>>>>>>>>>>>>>>>>>> basically do the same thing that LLD does in PDBLinker::mergeDebugT
>>>>>>>>>>>>>>>>>>>> (create a CVTypeArray, and pass it to GloballyHashedType::hashTypes).
>>>>>>>>>>>>>>>>>>>> That will return an array of hash values.  (The format of .debug$H is the
>>>>>>>>>>>>>>>>>>>> header, followed by the hash values.)  Then when you're writing the list of
>>>>>>>>>>>>>>>>>>>> sections, just add in the .debug$H section right after the .debug$T section.
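
(A minimal sketch of the per-object decision described above, working on
plain section names. GhashAction and classifyObject are hypothetical names,
not part of llvm-objcopy or LLD, and the hashing itself is left out.)

```cpp
#include <cassert>
#include <string>
#include <vector>

// What the tool should do with one object file.
enum class GhashAction {
    Skip,          // already has .debug$H, leave the file alone
    AddHashes,     // has .debug$T but no .debug$H, hash and add the section
    NothingToHash  // no type records at all
};

// Decide based only on which debug sections are present.
inline GhashAction classifyObject(const std::vector<std::string> &names) {
    bool hasTypes = false;
    bool hasHashes = false;
    for (const std::string &name : names) {
        if (name == ".debug$T")
            hasTypes = true;
        else if (name == ".debug$H")
            hasHashes = true;
    }
    if (hasHashes)
        return GhashAction::Skip;
    return hasTypes ? GhashAction::AddHashes : GhashAction::NothingToHash;
}

// Usage example covering the two cases spelled out above.
inline void classifyDemo() {
    assert(classifyObject({".text", ".debug$T"}) == GhashAction::AddHashes);
    assert(classifyObject({".debug$T", ".debug$H"}) == GhashAction::Skip);
}
```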
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Currently llvm-objcopy only writes ELF files, so it
>>>>>>>>>>>>>>>>>>>> would need to be taught to write COFF files.  We have code to do this in
>>>>>>>>>>>>>>>>>>>> the yaml2obj utility (specifically, in yaml2coff.cpp in the function
>>>>>>>>>>>>>>>>>>>> writeCOFF).  There may be a way to move this code to somewhere else
>>>>>>>>>>>>>>>>>>>> (llvm/Object/COFF.h?) so that it can be re-used by both yaml2coff and
>>>>>>>>>>>>>>>>>>>> llvm-objcopy, but in the worst case scenario you could copy the code and
>>>>>>>>>>>>>>>>>>>> re-write it to work with these new structures.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Lastly, you'll probably want to put all of this behind
>>>>>>>>>>>>>>>>>>>> an option in llvm-objcopy such as -add-codeview-ghash-section
>>>>>>>>>>>>>>>>>>>>


-- 

Leonardo Santagada