[llvm-dev] Making LLD PDB generation faster

Reid Kleckner via llvm-dev llvm-dev at lists.llvm.org
Wed Feb 27 11:10:23 PST 2019


This could be ICF. There were lots of issues with ICF on ARM64, but they
are not inherently ARM64-specific; they just come up there more often. See
https://reviews.llvm.org/D56986 which fixes that.

Easiest thing is always to profile or add /time to see what's slow.
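
A minimal sketch of how a /time report could be read, assuming output in the
format quoted further down this thread. The sample text is copied from
Leonardo's LTO run; the names parse_time_report and leaf_phases are made up
for illustration and are not part of lld. It keeps only the innermost phases
(so cumulative parents are not double counted) and sorts them by cost.

```python
import re

# Sample copied from the /time output quoted later in this thread.
SAMPLE = """\
  Input File Reading:          1430 ms (  3.3%)
  Code Layout:                  486 ms (  1.1%)
  PDB Emission (Cumulative):  41042 ms ( 94.6%)
    Add Objects:              33117 ms ( 76.4%)
      Type Merging:           25861 ms ( 59.6%)
      Symbol Merging:          7011 ms ( 16.2%)
    TPI Stream Layout:          996 ms (  2.3%)
    Globals Stream Layout:      513 ms (  1.2%)
    Commit to Disk:            5175 ms ( 11.9%)
  Commit Output File:            37 ms (  0.1%)
-------------------------------------------------
Total Link Time:              43366 ms (100.0%)
"""

# indent, phase name, milliseconds, percentage
LINE_RE = re.compile(r"^(\s*)(.+?):\s+(\d+) ms \(\s*([\d.]+)%\)\s*$")


def parse_time_report(text):
    """Return (indent, name, ms, percent) for every timing line."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            indent, name, ms, pct = m.groups()
            rows.append((len(indent), name, int(ms), float(pct)))
    return rows


def leaf_phases(rows):
    """Phases that have no more-indented child, most expensive first."""
    leaves = []
    for i, (indent, name, ms, _pct) in enumerate(rows):
        next_indent = rows[i + 1][0] if i + 1 < len(rows) else 0
        if next_indent <= indent and name != "Total Link Time":
            leaves.append((name, ms))
    return sorted(leaves, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    for name, ms in leaf_phases(parse_time_report(SAMPLE)):
        print(f"{ms:>8} ms  {name}")
```

On the sample above this puts "Type Merging" first, which matches where the
rest of the thread focuses.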

On Wed, Feb 27, 2019 at 6:30 AM Leonardo Santagada <santagada at gmail.com>
wrote:

> Would anyone know why lld takes > 30 minutes to link lld without
> symbols on release?
>
> The command line seems simple enough:
>
> C:\PROGRA~1\LLVM\bin\lld-link.exe /nologo @CMakeFiles\lld.rsp
> /out:bin\lld.exe /implib:lib\lld.lib /version:0.0 /machine:x64
> -fuse-ld=lld /STACK:10000000 /INCREMENTAL:NO /subsystem:console
> /MANIFEST /MANIFESTFILE:bin\lld.exe.manifest
>
> On Mon, Feb 25, 2019 at 8:20 PM Leonardo Santagada <santagada at gmail.com>
> wrote:
> >
> > Sadly the patch on https://reviews.llvm.org/D55585 didn't apply on my
> > clone of llvm at all :( It will take me quite some time to test this
> > out.
> >
> > On Mon, Feb 25, 2019 at 5:08 PM Alexandre Ganea
> > <alexandre.ganea at ubisoft.com> wrote:
> > >
> > > For enabling large memory pages, see this link:
> https://support.sisoftware.co.uk/knowledgebase.php?article=52
> > >
> > > Meow hash isn't in the patch I posted, but you can use xxHash, it is
> good enough. Just add /hasher:xxhash to the LLD cmd-line.
> > >
> > >
> > > -----Original Message-----
> > > From: Leonardo Santagada <santagada at gmail.com>
> > > Sent: Monday, February 25, 2019 11:05 AM
> > > To: Alexandre Ganea <alexandre.ganea at ubisoft.com>
> > > Cc: Zachary Turner <zturner at google.com>; Reid Kleckner <rnk at google.com>;
> llvm-dev <llvm-dev at lists.llvm.org>
> > > Subject: Re: [llvm-dev] Making LLD PDB generation faster
> > >
> > > Times for lld compiled with LTO:
> > >
> > >   Input File Reading:          1430 ms (  3.3%)
> > >   Code Layout:                  486 ms (  1.1%)
> > >   PDB Emission (Cumulative):  41042 ms ( 94.6%)
> > >     Add Objects:              33117 ms ( 76.4%)
> > >       Type Merging:           25861 ms ( 59.6%)
> > >       Symbol Merging:          7011 ms ( 16.2%)
> > >     TPI Stream Layout:          996 ms (  2.3%)
> > >     Globals Stream Layout:      513 ms (  1.2%)
> > >     Commit to Disk:            5175 ms ( 11.9%)
> > >   Commit Output File:            37 ms (  0.1%)
> > > -------------------------------------------------
> > > Total Link Time:              43366 ms (100.0%)
> > >
> > > LTO didn't help much :(
> > >
> > > > Now I will try Alexandre's patches and switch to xxHash64 or Meow
> > > > hashing. I need to discover how to enable huge pages on my Windows
> > > > (1809)
> > >
> > > ps: Need to figure out how to limit the number of link jobs in ninja
> as that almost used the whole 128GB of ram on my machine. On our
> distributed build system we can limit linking jobs (which are the only
> strict local jobs) to 8.
> > >
> > > On Mon, Feb 25, 2019 at 4:47 PM Alexandre Ganea <
> alexandre.ganea at ubisoft.com> wrote:
> > > >
> > > > …however it is very slow to compile, because /MP isn’t currently
> supported by clang-cl. So each CPP is compiled sequentially, one after
> another. Thus my patch for adding /MP.
> > > >
> > > >
> > > >
> > > > From: Alexandre Ganea
> > > > Sent: Monday, February 25, 2019 10:42 AM
> > > > To: Zachary Turner <zturner at google.com>; Leonardo Santagada
> > > > <santagada at gmail.com>
> > > > Cc: Reid Kleckner <rnk at google.com>; llvm-dev <
> llvm-dev at lists.llvm.org>
> > > > Subject: RE: [llvm-dev] Making LLD PDB generation faster
> > > >
> > > >
> > > >
> > > > Yes, -Tllvm works.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > From: Zachary Turner <zturner at google.com>
> > > > Sent: Monday, February 25, 2019 10:36 AM
> > > > To: Leonardo Santagada <santagada at gmail.com>
> > > > Cc: Alexandre Ganea <alexandre.ganea at ubisoft.com>; Reid Kleckner
> > > > <rnk at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> > > > Subject: Re: [llvm-dev] Making LLD PDB generation faster
> > > >
> > > >
> > > >
> > > > Is -Tllvm even supported? I thought the only thing you could pass for
> > > > -T was -Thost=x64
> > > >
> > > > On Mon, Feb 25, 2019 at 6:52 AM Leonardo Santagada <
> santagada at gmail.com> wrote:
> > > >
> > > > I think it's a huge bug that it doesn't raise any errors or warnings
> > > > about it. But I will open a ticket on cmake; they should be using
> > > > clang-cl.exe and lld-link.exe if T="llvm", and probably set host to
> > > > 64 bit as well.
> > > >
> > > > On Mon, Feb 25, 2019 at 3:34 PM Zachary Turner <zturner at google.com>
> wrote:
> > > > >
> > > > > > I don’t think changing the compiler or linker is supported with the
> > > > > > VS generator, but I also don’t think it’s a bug.
> > > > > >
> > > > > > On Mon, Feb 25, 2019 at 6:31 AM Alexandre Ganea <alexandre.ganea at ubisoft.com> wrote:
> > > > >>
> > > > >> Can you please try using Ninja instead?
> > > > >>
> > > > >> cmake -G Ninja f:/svn/llvm -DCMAKE_BUILD_TYPE=Release
> > > > >> -DLLVM_OPTIMIZED_TABLEGEN=true
> > > > >> -DLLVM_EXTERNAL_LLD_SOURCE_DIR=f:/svn/lld
> > > > >> -DLLVM_TOOL_LLD_BUILD=true -DLLVM_ENABLE_LLD=true
> > > > >> -DCMAKE_C_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe"
> > > > >> -DCMAKE_CXX_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe"
> > > > >> -DCMAKE_LINKER="C:/Program Files/LLVM/bin/lld-link.exe"
> > > > >> -DLLVM_ENABLE_PDB=true
> > > > >>
> > > > >> It will be faster to compile. The setup I use is the above Ninja
> cmd-line for compiling optimized builds; and in addition, I keep the Visual
> Studio generator, as you do, but only for having a .sln to debug. It is a
> bit annoying to cmake twice, in two different build folders, but you can
> write a batch script.
> > > > >>
> > > > >> If the above works, maybe you should log the bug on
> https://bugs.llvm.org/ so it is not forgotten.
> > > > >>
> > > > >> Alex.
> > > > >>
> > > > >> -----Original Message-----
> > > > >> From: Leonardo Santagada <santagada at gmail.com>
> > > > >> Sent: Monday, February 25, 2019 9:04 AM
> > > > >> To: Alexandre Ganea <alexandre.ganea at ubisoft.com>
> > > > >> Cc: Zachary Turner <zturner at google.com>; Reid Kleckner
> > > > >> <rnk at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> > > > >> Subject: Re: [llvm-dev] Making LLD PDB generation faster
> > > > >>
> > > > > >> Ok so there's a lot of confusion in cmake regarding using llvm as
> > > > > >> a toolset. It still does all its checks against cl.exe (not
> > > > > >> clang-cl) and somehow overrides CMAKE_LINKER to be link.exe. I
> > > > > >> tried a couple of places including:
> > > > >>
> > > > >> cmake -G "Visual Studio 15 2017" -A x64 -T"llvm",host=x64
> -DCMAKE_LINKER="C:/Program Files/LLVM/bin/lld-link.exe"
> > > > >> -DCMAKE_CXX_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe"
> > > > >> -DCMAKE_C_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe"
> > > > >> -DLLVM_ENABLE_LTO=true -DLLVM_ENABLE_PDB=true
> > > > >> -DLLVM_ENABLE_PROJECTS=lld  ../llvm
> > > > >>
> > > > >> but it seems like the generator overrides it.
> > > > >>
> > > > >>
> > > > >> ps: Created a phabricator account
> > > > >>
> > > > >> On Mon, Feb 25, 2019 at 2:48 PM Alexandre Ganea <
> alexandre.ganea at ubisoft.com> wrote:
> > > > >> >
> > > > >> > That's good news. For having debug info, you could try adding
> /Z7 on the cmake cmd-line, such as -DCMAKE_CXX_FLAGS="/Z7". Or use the
> 'RelWithDebInfo' target instead of 'Release' and add
> -DCMAKE_CXX_FLAGS="/Ob2" (because that target uses /Ob1 as a default).
> > > > >> >
> > > > >> > Can you please send a patch on Phabricator if you fix the
> LLVM_ENABLE_PDB issue with Clang? The goal is to have performance
> out-of-the-box.
> > > > >> >
> > > > >> > Alex.
> > > > >> >
> > > > >> > -----Original Message-----
> > > > >> > From: Leonardo Santagada <santagada at gmail.com>
> > > > >> > Sent: Monday, February 25, 2019 7:36 AM
> > > > >> > To: Alexandre Ganea <alexandre.ganea at ubisoft.com>
> > > > >> > Cc: Zachary Turner <zturner at google.com>; Reid Kleckner
> > > > >> > <rnk at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> > > > >> > Subject: Re: [llvm-dev] Making LLD PDB generation faster
> > > > >> >
> > > > >> > With your patch for cmake and reconfiguring it with "cmake -G
> "Visual Studio 15 2017" -A x64 -T"llvm",host=x64 -DLLVM_ENABLE_PDB=true
> -DLLVM_ENABLE_PROJECTS=lld  ../llvm" we get these results:
> > > > >> >
> > > > >> >   Input File Reading:          1602 ms (  3.5%)
> > > > >> >   Code Layout:                  493 ms (  1.1%)
> > > > >> >   PDB Emission (Cumulative):  43127 ms ( 94.5%)
> > > > >> >     Add Objects:              34577 ms ( 75.8%)
> > > > >> >       Type Merging:           26709 ms ( 58.5%)
> > > > >> >       Symbol Merging:          7598 ms ( 16.7%)
> > > > >> >     TPI Stream Layout:         1107 ms (  2.4%)
> > > > >> >     Globals Stream Layout:      602 ms (  1.3%)
> > > > >> >     Commit to Disk:            5636 ms ( 12.4%)
> > > > >> >   Commit Output File:            16 ms (  0.0%)
> > > > >> > -------------------------------------------------
> > > > >> > Total Link Time:              45626 ms (100.0%)
> > > > >> >
> > > > > >> > Unfortunately there were no PDBs generated for lld.exe (or any
> > > > > >> > other binaries) so I can't debug them. It seems like
> > > > > >> > LLVM_ENABLE_PDB is not made to support using clang to compile
> > > > > >> > itself, as it tries to add /Zi to the targets instead of /Z7 and
> > > > > >> > global hashes. I can patch it over here, but we probably want to
> > > > > >> > fix this in cmake and in the docs, as it's not clear at all how
> > > > > >> > to compile lld in a performant 64-bit way.
> > > > >> >
> > > > >> > On Mon, Feb 25, 2019 at 2:38 AM Alexandre Ganea <
> alexandre.ganea at ubisoft.com> wrote:
> > > > >> > >
> > > > > >> > > How do you compile LLD? There's a big difference between
> > > > > >> > > using MSVC and using Clang. The parallel ghash patch I was
> > > > > >> > > mentioning is almost 2x as fast when using Clang 7.0+ vs.
> > > > > >> > > MSVC 15.9+, I don't know exactly why. I also suggest you use
> > > > > >> > > the Release target. You should also grab this patch:
> > > > >> > > https://reviews.llvm.org/D55056 - I had to revert it because
> it
> > > > >> > > was causing issues with LLDB. But it will give an improvement
> for LLD.
> > > > >> > > Please let me know if that improves your timings.
> > > > >> > >
> > > > >> > > The page faults are probably the OS loading from disk: most,
> if
> > > > >> > > not all the files are accessed by LLD by mmap'ing them.
> > > > >> > >
> > > > >> > > The lockless DenseHash I was talking about will be published
> in
> > > > >> > > an upcoming patch. As for reproducibility, this can be an
> issue
> > > > >> > > on build systems. But on local machines, we could explicitly
> > > > >> > > state that we want non-deterministic builds, through some
> cmd-line flag. If your 57sec for "Type Merging"
> > > > >> > > transforms into 5sec when non-deterministic, I think that's
> worth it.
> > > > >> > >
> > > > >> > > Alex.
> > > > >> > >
> > > > >> > > -----Original Message-----
> > > > >> > > From: Leonardo Santagada <santagada at gmail.com>
> > > > >> > > Sent: Sunday, February 24, 2019 6:43 PM
> > > > >> > > To: Alexandre Ganea <alexandre.ganea at ubisoft.com>
> > > > >> > > Cc: Zachary Turner <zturner at google.com>; Reid Kleckner
> > > > >> > > <rnk at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> > > > >> > > Subject: Re: [llvm-dev] Making LLD PDB generation faster
> > > > >> > >
> > > > > >> > > More info inline, I think there are a couple of misconceptions
> > > > > >> > > about what I'm doing:
> > > > >> > >
> > > > >> > > 1) I already patch all my .obj files to contain .debug$H
> > > > >> > > entries so it is all ghashed already
> > > > >> > > 2) All the 35s is spent adding to the DenseMap
> > > > >> > >
> > > > > >> > > Here are my current times (lld-link.exe compiled with -O2, so
> > > > > >> > > no lto/pgo); lld generates a 141 MB binary and a 1.2GB pdb file:
> > > > >> > >
> > > > >> > >   Input File Reading:          1724 ms (  2.1%)
> > > > >> > >   Code Layout:                  482 ms (  0.6%)
> > > > >> > >   PDB Emission (Cumulative):  79261 ms ( 96.8%)
> > > > >> > >     Add Objects:              68650 ms ( 83.8%)
> > > > >> > >       Type Merging:           57534 ms ( 70.2%)
> > > > >> > >       Symbol Merging:         10822 ms ( 13.2%)
> > > > >> > >     TPI Stream Layout:         1501 ms (  1.8%)
> > > > >> > >     Globals Stream Layout:      770 ms (  0.9%)
> > > > >> > >     Commit to Disk:            7007 ms (  8.6%)
> > > > >> > >   Commit Output File:            19 ms (  0.0%)
> > > > >> > > -------------------------------------------------
> > > > >> > > Total Link Time:              81900 ms (100.0%)
> > > > >> > >
> > > > > >> > > Our target is < 20 seconds for linking; anything below 40
> > > > > >> > > seconds would be ok. Ideal times would be around 8s (at which
> > > > > >> > > point it will mostly beat link.exe incremental linking).
> > > > >> > >
> > > > > >> > > My tip for profiling is using superluminal
> > > > > >> > > (https://www.superluminal.eu/); it's the easiest way to see
> > > > > >> > > everything your code is doing.
> > > > >> > >
> > > > >> > > On Sun, Feb 24, 2019 at 5:18 PM Alexandre Ganea <
> alexandre.ganea at ubisoft.com> wrote:
> > > > >> > > >
> > > > > >> > > > Leonardo, to answer your questions: yes to all of them :)
> > > > > >> > > > You can take a look at this prototype/proposal:
> > > > > >> > > > https://reviews.llvm.org/D55585
> > > > > >> > > >
> > > > > >> > > > Overall, computing ghashes in parallel at link-time and
> > > > > >> > > > merging types with them is less costly than the current
> > > > > >> > > > approach to merging. The 35sec you’re seeing for merging
> > > > > >> > > > should go down to about 15sec.
> > > > >> > >
> > > > >> > > I don't do much computing of ghashes as we already preprocess
> all .obj files from msvc to add a .debug$H to them. The whole 35 seconds is
> spent in just densehash findbucket function. The rest of the time is mostly
> pagefaults (I guess to load in obj data and to grow the final pdb?).
> > > > >> > >
> > > > > >> > > > The patch doesn’t parallelize (yet) the Type merging itself,
> > > > > >> > > > but we have an alternate multithread-suitable implementation
> > > > > >> > > > of DenseHash which already supports lockless, wait-free
> > > > > >> > > > insert/fetch/resize.
> > > > >> > >
> > > > > >> > > Where is this lockless densehash? This is the part where I
> > > > > >> > > would love to help, but if there is a densehash it is probably
> > > > > >> > > just creating the threads and letting them merge the results.
> > > > > >> > > I'm a bit afraid of reproducibility of builds, but as we
> > > > > >> > > already don't have that with link.exe we are not really losing
> > > > > >> > > anything.
> > > > >> > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > The prototype allows for testing different hashing
> > > > >> > > > algorithms, and indeed
> > > > >> > > >
> > > > >> > > > xxHash seems to be the best general-purpose choice. I’ve
> also
> > > > >> > > > added support
> > > > >> > > >
> > > > >> > > > for more specialized hardware-based hashes, like Casey
> > > > >> > > > Muratori’s Meow Hash
> > > > >> > > >
> > > > >> > > > (uses hardware AES SSE 4.2 instructions), which brings the
> figures down a bit.
> > > > >> > > >
> > > > >> > >
> > > > > >> > > I remembered Meow hashes needing at least k bytes of data, but
> > > > > >> > > looking at their website right now there is no mention of it.
> > > > > >> > > Hashing performance doesn't have much of an impact, as we do it
> > > > > >> > > per .obj file, distributed through our company, so the time to
> > > > > >> > > calculate those is completely distributed.
> > > > >> > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > Future changes could write back the computed ghash stream
> > > > >> > > > back to OBJs if
> > > > >> > > >
> > > > >> > > > /INCREMENTAL is specified (just an idea). Incrementally
> > > > >> > > > linking will be faster
> > > > >> > > >
> > > > >> > > > that way when working with MSVC OBJs.
> > > > >> > > >
> > > > >> > >
> > > > >> > > I already have a patch for llvm-objcopy that adds a
> > > > >> > > -add-ghashes option that does this, I will be cleaning it up
> > > > >> > > this week and sending a PR for it
> > > > >> > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > As for creating PDBs for independent projects, that would
> help most likely.
> > > > >> > > >
> > > > >> > > > However the ghash stream would need to be stored in the PDB
> > > > >> > > > in that case
> > > > >> > > >
> > > > >> > > > (currently, ghashes are dropped after merging). That could
> > > > >> > > > help when using
> > > > >> > > >
> > > > >> > > > rarely compiled projects, used along with network caches.
> > > > >> > >
> > > > > >> > > I meant actually a .lib, with all the obj files inside plus a
> > > > > >> > > merged .debug$H entry. No pdb generation or changes necessary;
> > > > > >> > > we just run the same code that merges types in lld and do that
> > > > > >> > > at the librarian level.
> > > > >> > >
> > > > >> > > >
> > > > > >> > > > I will start sending smaller patches to converge towards the
> > > > > >> > > > functionality of the prototype above.
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > Best,
> > > > >> > > >
> > > > >> > > > Alex.
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > From: Zachary Turner <zturner at google.com>
> > > > >> > > > Sent: Sunday, February 24, 2019 1:20 AM
> > > > >> > > > To: Leonardo Santagada <santagada at gmail.com>
> > > > >> > > > Cc: Alexandre Ganea <alexandre.ganea at ubisoft.com>; Reid
> > > > >> > > > Kleckner <rnk at google.com>; llvm-dev <
> llvm-dev at lists.llvm.org>
> > > > >> > > > Subject: Re: [llvm-dev] Making LLD PDB generation faster
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > +Reid and Alexandre, who have been doing work in this area
> > > > >> > > > +recently
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On Sat, Feb 23, 2019 at 4:07 AM Leonardo Santagada via
> llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > > > >> > > >
> > > > >> > > > Hi,
> > > > >> > > >
> > > > > >> > > > Is anyone working on making the PDB generation on LLD faster?
> > > > > >> > > > Looking at a trace for linking one of our binaries (it takes
> > > > > >> > > > 1min6s-1min20s) I see two things:
> > > > >> > > >
> > > > >> > > > 1) LookupBucketFor(Val, ConstFoundBucket); takes 35s so
> > > > >> > > > almost half of the time of linking, mostly finding
> duplicates
> > > > >> > > > 2) There is no parallelization inside of addObjectsToPDB
> > > > >> > > >
> > > > > >> > > > Is anyone working on those? Also has anyone thought about
> > > > > >> > > > merging .obj files to deduplicate type information so we can
> > > > > >> > > > do the linking on projects to generate something like a lib
> > > > > >> > > > file, but with deduplicated debug information (as far as I
> > > > > >> > > > know an actual .lib just puts all pdbs or /Z7 debug info
> > > > > >> > > > inside a file without dedup).
> > > > >> > > >
> > > > >> > > > Just looking at the code it seems it is much more mature and
> > > > >> > > > also the choice of SHA1_8 seems interesting (still don't
> know
> > > > >> > > > why not use xxHash64).
> > > > >> > > >
> > > > >> > > > ps: My code to add ghashes to msvc compiled .obj files is
> > > > >> > > > almost ready to be pushed as an option for llvm-objcopy.
> > > > >> > > >
> > > > >> > > > --
> > > > >> > > >
> > > > >> > > > Leonardo Santagada
> > > > >> > > > _______________________________________________
> > > > >> > > > LLVM Developers mailing list
> > > > >> > > > llvm-dev at lists.llvm.org
> > > > >> > > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > > --
> > > > >> > >
> > > > >> > > Leonardo Santagada
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> >
> > > > >> > Leonardo Santagada
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >>
> > > > >> Leonardo Santagada
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Leonardo Santagada
> > >
> > >
> > >
> > > --
> > >
> > > Leonardo Santagada
> >
> >
> >
> > --
> >
> > Leonardo Santagada
>
>
>
> --
>
> Leonardo Santagada
>
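
For readers following the ghash discussion quoted above, here is a rough
sketch of the idea, not LLD's actual implementation. It assumes a made-up
record layout (a payload plus indices of earlier records in the same object
file), uses SHA-1 truncated to 8 bytes to mirror the SHA1_8 mentioned in the
thread, and uses a plain Python dict where LLD uses a DenseMap. The function
names global_hashes and merge_types are hypothetical.

```python
import hashlib


def global_hashes(records):
    """Hash each record together with the hashes of the records it refers to.

    records: list of (payload: bytes, refs: list[int]); each ref must point
    to an earlier record in the same list (hypothetical layout).
    """
    hashes = []
    for payload, refs in records:
        h = hashlib.sha1(payload)
        for r in refs:
            h.update(hashes[r])  # content of the referenced type, not its index
        hashes.append(h.digest()[:8])  # truncate to 8 bytes
    return hashes


def merge_types(object_files):
    """Deduplicate type records across object files by global hash.

    Returns (merged, remap): merged maps hash -> index in the merged stream,
    remap[i][j] is the merged index of record j from object file i.
    """
    merged = {}
    remap = []
    for records in object_files:
        local = []
        for gh in global_hashes(records):
            # One hash-table probe per record; this lookup is the hot spot
            # the thread is discussing.
            local.append(merged.setdefault(gh, len(merged)))
        remap.append(local)
    return merged, remap


if __name__ == "__main__":
    obj_a = [(b"int", []), (b"ptr", [0])]
    obj_b = [(b"float", []), (b"int", []), (b"ptr", [1])]
    merged, remap = merge_types([obj_a, obj_b])
    print(len(merged), remap)
```

Because the hash of a record folds in the hashes of what it references, two
"pointer to int" records from different object files collapse to one entry
even though their local indices differ.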