[llvm-dev] Making LLD PDB generation faster

Thu Feb 28 02:17:15 PST 2019

Yeah, your binary is probably very similar to ours. I work at Guerrilla
Games, so yes, it is a game engine/editor. I tried extracting just your
implementation of DenseMap from that changelist and it did not seem to give
any speedup in our case. I think I will clean up my changes to
llvm-objcopy and maybe wait to test your changes.

PS: Do you guys hang out in any chat room? I think chatting a bit could be
helpful.

On Thu, Feb 28, 2019 at 1:14 AM Alexandre Ganea <alexandre.ganea at ubisoft.com>
wrote:

> As for multithreaded ghashes:
>
>
>
> Even if the hashtable stores 32-bit indices to SeenHashes, you would still
> need to compare the ghashes for collisions:
>
>
> https://github.com/llvm/llvm-project/blob/master/llvm/include/llvm/ADT/DenseMap.h#L627
>
> Finding the 32-bit index in the hashtable doesn’t necessarily mean it’s
> the right one.
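
To make the point above concrete, here is a minimal Python sketch (not LLD code) of a table whose slots hold only 32-bit indices into a SeenHashes-style list. The class name `IndexTable`, the `EMPTY` sentinel and the growing probe step are assumptions for illustration, and the sketch has no resize, so it assumes the table is never full. Every occupied slot still needs a comparison against the full ghash:

```python
EMPTY = 0xFFFFFFFF  # hypothetical sentinel marking an unused slot


class IndexTable:
    """Open-addressing table whose slots hold indices into seen_hashes."""

    def __init__(self, capacity_pow2):
        # Capacity must be a power of two so that masking works.
        assert capacity_pow2 & (capacity_pow2 - 1) == 0
        self.slots = [EMPTY] * capacity_pow2
        self.seen_hashes = []  # full 64-bit ghashes, in insertion order

    def insert_or_find(self, ghash):
        """Return (index into seen_hashes, inserted?)."""
        mask = len(self.slots) - 1
        bucket = ghash & mask
        step = 1
        while True:
            idx = self.slots[bucket]
            if idx == EMPTY:
                # Free slot: record a new unique ghash.
                self.slots[bucket] = len(self.seen_hashes)
                self.seen_hashes.append(ghash)
                return self.slots[bucket], True
            # Occupied slot: the index alone proves nothing, so the
            # full ghash it points to must be loaded and compared.
            if self.seen_hashes[idx] == ghash:
                return idx, False
            bucket = (bucket + step) & mask
            step += 1


table = IndexTable(8)
first = table.insert_or_find(0x10)   # lands in bucket 0
second = table.insert_or_find(0x18)  # also maps to bucket 0, must probe on
again = table.insert_or_find(0x18)   # found only after comparing ghashes
```

In this toy run the second key collides with the first in bucket 0, and only the ghash comparison tells them apart.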
>
>
>
> The following table shows the collision distribution when inserting (type)
> ghashes into the DenseMap. This is for a fairly large EXE, comparable to
> yours I suppose.
>
> This shows that 65% of accesses hit (insert or find) on the first
> bucket accessed. But there’s still 35% which require querying more buckets
> in the hashtable, up to 54 buckets.
>
>
>
> Buckets probed        Count    Share
>              1  134,994,464  65.551%
>              2   35,478,867  17.228%
>              3   15,658,999   7.604%
>              4    8,045,798   3.907%
>              5    4,540,451   2.205%
>              6    2,634,179   1.279%
>              7    1,608,599   0.781%
>              8    1,007,705   0.489%
>              9      643,471   0.312%
>             10      418,645   0.203%
>             11      279,816   0.136%
>             12      189,733   0.092%
>             13      129,686   0.063%
>             14       90,484   0.044%
>             15       62,863   0.031%
>             16       43,584   0.021%
>             17       31,240   0.015%
>             18       22,180   0.011%
>             19       15,850   0.008%
>             20       11,266   0.005%
>             21        8,171   0.004%
>             22        5,900   0.003%
>             23        4,379   0.002%
>             24        3,167   0.002%
>             25        2,316   0.001%
>             26        1,681   0.001%
>             27        1,185   0.001%
>             28          901   0.000%
>             29          638   0.000%
>             30          465   0.000%
>             31          367   0.000%
>             32          280   0.000%
>             33          189   0.000%
>             34          140   0.000%
>             35          106   0.000%
>             36           76   0.000%
>             37           55   0.000%
>             38           37   0.000%
>             39           27   0.000%
>             40           20   0.000%
>             41           18   0.000%
>             42           15   0.000%
>             43           13   0.000%
>             44           10   0.000%
>             45            6   0.000%
>             46            6   0.000%
>             47            5   0.000%
>             48            4   0.000%
>             49            3   0.000%
>             50            3   0.000%
>             51            2   0.000%
>             52            2   0.000%
>             53            2   0.000%
>             54            2   0.000%
>          Total  205,938,071
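
A distribution like the one above can be gathered with a small counter around the probe loop. The following Python sketch is only an illustration under stated assumptions: the function name `probe_histogram` is made up, the keys are random 64-bit integers rather than real ghashes, and the probe step grows by one on each miss, which is my understanding of how DenseMap's LookupBucketFor walks the table. It does not reproduce the numbers above.

```python
import random
from collections import Counter


def probe_histogram(keys, capacity_pow2):
    """Count how many buckets each insert-or-find touches."""
    mask = capacity_pow2 - 1
    slots = [None] * capacity_pow2
    hist = Counter()
    for key in keys:
        bucket = key & mask
        step = 1
        probes = 1
        # Walk until we find the key itself or a free bucket.
        while slots[bucket] is not None and slots[bucket] != key:
            bucket = (bucket + step) & mask
            step += 1
            probes += 1
        slots[bucket] = key
        hist[probes] += 1
    return hist


rng = random.Random(0)  # fixed seed so the run is repeatable
keys = [rng.getrandbits(64) for _ in range(50_000)]
hist = probe_histogram(keys, 1 << 17)  # table kept well under half full

for probes in sorted(hist):
    share = 100.0 * hist[probes] / len(keys)
    print(f"{probes:>3} {hist[probes]:>8} {share:7.3f}%")
```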
>
>
>
> And here is the cache miss distribution, with 8-byte buckets (value and
> hash), as implemented in
> https://reviews.llvm.org/D55585#change-nIfiq2fvl33C
>
> So about 86% of hits (insertions or fetches) will be found on the first
> cacheline accessed, for any given Type record inserted.
>
>
>
>    Cache lines        Count    Share
>              1  177,774,132  86.324%
>              2   20,133,777   9.777%
>              3    4,046,867   1.965%
>              4    1,506,067   0.731%
>              5      829,202   0.403%
>              6      533,119   0.259%
>              7      348,794   0.169%
>              8      233,626   0.113%
>              9      159,738   0.078%
>             10      110,651   0.054%
>             11       76,223   0.037%
>             12       53,601   0.026%
>             13       37,271   0.018%
>             14       26,666   0.013%
>             15       19,013   0.009%
>             16       13,591   0.007%
>             17        9,667   0.005%
>             18        7,024   0.003%
>             19        5,102   0.002%
>             20        3,780   0.002%
>             21        2,767   0.001%
>             22        2,003   0.001%
>             23        1,423   0.001%
>             24        1,041   0.001%
>             25          780   0.000%
>             26          565   0.000%
>             27          411   0.000%
>             28          315   0.000%
>             29          228   0.000%
>             30          164   0.000%
>             31          117   0.000%
>             32           91   0.000%
>             33           61   0.000%
>             34           42   0.000%
>             35           30   0.000%
>             36           21   0.000%
>             37           18   0.000%
>             38           16   0.000%
>             39           13   0.000%
>             40           13   0.000%
>             41            9   0.000%
>             42            6   0.000%
>             43            5   0.000%
>             44            5   0.000%
>             45            3   0.000%
>             46            3   0.000%
>             47            3   0.000%
>             48            2   0.000%
>             49            2   0.000%
>             50            2   0.000%
>             51            1   0.000%
>          Total  205,938,071
>
>
>
> Now if you split the key (64-bit ghash) from the value (32-bit TypeIndex),
> you would double the cache misses, right?
>
> The hashtable can quickly get pretty big, so it is very unlikely that you
> would hit the same cache line in the L1d (and maybe L2) on the next record.
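
The "double the cache misses" argument can be checked with simple arithmetic. This Python sketch is a rough model, not a measurement: it assumes 64-byte cache lines, 8-byte packed buckets as described above, and for the split layout an 8-byte key array plus a separately allocated 4-byte value array that is read once at the final bucket. All function names are hypothetical.

```python
CACHE_LINE = 64  # bytes; assumed line size


def lines_touched(buckets, entry_size):
    """Set of cache-line numbers covered by the given bucket indices."""
    return {(b * entry_size) // CACHE_LINE for b in buckets}


def packed_cost(probe_seq):
    # Value and hash share one 8-byte bucket, so a hit needs no second load.
    return len(lines_touched(probe_seq, 8))


def split_cost(probe_seq):
    # Keys are probed in their own array; the value lives in a different
    # allocation, so reading it always touches one extra line.
    key_lines = lines_touched(probe_seq, 8)
    value_lines = lines_touched(probe_seq[-1:], 4)
    return len(key_lines) + len(value_lines)


first_probe_hit = [5]      # found in the first bucket accessed
short_walk = [7, 8]        # second probe crosses into the next line

print(packed_cost(first_probe_hit), split_cost(first_probe_hit))
print(packed_cost(short_walk), split_cost(short_walk))
```

Under these assumptions the common first-probe hit goes from one line to two, which is where the doubling comes from; longer probe walks pay one extra line each.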
>
>
>
> Could you please explain your algorithm in more detail?
>
>
>
> llvm-lib is llvm-ar in disguise ;-)
>
>
>
>
>
> *From:* Leonardo Santagada <santagada at gmail.com>
> *Sent:* Wednesday, February 27, 2019 6:17 PM
> *To:* Reid Kleckner <rnk at google.com>
> *Cc:* Alexandre Ganea <alexandre.ganea at ubisoft.com>; Zachary Turner <
> zturner at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] Making LLD PDB generation faster
>
>
>
> My problem was that some library was still built with LTO, and I think that
> forces lld to do LTO, but unlike MSVC it doesn't emit any warning about
> it.
>
>
>
> I think we are going to try sharding the ghashes to multiple threads and
> have a hashmap that only contains the index to a list of seen types. This
> way we hope there's no need for any locking.
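
One way to read the sharding idea above is sketched below in Python. It is an assumption-laden illustration, not the planned implementation: the function name `merge_sharded`, sharding by `ghash % num_shards`, and the per-shard dict are all hypothetical. Each shard owns its own map and its own list of seen ghashes, so no worker ever touches another worker's data and no lock is needed. Python threads will not show a real speedup here; the sketch only shows the data flow.

```python
from concurrent.futures import ThreadPoolExecutor


def merge_sharded(ghashes, num_shards=4):
    """Deduplicate ghashes with one private map per shard."""
    # Route every record to exactly one shard based on its hash.
    shards = [[] for _ in range(num_shards)]
    for pos, ghash in enumerate(ghashes):
        shards[ghash % num_shards].append((pos, ghash))

    def merge_one(shard):
        seen = []      # unique ghashes owned by this shard
        index_of = {}  # ghash -> index into seen
        remap = {}     # input position -> index into seen
        for pos, ghash in shard:
            idx = index_of.get(ghash)
            if idx is None:
                idx = len(seen)
                index_of[ghash] = idx
                seen.append(ghash)
            remap[pos] = idx
        return seen, remap

    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        # pool.map keeps results in shard order.
        return list(pool.map(merge_one, shards))


results = merge_sharded([8, 3, 8, 5, 3, 12], num_shards=4)
unique_total = sum(len(seen) for seen, _ in results)
```

A final pass would still have to turn each (shard, index) pair into a global type index, for example by adding a per-shard base offset.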
>
>
>
> Also we are investigating why we have 420 million types being linked while
> it appears that 95-99% of them are not being used. Does anyone know if PCH
> can help here? My feeling is not much, as template instantiation still
> generates a ton of weak symbols in the PCH users, but I might be confused.
>
>
>
> Another idea is to create lib files with types and hashes merged.
> Where is the llvm-lib source? It seems funny, but I can't find it
> anywhere.
>
>
>
> Ps: the cmake bug for using llvm in visual studio projects has been fixed
> upstream.
>
>
>
> On Wed, 27 Feb 2019 at 20:10, Reid Kleckner <rnk at google.com> wrote:
>
> This could be ICF. There were lots of issues with ICF on ARM64, but they
> are not inherently ARM64-specific, they just come up there more often. See
> https://reviews.llvm.org/D56986 which fixes that.
>
>
>
> Easiest thing is always to profile or add /time to see what's slow.
>
>
>
> On Wed, Feb 27, 2019 at 6:30 AM Leonardo Santagada <santagada at gmail.com>
> wrote:
>
> Would anyone know why lld takes > 30 minutes to link lld without
> symbols on release?
>
> The command line seems simple enough:
>
> C:\PROGRA~1\LLVM\bin\lld-link.exe /nologo @CMakeFiles\lld.rsp
> /out:bin\lld.exe /implib:lib\lld.lib /version:0.0 /machine:x64
> -fuse-ld=lld /STACK:10000000 /INCREMENTAL:NO /subsystem:console
> /MANIFEST /MANIFESTFILE:bin\lld.exe.manifest
>
> On Mon, Feb 25, 2019 at 8:20 PM Leonardo Santagada <santagada at gmail.com>
> wrote:
> >
> > Sadly the patch on https://reviews.llvm.org/D55585 didn't apply on my
> > clone of llvm at all :( It will take me quite some time to test this
> > out.
> >
> > On Mon, Feb 25, 2019 at 5:08 PM Alexandre Ganea
> > <alexandre.ganea at ubisoft.com> wrote:
> > >
> > > For enabling large memory pages, see this link:
> > > https://support.sisoftware.co.uk/knowledgebase.php?article=52
> > >
> > > Meow hash isn't in the patch I posted, but you can use xxHash, it is
> > > good enough. Just add /hasher:xxhash to the LLD cmd-line.
> > >
> > >
> > > -----Original Message-----
> > > From: Leonardo Santagada <santagada at gmail.com>
> > > Sent: Monday, February 25, 2019 11:05 AM
> > > To: Alexandre Ganea <alexandre.ganea at ubisoft.com>
> > > Cc: Zachary Turner <zturner at google.com>; Reid Kleckner <rnk at google.com>;
> > > llvm-dev <llvm-dev at lists.llvm.org>
> > > Subject: Re: [llvm-dev] Making LLD PDB generation faster
> > >
> > > Times for lld compiled with LTO:
> > >
> > >   Input File Reading:          1430 ms (  3.3%)
> > >   Code Layout:                  486 ms (  1.1%)
> > >   PDB Emission (Cumulative):  41042 ms ( 94.6%)
> > >     Add Objects:              33117 ms ( 76.4%)
> > >       Type Merging:           25861 ms ( 59.6%)
> > >       Symbol Merging:          7011 ms ( 16.2%)
> > >     TPI Stream Layout:          996 ms (  2.3%)
> > >     Globals Stream Layout:      513 ms (  1.2%)
> > >     Commit to Disk:            5175 ms ( 11.9%)
> > >   Commit Output File:            37 ms (  0.1%)
> > > -------------------------------------------------
> > > Total Link Time:              43366 ms (100.0%)
> > >
> > > LTO didn't help much :(
> > >
> > > Now I will try Alexandre's patches and switch to xxHash64 or Meow
> > > hashing. I need to discover how to enable huge pages on my Windows
> > > (1809).
> > >
> > > PS: I need to figure out how to limit the number of link jobs in ninja,
> > > as that almost used the whole 128GB of RAM on my machine. On our
> > > distributed build system we can limit linking jobs (which are the only
> > > strictly local jobs) to 8.
> > >
> > > On Mon, Feb 25, 2019 at 4:47 PM Alexandre Ganea <alexandre.ganea at ubisoft.com> wrote:
> > > >
> > > > …however it is very slow to compile, because /MP isn’t currently
> > > > supported by clang-cl. So each CPP is compiled sequentially, one after
> > > > another. Thus my patch for adding /MP.
> > > >
> > > >
> > > >
> > > > From: Alexandre Ganea
> > > > Sent: Monday, February 25, 2019 10:42 AM
> > > > To: Zachary Turner <zturner at google.com>; Leonardo Santagada
> > > > <santagada at gmail.com>
> > > > Cc: Reid Kleckner <rnk at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> > > > Subject: RE: [llvm-dev] Making LLD PDB generation faster
> > > >
> > > >
> > > >
> > > > Yes, -Tllvm works.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > From: Zachary Turner <zturner at google.com>
> > > > Sent: Monday, February 25, 2019 10:36 AM
> > > > To: Leonardo Santagada <santagada at gmail.com>
> > > > Cc: Alexandre Ganea <alexandre.ganea at ubisoft.com>; Reid Kleckner
> > > > <rnk at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> > > > Subject: Re: [llvm-dev] Making LLD PDB generation faster
> > > >
> > > >
> > > >
> > > > Is -Tllvm even supported? I thought the only thing you could pass for
> > > > -T was -Thost=x64
> > > >
> > > > On Mon, Feb 25, 2019 at 6:52 AM Leonardo Santagada <santagada at gmail.com> wrote:
> > > >
> > > > I think it's a huge bug that it doesn't raise any errors or warnings
> > > > about it. But I will open a ticket on cmake: they should be using
> > > > clang-cl.exe and lld-link.exe if T="llvm", and probably set host to
> > > > 64 bit as well.
> > > >
> > > > On Mon, Feb 25, 2019 at 3:34 PM Zachary Turner <zturner at google.com> wrote:
> > > > >
> > > > > I don’t think changing the compiler or linker is supported with the
> > > > > vs generator, but I also don’t think it’s a bug.
> > > > >
> > > > > On Mon, Feb 25, 2019 at 6:31 AM Alexandre Ganea <alexandre.ganea at ubisoft.com> wrote:
> > > > >>
> > > > >> Can you please try using Ninja instead?
> > > > >>
> > > > >> cmake -G Ninja f:/svn/llvm -DCMAKE_BUILD_TYPE=Release
> > > > >> -DLLVM_OPTIMIZED_TABLEGEN=true
> > > > >> -DLLVM_EXTERNAL_LLD_SOURCE_DIR=f:/svn/lld
> > > > >> -DLLVM_TOOL_LLD_BUILD=true -DLLVM_ENABLE_LLD=true
> > > > >> -DCMAKE_C_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe"
> > > > >> -DCMAKE_CXX_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe"
> > > > >> -DCMAKE_LINKER="C:/Program Files/LLVM/bin/lld-link.exe"
> > > > >> -DLLVM_ENABLE_PDB=true
> > > > >>
> > > > >> It will be faster to compile. The setup I use is the above Ninja
> > > > >> cmd-line for compiling optimized builds; in addition, I keep the
> > > > >> Visual Studio generator, as you do, but only for having a .sln to
> > > > >> debug. It is a bit annoying to run cmake twice, in two different
> > > > >> build folders, but you can write a batch script.
> > > > >>
> > > > >> If the above works, maybe you should log the bug on
> > > > >> https://bugs.llvm.org/ so it is not forgotten.
> > > > >>
> > > > >> Alex.
> > > > >>
> > > > >> -----Original Message-----
> > > > >> From: Leonardo Santagada <santagada at gmail.com>
> > > > >> Sent: Monday, February 25, 2019 9:04 AM
> > > > >> To: Alexandre Ganea <alexandre.ganea at ubisoft.com>
> > > > >> Cc: Zachary Turner <zturner at google.com>; Reid Kleckner
> > > > >> <rnk at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> > > > >> Subject: Re: [llvm-dev] Making LLD PDB generation faster
> > > > >>
> > > > >> Ok, so there's a lot of confusion in cmake regarding using llvm as
> > > > >> a toolset. It still does all its checks against cl.exe (not clang-cl)
> > > > >> and somehow overrides CMAKE_LINKER to be link.exe. I tried a couple
> > > > >> of approaches, including:
> > > > >>
> > > > >> cmake -G "Visual Studio 15 2017" -A x64 -T"llvm",host=x64
> > > > >> -DCMAKE_LINKER="C:/Program Files/LLVM/bin/lld-link.exe"
> > > > >> -DCMAKE_CXX_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe"
> > > > >> -DCMAKE_C_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe"
> > > > >> -DLLVM_ENABLE_LTO=true -DLLVM_ENABLE_PDB=true
> > > > >> -DLLVM_ENABLE_PROJECTS=lld  ../llvm
> > > > >>
> > > > >> but it seems like the generator overrides it.
> > > > >>
> > > > >>
> > > > >> ps: Created a phabricator account
> > > > >>
> > > > >> On Mon, Feb 25, 2019 at 2:48 PM Alexandre Ganea <alexandre.ganea at ubisoft.com> wrote:
> > > > >> >
> > > > >> > That's good news. For having debug info, you could try adding
> > > > >> > /Z7 on the cmake cmd-line, such as -DCMAKE_CXX_FLAGS="/Z7". Or use
> > > > >> > the 'RelWithDebInfo' target instead of 'Release' and add
> > > > >> > -DCMAKE_CXX_FLAGS="/Ob2" (because that target uses /Ob1 as a default).
> > > > >> >
> > > > >> > Can you please send a patch on Phabricator if you fix the
> > > > >> > LLVM_ENABLE_PDB issue with Clang? The goal is to have performance
> > > > >> > out-of-the-box.
> > > > >> >
> > > > >> > Alex.
> > > > >> >
> > > > >> > -----Original Message-----
> > > > >> > From: Leonardo Santagada <santagada at gmail.com>
> > > > >> > Sent: Monday, February 25, 2019 7:36 AM
> > > > >> > To: Alexandre Ganea <alexandre.ganea at ubisoft.com>
> > > > >> > Cc: Zachary Turner <zturner at google.com>; Reid Kleckner
> > > > >> > <rnk at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> > > > >> > Subject: Re: [llvm-dev] Making LLD PDB generation faster
> > > > >> >
> > > > >> > With your patch for cmake and reconfiguring it with
> > > > >> > cmake -G "Visual Studio 15 2017" -A x64 -T"llvm",host=x64 -DLLVM_ENABLE_PDB=true -DLLVM_ENABLE_PROJECTS=lld ../llvm
> > > > >> > we get these results:
> > > > >> >
> > > > >> >   Input File Reading:          1602 ms (  3.5%)
> > > > >> >   Code Layout:                  493 ms (  1.1%)
> > > > >> >   PDB Emission (Cumulative):  43127 ms ( 94.5%)
> > > > >> >     Add Objects:              34577 ms ( 75.8%)
> > > > >> >       Type Merging:           26709 ms ( 58.5%)
> > > > >> >       Symbol Merging:          7598 ms ( 16.7%)
> > > > >> >     TPI Stream Layout:         1107 ms (  2.4%)
> > > > >> >     Globals Stream Layout:      602 ms (  1.3%)
> > > > >> >     Commit to Disk:            5636 ms ( 12.4%)
> > > > >> >   Commit Output File:            16 ms (  0.0%)
> > > > >> > -------------------------------------------------
> > > > >> > Total Link Time:              45626 ms (100.0%)
> > > > >> >
> > > > >> > Unfortunately no PDBs were generated for lld.exe (or any other
> > > > >> > binaries), so I can't debug them. It seems like LLVM_ENABLE_PDB
> > > > >> > is not made to support using clang to compile itself, as it tries
> > > > >> > to add /Zi to the targets instead of /Z7 and global hashes. I can
> > > > >> > patch it over here, but we probably want to fix this in cmake and
> > > > >> > in the docs, as it's not clear at all how to compile lld in a
> > > > >> > performant 64-bit way.
> > > > >> >
> > > > >> > On Mon, Feb 25, 2019 at 2:38 AM Alexandre Ganea <alexandre.ganea at ubisoft.com> wrote:
> > > > >> > >
> > > > >> > > How do you compile LLD? There's a big difference between
> > > > >> > > using MSVC vs Clang. The parallel ghash patch I was mentioning
> > > > >> > > is almost 2x as fast when using Clang 7.0+ vs. MSVC 15.9+, I
> > > > >> > > don't know exactly why. I also suggest you use the Release
> > > > >> > > target. You should also grab this patch:
> > > > >> > > https://reviews.llvm.org/D55056 - I had to revert it because it
> > > > >> > > was causing issues with LLDB. But it will give an improvement
> > > > >> > > for LLD. Please let me know if that improves your timings.
> > > > >> > >
> > > > >> > > The page faults are probably the OS loading from disk: most, if
> > > > >> > > not all, of the files are accessed by LLD by mmap'ing them.
> > > > >> > >
> > > > >> > > The lockless DenseHash I was talking about will be published in
> > > > >> > > an upcoming patch. As for reproducibility, this can be an issue
> > > > >> > > on build systems. But on local machines, we could explicitly
> > > > >> > > state that we want non-deterministic builds, through some
> > > > >> > > cmd-line flag. If your 57sec for "Type Merging"
> > > > >> > > transforms into 5sec when non-deterministic, I think that's
> > > > >> > > worth it.
> > > > >> > >
> > > > >> > > Alex.
> > > > >> > >
> > > > >> > > -----Original Message-----
> > > > >> > > From: Leonardo Santagada <santagada at gmail.com>
> > > > >> > > Sent: Sunday, February 24, 2019 6:43 PM
> > > > >> > > To: Alexandre Ganea <alexandre.ganea at ubisoft.com>
> > > > >> > > Cc: Zachary Turner <zturner at google.com>; Reid Kleckner
> > > > >> > > <rnk at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> > > > >> > > Subject: Re: [llvm-dev] Making LLD PDB generation faster
> > > > >> > >
> > > > >> > > More info inline, I think there are a couple of misconceptions
> > > > >> > > about what I'm doing:
> > > > >> > >
> > > > >> > > 1) I already patch all my .obj files to contain .debug$H
> > > > >> > > entries so it is all ghashed already
> > > > >> > > 2) All the 35s is spent adding to the DenseMap
> > > > >> > >
> > > > >> > > Here are my current times (lld-link.exe compiled with -O2, so
> > > > >> > > no lto/pgo); lld generates a 141 MB binary and a 1.2GB pdb file:
> > > > >> > >
> > > > >> > >   Input File Reading:          1724 ms (  2.1%)
> > > > >> > >   Code Layout:                  482 ms (  0.6%)
> > > > >> > >   PDB Emission (Cumulative):  79261 ms ( 96.8%)
> > > > >> > >     Add Objects:              68650 ms ( 83.8%)
> > > > >> > >       Type Merging:           57534 ms ( 70.2%)
> > > > >> > >       Symbol Merging:         10822 ms ( 13.2%)
> > > > >> > >     TPI Stream Layout:         1501 ms (  1.8%)
> > > > >> > >     Globals Stream Layout:      770 ms (  0.9%)
> > > > >> > >     Commit to Disk:            7007 ms (  8.6%)
> > > > >> > >   Commit Output File:            19 ms (  0.0%)
> > > > >> > > -------------------------------------------------
> > > > >> > > Total Link Time:              81900 ms (100.0%)
> > > > >> > >
> > > > >> > > Our target is < 20 seconds linking; anything below 40
> > > > >> > > seconds would be ok. Ideal times would be around 8s (at which
> > > > >> > > point it will mostly beat link.exe incremental linking).
> > > > >> > >
> > > > >> > > My tip for profiling is using superluminal
> > > > >> > > (https://www.superluminal.eu/), it's the easiest way to see
> > > > >> > > everything your code is doing.
> > > > >> > >
> > > > >> > > On Sun, Feb 24, 2019 at 5:18 PM Alexandre Ganea <alexandre.ganea at ubisoft.com> wrote:
> > > > >> > > >
> > > > >> > > > Leonardo, to answer your questions, yes to all of them :)
> > > > >> > > > You can take a look at this prototype/proposal:
> > > > >> > > > https://reviews.llvm.org/D55585
> > > > >> > > >
> > > > >> > > > Overall, computing ghashes in parallel at link-time and
> > > > >> > > > merging Types with them is less costly than the current
> > > > >> > > > approach to merging. The 35sec you’re seeing for merging
> > > > >> > > > should go down to about 15sec.
> > > > >> > >
> > > > >> > > I don't do much computing of ghashes as we already preprocess
> > > > >> > > all .obj files from msvc to add a .debug$H to them. The whole 35
> > > > >> > > seconds is spent in just the densehash findbucket function. The
> > > > >> > > rest of the time is mostly pagefaults (I guess to load in obj
> > > > >> > > data and to grow the final pdb?).
> > > > >> > >
> > > > >> > > > The patch doesn’t parallelize (yet) the Type merging itself,
> > > > >> > > > but we have an alternate multithread-suitable implementation
> > > > >> > > > of DenseHash which already supports lockless, wait-free
> > > > >> > > > insert/fetch/resize.
> > > > >> > >
> > > > >> > > Where is this lockless densehash? This is the part where I
> > > > >> > > would love to help, but if there is a densehash it is probably
> > > > >> > > just creating the threads and letting them merge the results. I'm
> > > > >> > > a bit afraid for the reproducibility of builds, but as we already
> > > > >> > > don't have that with link.exe we are not really losing anything.
> > > > >> > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > The prototype allows for testing different hashing
> > > > >> > > > algorithms, and indeed xxHash seems to be the best
> > > > >> > > > general-purpose choice. I’ve also added support for more
> > > > >> > > > specialized hardware-based hashes, like Casey Muratori’s
> > > > >> > > > Meow Hash (uses hardware AES SSE 4.2 instructions), which
> > > > >> > > > brings the figures down a bit.
> > > > >> > > >
> > > > >> > >
> > > > >> > > I remembered Meow hashes needing at least k bytes of data,
> > > > >> > > but looking at their website right now there is no mention of it.
> > > > >> > > Hashing performance isn't much of an impact, as we do it per .obj
> > > > >> > > file, distributed across our company, so the time to calculate
> > > > >> > > those is completely distributed.
> > > > >> > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > Future changes could write the computed ghash stream
> > > > >> > > > back to OBJs if /INCREMENTAL is specified (just an idea).
> > > > >> > > > Incremental linking will be faster that way when working
> > > > >> > > > with MSVC OBJs.
> > > > >> > > >
> > > > >> > >
> > > > >> > > I already have a patch for llvm-objcopy that adds a
> > > > >> > > -add-ghashes option that does this, I will be cleaning it up
> > > > >> > > this week and sending a PR for it
> > > > >> > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > As for creating PDBs for independent projects, that would
> > > > >> > > > most likely help. However the ghash stream would need to be
> > > > >> > > > stored in the PDB in that case (currently, ghashes are
> > > > >> > > > dropped after merging). That could help when using rarely
> > > > >> > > > compiled projects, used along with network caches.
> > > > >> > >
> > > > >> > > I meant actually a .lib, with all the obj files inside plus a
> > > > >> > > merged .debug$H entry. No pdb generation or changes necessary, we
> > > > >> > > just run the same code that merges types in lld and do that at
> > > > >> > > the librarian level.
> > > > >> > >
> > > > >> > > >
> > > > >> > > > I will start sending smaller patches to converge towards the
> > > > >> > > > functionality of the prototype above.
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > Best,
> > > > >> > > >
> > > > >> > > > Alex.
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > From: Zachary Turner <zturner at google.com>
> > > > >> > > > Sent: Sunday, February 24, 2019 1:20 AM
> > > > >> > > > To: Leonardo Santagada <santagada at gmail.com>
> > > > >> > > > Cc: Alexandre Ganea <alexandre.ganea at ubisoft.com>; Reid
> > > > >> > > > Kleckner <rnk at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> > > > >> > > > Subject: Re: [llvm-dev] Making LLD PDB generation faster
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > +Reid and Alexandre, who have been doing work in this area
> > > > >> > > > recently
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On Sat, Feb 23, 2019 at 4:07 AM Leonardo Santagada via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > > > >> > > >
> > > > >> > > > Hi,
> > > > >> > > >
> > > > >> > > > Is anyone working on making the PDB generation on LLD
> > > > >> > > > faster? Looking at a trace for linking one of our binaries (it
> > > > >> > > > takes 1min6s-1min20s) I see two things:
> > > > >> > > >
> > > > >> > > > 1) LookupBucketFor(Val, ConstFoundBucket); takes 35s so
> > > > >> > > > almost half of the time of linking, mostly finding
> > > > >> > > > duplicates
> > > > >> > > > 2) There is no parallelization inside of addObjectsToPDB
> > > > >> > > >
> > > > >> > > > Is anyone working on those? Also has anyone thought about
> > > > >> > > > merging .obj files to deduplicate type information so we can
> > > > >> > > > do the linking on projects to generate something like a lib
> > > > >> > > > file, but with deduplicated debug information (as far as I
> > > > >> > > > know actual .lib files just put all pdbs or /Z7 debug info
> > > > >> > > > inside a file without dedup)?
> > > > >> > > >
> > > > >> > > > Just looking at the code it seems it is much more mature and
> > > > >> > > > also the choice of SHA1_8 seems interesting (still don't know
> > > > >> > > > why not use xxHash64).
> > > > >> > > >
> > > > >> > > > ps: My code to add ghashes to msvc compiled .obj files is
> > > > >> > > > almost ready to be pushed as an option for llvm-objcopy.
> > > > >> > > >
> > > > >> > > > --
> > > > >> > > >
> > > > >> > > > Leonardo Santagada
> > > > >> > > > _______________________________________________
> > > > >> > > > LLVM Developers mailing list
> > > > >> > > > llvm-dev at lists.llvm.org
> > > > >> > > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 

Leonardo Santagada