[llvm-dev] Making LLD PDB generation faster
Zachary Turner via llvm-dev
llvm-dev at lists.llvm.org
Mon Feb 25 07:36:15 PST 2019
Is -Tllvm even supported? I thought the only thing you could pass for -T
was -Thost=x64
On Mon, Feb 25, 2019 at 6:52 AM Leonardo Santagada <santagada at gmail.com>
wrote:
> I think it's a huge bug that it doesn't raise any errors or warnings
> about it. I will open a ticket on CMake: with -T"llvm" they should be
> using clang-cl.exe and lld-link.exe, and probably set the host to 64-bit
> as well.
>
> On Mon, Feb 25, 2019 at 3:34 PM Zachary Turner <zturner at google.com> wrote:
> >
> > I don’t think changing the compiler or linker is supported with the VS
> generator, but I also don’t think it’s a bug.
> > On Mon, Feb 25, 2019 at 6:31 AM Alexandre Ganea <
> alexandre.ganea at ubisoft.com> wrote:
> >>
> >> Can you please try using Ninja instead?
> >>
> >> cmake -G Ninja f:/svn/llvm -DCMAKE_BUILD_TYPE=Release
> -DLLVM_OPTIMIZED_TABLEGEN=true -DLLVM_EXTERNAL_LLD_SOURCE_DIR=f:/svn/lld
> -DLLVM_TOOL_LLD_BUILD=true -DLLVM_ENABLE_LLD=true
> -DCMAKE_C_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe"
> -DCMAKE_CXX_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe"
> -DCMAKE_LINKER="C:/Program Files/LLVM/bin/lld-link.exe"
> -DLLVM_ENABLE_PDB=true
> >>
> >> It will be faster to compile. The setup I use is the above Ninja
> cmd-line for compiling optimized builds; and in addition, I keep the Visual
> Studio generator, as you do, but only for having a .sln to debug. It is a
> bit annoying to cmake twice, in two different build folders, but you can
> write a batch script.
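
A minimal sketch of the "configure twice" script suggested above, written in Python rather than batch. The cmake flags and tool paths are copied from the command lines in this thread; the two build-folder names (`build-ninja`, `build-vs`) are assumptions made for illustration. The script only prints the commands instead of running them.

```python
import subprocess

# Paths copied from the messages in this thread; adjust to the local checkout.
LLVM_SRC = "f:/svn/llvm"
LLD_SRC = "f:/svn/lld"
CLANG_CL = "C:/Program Files/LLVM/bin/clang-cl.exe"
LLD_LINK = "C:/Program Files/LLVM/bin/lld-link.exe"


def configure_commands(src=LLVM_SRC):
    """Return {build_dir: cmake argv} for the Ninja and Visual Studio trees.

    The build directory names are hypothetical; nothing in the thread
    prescribes them.
    """
    ninja = [
        "cmake", "-G", "Ninja", src,
        "-DCMAKE_BUILD_TYPE=Release",
        "-DLLVM_OPTIMIZED_TABLEGEN=true",
        "-DLLVM_EXTERNAL_LLD_SOURCE_DIR=" + LLD_SRC,
        "-DLLVM_TOOL_LLD_BUILD=true",
        "-DLLVM_ENABLE_LLD=true",
        "-DCMAKE_C_COMPILER=" + CLANG_CL,
        "-DCMAKE_CXX_COMPILER=" + CLANG_CL,
        "-DCMAKE_LINKER=" + LLD_LINK,
        "-DLLVM_ENABLE_PDB=true",
    ]
    # Second tree: only there to get a .sln for debugging.
    vs = ["cmake", "-G", "Visual Studio 15 2017", "-A", "x64", "-Thost=x64", src]
    return {"build-ninja": ninja, "build-vs": vs}


if __name__ == "__main__":
    for build_dir, argv in configure_commands().items():
        # Printed rather than executed; once the folders exist this could
        # become subprocess.run(argv, cwd=build_dir, check=True).
        print(build_dir + ":", subprocess.list2cmdline(argv))
```

Keeping the two argument lists in one place makes it harder for the optimized tree and the debugging tree to drift apart.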
> >>
> >> If the above works, maybe you should log the bug on
> https://bugs.llvm.org/ so it is not forgotten.
> >>
> >> Alex.
> >>
> >> -----Original Message-----
> >> From: Leonardo Santagada <santagada at gmail.com>
> >> Sent: Monday, February 25, 2019 9:04 AM
> >> To: Alexandre Ganea <alexandre.ganea at ubisoft.com>
> >> Cc: Zachary Turner <zturner at google.com>; Reid Kleckner <rnk at google.com>;
> llvm-dev <llvm-dev at lists.llvm.org>
> >> Subject: Re: [llvm-dev] Making LLD PDB generation faster
> >>
> >> OK, so there's a lot of confusion in CMake regarding using LLVM as a
> toolset. It still does all its checks against cl.exe (not clang-cl) and
> somehow overrides CMAKE_LINKER to be link.exe. I tried a couple of things,
> including:
> >>
> >> cmake -G "Visual Studio 15 2017" -A x64 -T"llvm",host=x64
> -DCMAKE_LINKER="C:/Program Files/LLVM/bin/lld-link.exe"
> >> -DCMAKE_CXX_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe"
> >> -DCMAKE_C_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe"
> >> -DLLVM_ENABLE_LTO=true -DLLVM_ENABLE_PDB=true
> -DLLVM_ENABLE_PROJECTS=lld ../llvm
> >>
> >> but it seems like the generator overrides it.
> >>
> >>
> >> ps: Created a phabricator account
> >>
> >> On Mon, Feb 25, 2019 at 2:48 PM Alexandre Ganea <
> alexandre.ganea at ubisoft.com> wrote:
> >> >
> >> > That's good news. For having debug info, you could try adding /Z7 on
> the cmake cmd-line, such as -DCMAKE_CXX_FLAGS="/Z7". Or use the
> 'RelWithDebInfo' target instead of 'Release' and add
> -DCMAKE_CXX_FLAGS="/Ob2" (because that target uses /Ob1 as a default).
> >> >
> >> > Can you please send a patch on Phabricator if you fix the
> LLVM_ENABLE_PDB issue with Clang? The goal is to have performance
> out-of-the-box.
> >> >
> >> > Alex.
> >> >
> >> > -----Original Message-----
> >> > From: Leonardo Santagada <santagada at gmail.com>
> >> > Sent: Monday, February 25, 2019 7:36 AM
> >> > To: Alexandre Ganea <alexandre.ganea at ubisoft.com>
> >> > Cc: Zachary Turner <zturner at google.com>; Reid Kleckner
> >> > <rnk at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> >> > Subject: Re: [llvm-dev] Making LLD PDB generation faster
> >> >
> >> > With your patch for cmake and reconfiguring it with "cmake -G "Visual
> Studio 15 2017" -A x64 -T"llvm",host=x64 -DLLVM_ENABLE_PDB=true
> -DLLVM_ENABLE_PROJECTS=lld ../llvm" we get these results:
> >> >
> >> >   Input File Reading:         1602 ms (  3.5%)
> >> >   Code Layout:                 493 ms (  1.1%)
> >> >   PDB Emission (Cumulative): 43127 ms ( 94.5%)
> >> >     Add Objects:             34577 ms ( 75.8%)
> >> >       Type Merging:          26709 ms ( 58.5%)
> >> >       Symbol Merging:         7598 ms ( 16.7%)
> >> >     TPI Stream Layout:        1107 ms (  2.4%)
> >> >     Globals Stream Layout:     602 ms (  1.3%)
> >> >     Commit to Disk:           5636 ms ( 12.4%)
> >> >   Commit Output File:           16 ms (  0.0%)
> >> > -------------------------------------------------
> >> > Total Link Time:             45626 ms (100.0%)
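
To put the two timing reports in this thread side by side, here is a small sketch that parses rows in the `name: N ms` shape and computes the ratio between the earlier run (81900 ms total, quoted further down) and this one. Only three rows from each report are included; the helper names `parse_report` and `speedups` are made up for this example.

```python
import re

# Rows copied from the two timing reports quoted in this thread.
EARLIER = """\
PDB Emission (Cumulative): 79261 ms ( 96.8%)
Type Merging: 57534 ms ( 70.2%)
Total Link Time: 81900 ms (100.0%)
"""
LATER = """\
PDB Emission (Cumulative): 43127 ms ( 94.5%)
Type Merging: 26709 ms ( 58.5%)
Total Link Time: 45626 ms (100.0%)
"""

# One row: optional indentation, a label up to the colon, then milliseconds.
ROW = re.compile(r"^\s*([^:\n]+):\s*(\d+)\s*ms", re.MULTILINE)


def parse_report(text):
    """Map each phase name to its duration in milliseconds."""
    return {name.strip(): int(ms) for name, ms in ROW.findall(text)}


def speedups(before, after):
    """Ratio before/after for every phase present in both reports."""
    return {k: before[k] / after[k] for k in before if k in after}


if __name__ == "__main__":
    ratios = speedups(parse_report(EARLIER), parse_report(LATER))
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.2f}x")
```

On these numbers Type Merging improves by a little over 2x and the total link time by about 1.8x. The comparison says nothing about which change (compiler, patch, or configuration) is responsible.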
> >> >
> >> > Unfortunately no PDB was generated for lld.exe (or any other
> >> > binaries), so I can't debug them. It seems LLVM_ENABLE_PDB is not
> made to support using Clang to compile itself, as it tries to add /Zi to the
> targets instead of /Z7 and global hashes. I can patch it over here, but we
> probably want to fix this in CMake and in the docs, as it's not clear at all
> how to build a fast, 64-bit LLD.
> >> >
> >> > On Mon, Feb 25, 2019 at 2:38 AM Alexandre Ganea <
> alexandre.ganea at ubisoft.com> wrote:
> >> > >
> >> > > How do you compile LLD? There's a big difference between when using
> >> > > MSVC vs Clang. The parallel ghash patch I was mentioning is almost
> >> > > 2x as fast when using Clang 7.0+ vs. MSVC 15.9+, I don't know
> >> > > exactly why. I also suggest you use the Release target. You should
> also grab this patch:
> >> > > https://reviews.llvm.org/D55056 - I had to revert it because it was
> >> > > causing issues with LLDB. But it will give an improvement for LLD.
> >> > > Please let me know if that improves your timings.
> >> > >
> >> > > The page faults are probably the OS loading from disk: most, if not
> >> > > all the files are accessed by LLD by mmap'ing them.
> >> > >
> >> > > The lockless DenseHash I was talking about will be published in an
> >> > > upcoming patch. As for reproducibility, this can be an issue on
> >> > > build systems. But on local machines, we could explicitly state that
> >> > > we want non-deterministic builds, through some cmd-line flag. If
> your 57sec for "Type Merging"
> >> > > transforms into 5sec when non-deterministic, I think that's worth
> it.
> >> > >
> >> > > Alex.
> >> > >
> >> > > -----Original Message-----
> >> > > From: Leonardo Santagada <santagada at gmail.com>
> >> > > Sent: Sunday, February 24, 2019 6:43 PM
> >> > > To: Alexandre Ganea <alexandre.ganea at ubisoft.com>
> >> > > Cc: Zachary Turner <zturner at google.com>; Reid Kleckner
> >> > > <rnk at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> >> > > Subject: Re: [llvm-dev] Making LLD PDB generation faster
> >> > >
> >> > > More info inline. I think there are a couple of misconceptions about
> what I'm doing:
> >> > >
> >> > > 1) I already patch all my .obj files to contain .debug$H entries so
> >> > > it is all ghashed already
> >> > > 2) All the 35s is spent adding to the DenseMap
> >> > >
> >> > > Here are my current times (lld-link.exe compiled with -O2, so no
> LTO/PGO). LLD generates a 141 MB binary and a 1.2 GB PDB file:
> >> > >
> >> > >   Input File Reading:         1724 ms (  2.1%)
> >> > >   Code Layout:                 482 ms (  0.6%)
> >> > >   PDB Emission (Cumulative): 79261 ms ( 96.8%)
> >> > >     Add Objects:             68650 ms ( 83.8%)
> >> > >       Type Merging:          57534 ms ( 70.2%)
> >> > >       Symbol Merging:        10822 ms ( 13.2%)
> >> > >     TPI Stream Layout:        1501 ms (  1.8%)
> >> > >     Globals Stream Layout:     770 ms (  0.9%)
> >> > >     Commit to Disk:           7007 ms (  8.6%)
> >> > >   Commit Output File:           19 ms (  0.0%)
> >> > > -------------------------------------------------
> >> > > Total Link Time:             81900 ms (100.0%)
> >> > >
> >> > > Our target is < 20 seconds for linking; anything below 40 seconds
> would be OK. Ideal times would be around 8 s (at which point it would mostly
> beat link.exe incremental linking).
> >> > >
> >> > > My tip for profiling is to use Superluminal
> >> > > (https://www.superluminal.eu/); it's the easiest way to see
> everything your code is doing.
> >> > >
> >> > > On Sun, Feb 24, 2019 at 5:18 PM Alexandre Ganea <
> alexandre.ganea at ubisoft.com> wrote:
> >> > > >
> >> > > > Leonardo, to answer your questions: yes to all of them. You can take a
> >> > > > look at this prototype/proposal: https://reviews.llvm.org/D55585
> >> > > >
> >> > > > Overall, computing ghashes in parallel at link time and merging types
> >> > > > with them is less costly than the current approach to merging. The 35 sec
> >> > > > you’re seeing for merging should go down to about 15 sec.
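
A rough sketch of the two-phase idea described here: hash each object's records in parallel, then merge serially using the precomputed hashes. This is not the code from D55585. The hash function (SHA-1 truncated to 8 bytes), the worker count, and the function names are assumptions for illustration, and Python threads only show the structure rather than the real speedup.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_object(records):
    """Hash every record of one object file (8-byte truncated SHA-1)."""
    return [hashlib.sha1(r).digest()[:8] for r in records]


def hash_all(objects, workers=4):
    """Phase 1: one hashing task per object file, run on a thread pool.

    Executor.map returns results in input order, so hashes[i] always
    belongs to objects[i] regardless of which task finishes first.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(hash_object, objects))


def count_unique(hashes):
    """Phase 2 (serial): walk the hashes in order and count distinct types."""
    seen = set()
    for per_object in hashes:
        seen.update(per_object)
    return len(seen)


if __name__ == "__main__":
    objs = [[b"int", b"struct Foo"], [b"struct Foo", b"float"]]
    print(count_unique(hash_all(objs)))
```

Because the merge phase still walks the objects in a fixed order, the output stays deterministic even though the hashing runs concurrently.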
> >> > >
> >> > > I don't do much computing of ghashes as we already preprocess all
> .obj files from msvc to add a .debug$H to them. The whole 35 seconds is
> spent in just densehash findbucket function. The rest of the time is mostly
> pagefaults (I guess to load in obj data and to grow the final pdb?).
> >> > >
> >> > > > The patch doesn’t parallelize (yet) the type merging itself, but we have
> >> > > > an alternate multithread-suitable implementation of DenseHash which
> >> > > > already supports lockless, wait-free insert/fetch/resize.
> >> > >
> >> > > Where is this lockless densehash? This is the part where I would
> love to help, but if there is a densehash it is probably just creating the
> threads and letting them merge the results. I'm a bit worried about the
> reproducibility of builds, but as we already don't have that with link.exe
> we are not really losing anything.
> >> > >
> >> > > >
> >> > > >
> >> > > > The prototype allows for testing different hashing algorithms, and indeed
> >> > > > xxHash seems to be the best general-purpose choice. I’ve also added support
> >> > > > for more specialized hardware-based hashes, like Casey Muratori’s Meow Hash
> >> > > > (uses hardware AES SSE 4.2 instructions), which brings the figures down a bit.
> >> > > >
> >> > >
> >> > > I remembered Meow Hash needing at least k bytes of data, but
> looking at their website right now there is no mention of it. Hashing
> performance doesn't have much of an impact, as we do it per .obj file,
> distributed through our company, so the time to calculate the hashes is
> completely distributed.
> >> > >
> >> > > >
> >> > > >
> >> > > > Future changes could write the computed ghash stream back to the OBJs if
> >> > > > /INCREMENTAL is specified (just an idea). Incremental linking will be
> >> > > > faster that way when working with MSVC OBJs.
> >> > > >
> >> > >
> >> > > I already have a patch for llvm-objcopy that adds an -add-ghashes
> >> > > option that does this. I will be cleaning it up this week and
> >> > > sending a PR for it.
> >> > >
> >> > > >
> >> > > >
> >> > > > As for creating PDBs for independent projects, that would help most likely.
> >> > > > However the ghash stream would need to be stored in the PDB in that case
> >> > > > (currently, ghashes are dropped after merging). That could help when using
> >> > > > rarely compiled projects, used along with network caches.
> >> > >
> >> > > I actually meant a .lib, with all the .obj files inside plus a
> merged .debug$H entry. No PDB generation or changes are necessary; we just run
> the same code that merges types in LLD and do that at the librarian level.
> >> > >
> >> > > >
> >> > > > I will start sending smaller patches to converge towards the
> >> > > > functionality of the prototype above.
> >> > > >
> >> > > >
> >> > > >
> >> > > > Best,
> >> > > >
> >> > > > Alex.
> >> > > >
> >> > > >
> >> > > >
> >> > > > From: Zachary Turner <zturner at google.com>
> >> > > > Sent: Sunday, February 24, 2019 1:20 AM
> >> > > > To: Leonardo Santagada <santagada at gmail.com>
> >> > > > Cc: Alexandre Ganea <alexandre.ganea at ubisoft.com>; Reid Kleckner
> >> > > > <rnk at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> >> > > > Subject: Re: [llvm-dev] Making LLD PDB generation faster
> >> > > >
> >> > > >
> >> > > >
> >> > > > +Reid and Alexandre, who have been doing work in this area recently
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Sat, Feb 23, 2019 at 4:07 AM Leonardo Santagada via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >> > > >
> >> > > > Hi,
> >> > > >
> >> > > > Is anyone working on making the PDB generation in LLD faster?
> >> > > > Looking at a trace for linking one of our binaries (it takes
> >> > > > 1min6s-1min20s) I see two things:
> >> > > >
> >> > > > 1) LookupBucketFor(Val, ConstFoundBucket); takes 35s so almost
> >> > > > half of the time of linking, mostly finding duplicates
> >> > > > 2) There is no parallelization inside of addObjectsToPDB
> >> > > >
> >> > > > Is anyone working on those? Also, has anyone thought about merging
> >> > > > .obj files to deduplicate type information, so we can do the linking
> >> > > > per project to generate something like a .lib file but with
> >> > > > deduplicated debug information (as far as I know, an actual .lib just
> >> > > > puts all PDBs or /Z7 debug info inside a file without dedup)?
> >> > > >
> >> > > > Just looking at the code it seems it is much more mature and also
> >> > > > the choice of SHA1_8 seems interesting (still don't know why not
> >> > > > use xxHash64).
> >> > > >
> >> > > > ps: My code to add ghashes to msvc compiled .obj files is almost
> >> > > > ready to be pushed as an option for llvm-objcopy.
> >> > > >
> >> > > > --
> >> > > >
> >> > > > Leonardo Santagada
> >> > > > _______________________________________________
> >> > > > LLVM Developers mailing list
> >> > > > llvm-dev at lists.llvm.org
> >> > > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > >
> >> > > Leonardo Santagada
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> > Leonardo Santagada
> >>
> >>
> >>
> >> --
> >>
> >> Leonardo Santagada
>
>
>
> --
>
> Leonardo Santagada
>