[llvm-dev] Making LLD PDB generation faster

Sun Feb 24 15:43:02 PST 2019

More info inline; I think there are a couple of misconceptions about what I'm doing:

1) I already patch all my .obj files to contain .debug$H entries so it
is all ghashed already
2) All the 35s is spent adding to the DenseMap

Here are my current times (lld-link.exe compiled with -O2, so no
LTO/PGO). lld generates a 141 MB binary and a 1.2 GB PDB file:

  Input File Reading:          1724 ms (  2.1%)
  Code Layout:                  482 ms (  0.6%)
  PDB Emission (Cumulative):  79261 ms ( 96.8%)
    Add Objects:              68650 ms ( 83.8%)
      Type Merging:           57534 ms ( 70.2%)
      Symbol Merging:         10822 ms ( 13.2%)
    TPI Stream Layout:         1501 ms (  1.8%)
    Globals Stream Layout:      770 ms (  0.9%)
    Commit to Disk:            7007 ms (  8.6%)
  Commit Output File:            19 ms (  0.0%)
-------------------------------------------------
Total Link Time:              81900 ms (100.0%)

Our target is < 20 seconds for linking; anything below 40 seconds
would be OK. Ideal times would be around 8s (at which point it would
mostly beat link.exe incremental linking).

My tip for profiling is to use Superluminal
(https://www.superluminal.eu/); it's the easiest way to see everything
your code is doing.

On Sun, Feb 24, 2019 at 5:18 PM Alexandre Ganea
<alexandre.ganea at ubisoft.com> wrote:
>
> Leonardo, to answer your questions: yes to all of them. You can take a
> look at this prototype/proposal: https://reviews.llvm.org/D55585
>
>
>
> Overall, computing ghashes in parallel at link-time and merging Types with them
> is less costly than the current approach to merging. The 35sec you’re seeing
> for merging should go down to about 15sec.

I don't do much computing of ghashes, as we already preprocess all .obj
files from MSVC to add a .debug$H section to them. The whole 35 seconds is
spent just in DenseMap's bucket lookup (LookupBucketFor). The rest of the
time is mostly page faults (I guess to load in obj data and to grow the
final pdb?).
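
To make the hot path concrete, here is a minimal sketch of ghash-based
deduplication. It is not the lld code: it assumes each type record is
represented only by an 8-byte hash, uses std::unordered_map in place of
llvm::DenseMap, and the names GHash, MergeResult and mergeByGHash are
made up for illustration. The point is that every input record costs one
map lookup/insert, so with enough records the map dominates the profile.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an 8-byte global type hash read from .debug$H.
using GHash = std::uint64_t;

struct MergeResult {
  std::vector<GHash> unique;        // deduplicated records, first-seen order
  std::vector<std::uint32_t> remap; // input position -> index into `unique`
};

// Deduplicate records by hash. One map probe per input record.
MergeResult mergeByGHash(const std::vector<GHash> &input) {
  MergeResult result;
  std::unordered_map<GHash, std::uint32_t> seen;
  seen.reserve(input.size()); // pre-size so the table does not rehash while growing
  result.remap.reserve(input.size());

  for (GHash h : input) {
    // Index the record would get if it turns out to be new.
    std::uint32_t nextIndex = static_cast<std::uint32_t>(result.unique.size());
    auto inserted = seen.emplace(h, nextIndex);
    if (inserted.second)
      result.unique.push_back(h); // first time we see this hash
    result.remap.push_back(inserted.first->second);
  }
  return result;
}
```

Calling mergeByGHash on the hashes of all input objects gives the merged
list plus a remap table from old to new indices. Pre-sizing the table is
the only tuning shown here; it avoids rehashing but not the cache misses
of the lookups themselves.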

> The patch doesn’t parallelize
> (yet) the Type merging itself, but we have an alternate multithread-suitable
> implementation of DenseHash which already supports lockless, wait-free,
> insert/fetch/resize.

Where is this lockless densehash? This is the part where I would love
to help, but if there is already a lockless densehash, what is left is
probably just creating the threads and letting them merge the results.
I'm a bit afraid for the reproducibility of builds, but as we already
don't have that with link.exe we are not really losing anything.
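
For reference, this is roughly what I picture for the insert side of such
a table. It is only a sketch under my own assumptions, not the
implementation from D55585: fixed capacity (no resize), key 0 reserved as
the empty marker, lock-free via compare-and-swap rather than wait-free,
and the class name LockFreeHashSet is invented.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Insert-only, fixed-capacity open-addressing set of 64-bit keys.
// Assumptions: key 0 is never inserted (it marks an empty slot), the
// capacity is a power of two, and the table is never completely full.
class LockFreeHashSet {
public:
  explicit LockFreeHashSet(std::size_t capacityPow2)
      : slots_(capacityPow2), mask_(capacityPow2 - 1) {
    for (auto &slot : slots_)
      slot.store(0, std::memory_order_relaxed);
  }

  // Returns true if this call inserted the key, false if it was present.
  bool insert(std::uint64_t key) {
    for (std::size_t i = mix(key) & mask_;; i = (i + 1) & mask_) {
      std::uint64_t current = slots_[i].load(std::memory_order_acquire);
      if (current == key)
        return false;
      if (current == 0) {
        std::uint64_t expected = 0;
        // Try to claim the empty slot; only one thread can win it.
        if (slots_[i].compare_exchange_strong(expected, key,
                                              std::memory_order_acq_rel))
          return true;
        if (expected == key)
          return false; // another thread inserted the same key first
        // A different key took the slot: fall through and keep probing.
      }
    }
  }

  bool contains(std::uint64_t key) const {
    for (std::size_t i = mix(key) & mask_;; i = (i + 1) & mask_) {
      std::uint64_t current = slots_[i].load(std::memory_order_acquire);
      if (current == key)
        return true;
      if (current == 0)
        return false; // hit an empty slot: key is not in the table
    }
  }

private:
  // Cheap multiplicative mixing so sequential keys spread over the table.
  static std::size_t mix(std::uint64_t key) {
    return static_cast<std::size_t>((key * 0x9E3779B97F4A7C15ull) >> 32);
  }

  std::vector<std::atomic<std::uint64_t>> slots_;
  std::size_t mask_;
};
```

Which thread wins a slot depends on scheduling, so if type indices were
handed out in insertion order the output could differ from run to run.
That is the reproducibility worry above; indices would have to be
assigned in some deterministic order after the parallel phase.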

>
>
> The prototype allows for testing different hashing algorithms, and indeed
> xxHash seems to be the best general-purpose choice. I’ve also added support
> for more specialized hardware-based hashes, like Casey Muratori’s Meow Hash
> (uses hardware AES SSE 4.2 instructions), which brings the figures down a bit.
>

I remembered Meow hash needing at least k bytes of data, but looking
at their website right now there is no mention of it. Hashing
performance doesn't have much of an impact for us, as we do it per .obj
file, distributed across our company, so the time to calculate the
hashes is completely distributed.

>
>
> Future changes could write the computed ghash stream back to OBJs if
> /INCREMENTAL is specified (just an idea). Incrementally linking will be faster
> that way when working with MSVC OBJs.
>

I already have a patch for llvm-objcopy that adds a -add-ghashes
option that does this. I will be cleaning it up this week and sending
a PR for it.

>
>
> As for creating PDBs for independent projects, that would help most likely.
> However the ghash stream would need to be stored in the PDB in that case
> (currently, ghashes are dropped after merging). That could help when using
> rarely compiled projects, used along with network caches.

I actually meant a .lib, with all the obj files inside plus a merged
.debug$H entry. No pdb generation or changes necessary; we just run
the same code that merges types in lld, but at the librarian
level.

>
> I will start sending smaller patches to converge towards the functionality of
> the prototype above.
>
>
>
> Best,
>
> Alex.
>
>
>
> From: Zachary Turner <zturner at google.com>
> Sent: Sunday, February 24, 2019 1:20 AM
> To: Leonardo Santagada <santagada at gmail.com>
> Cc: Alexandre Ganea <alexandre.ganea at ubisoft.com>; Reid Kleckner <rnk at google.com>; llvm-dev <llvm-dev at lists.llvm.org>
> Subject: Re: [llvm-dev] Making LLD PDB generation faster
>
>
>
> +Reid and Alexandre, who have been doing work in this area recently
>
>
>
> On Sat, Feb 23, 2019 at 4:07 AM Leonardo Santagada via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hi,
>
> Is anyone working on making the PDB generation on LLD faster? Looking
> at a trace for linking one of our binaries (it takes 1min6s-1min20s) I
> see two things:
>
> 1) LookupBucketFor(Val, ConstFoundBucket); takes 35s so almost half of
> the time of linking, mostly finding duplicates
> 2) There is no parallelization inside of addObjectsToPDB
>
> Is anyone working on those? Also, has anyone thought about merging .obj
> files to deduplicate type information, so we can do the linking on
> projects to generate something like a lib file, but with deduplicated debug
> information (as far as I know an actual .lib just puts all pdbs or /Z7
> debug info inside a file without dedup)?
>
> Just looking at the code it seems it is much more mature and also the
> choice of SHA1_8 seems interesting (still don't know why not use
> xxHash64).
>
> ps: My code to add ghashes to msvc compiled .obj files is almost ready
> to be pushed as an option for llvm-objcopy.
>
> --
>
> Leonardo Santagada

-- 

Leonardo Santagada