[PATCH] D87805: [PDB] Merge types in parallel when using ghashing

Reid Kleckner via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 29 17:23:57 PDT 2020


rnk marked 2 inline comments as done.
rnk added a comment.

The next patch has some changes to make the output PDB **identical** to the old ghash PDB. I had to sort the dependency type sources to the front so that dependency types appear earlier in the final type stream. I did a few links and MD5'd the before and after PDBs with `/Brepro`, and they are the same. I'm reasonably confident in this at this point.

Hopefully I addressed all the outstanding comments. I think this is probably ready to land, and then we should probably make ghashing the default, since it's basically faster than the standard type merging implementation at this point. I don't have exhaustive testing for /Zi objects, though, so you might want to do some validation there.



================
Comment at: lld/COFF/DebugTypes.cpp:745
+  // server.
+  TypeServerSource *tsSrc = check(getTypeServerSource());
+  tpiMap = tsSrc->tpiMap;
----------------
aganea wrote:
> This needs to be:
> ```
>   Expected<TypeServerSource *> tsSrc = getTypeServerSource();
>   if (!tsSrc)
>     return; // ignore errors at this point.
> ```
> Since a missing PDB is not en error, we just won't have types & symbols for that .OBJ - and we're already handling that later in `mergeDebugT`. Could you also please modify `pdb-type-server-missing.yaml` as it lacks `/DEBUG:GHASH` coverage, which should catch this case?
Done, and added the coverage.


================
Comment at: lld/COFF/DebugTypes.cpp:813
+/// 1. concurrent insertion
+/// 2. concurrent lookup
+/// It does not support deletion or rehashing. It uses linear probing.
----------------
rnk wrote:
> Hah, this comment is stale. The table actually doesn't support lookup at all anymore. It used to, before I figured out the trick of saving the insert position from the parallel insert step.
I updated the comments here.


================
Comment at: lld/COFF/DebugTypes.cpp:932
+    // Advance the probe. Wrap around to the beginning if we run off the end.
+    ++idx;
+    idx = idx == tableSize ? 0 : idx;
----------------
rnk wrote:
> aganea wrote:
> > It'd be interesting to collect statistics on how many collisions you get. And also compare linear (current behavior) vs. quadratic probing.
> > 
> > One issue I can see is that since the table will be 99.9% full at the end of the insertion pass, there will lots of collisions toward the end. What about making the table 25% bigger, like DenseHash does?
> I don't have collision stats, but I can say that the load factor in the tests I was using goes from 70% (small PDBs) to 14% (big programs, lots of duplicate types to eliminate). So, the more inputs you run it on, the more memory gets allocated, the fewer collisions their are, and the shorter the chains are.
I didn't end up collecting more stats, but the load factor is in the /verbose output if you want to check.


================
Comment at: lld/COFF/DebugTypes.cpp:953
+  // Load ghashes. Do type servers and PCH objects first.
+  parallelForEach(TpiSource::instances, [&](TpiSource *source) {
+    if (source->isDependency())
----------------
aganea wrote:
> Can you please add a timer for this part? (just the ghash generation for all files)
Sure, the new output looks like:
```
  Input File Reading:          7367 ms ( 25.9%)
  Code Layout:                 1434 ms (  5.0%)
  Commit Output File:            44 ms (  0.2%)
  PDB Emission (Cumulative):  17956 ms ( 63.2%)
    Global Type Hashing:        651 ms (  2.3%)
    GHash Type Merging:        2533 ms (  8.9%)
    Add Objects:              10098 ms ( 35.5%)
      Symbol Merging:          6882 ms ( 24.2%)
    Publics Stream Layout:     1027 ms (  3.6%)
    TPI Stream Layout:          111 ms (  0.4%)
    Commit to Disk:            5410 ms ( 19.0%)
-------------------------------------------------
Total Link Time:              28427 ms (100.0%)
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87805/new/

https://reviews.llvm.org/D87805



More information about the llvm-commits mailing list