[PATCH] D54802: [LLD][COFF] Generate import modules in PDB

Wed Nov 28 13:21:35 PST 2018

rnk added a comment.

In D54802#1311401 <https://reviews.llvm.org/D54802#1311401>, @aganea wrote:

>     Input File Reading:          1658 ms (  4.7%)
>     Code Layout:                  621 ms (  1.8%)
>     PDB Emission (Cumulative):  30380 ms ( 86.7%)
>       Add Objects:              22615 ms ( 64.6%)
>         Type Merging:           19205 ms ( 54.8%)
>         Symbol Merging:          3385 ms (  9.7%)
>       TPI Stream Layout:          897 ms (  2.6%)
>       Globals Stream Layout:     1418 ms (  4.1%)
>       Commit to Disk:            4559 ms ( 13.0%)
>     Commit Output File:          1717 ms (  4.9%)
>   -------------------------------------------------
>   Total Link Time:              35021 ms (100.0%)
>

I think we can optimize copying symbols using the same techniques we've used for optimizing LLD. Symbol records are pretty straightforward: you figure out which .debug$S sections are live, relocate them, process them a bit, and copy the bytes to the PDB in some order. In D54554 <https://reviews.llvm.org/D54554> I did some work that cuts down the "Commit to Disk" (13%) and "Symbol Merging" times, but there may be more things to do.

For types, I think this is one of the classic problems where you make a hash table and say "it's O(1) insertion" but really it's order "length of key", and probably with a high constant factor. I think if we looked at the distribution of type record sizes, we'd see a few categories like this:

1. Small records, < 20 bytes, i.e. smaller than a SHA-1 hash, like pointer types, qualified (const/volatile) types, array types, function types, etc. Merging these in the linker should be cheap, ghash or no ghash.
2. Records with names, like LF_STRUCTURE and LF_PROC_ID. C++ mangled names can get long, but they probably don't approach 64KB, the maximum CV record length. Think ~4KB.
3. Long field lists. LF_FIELDLIST is a bear. I expect on average most field lists are 32KB plus, since they include the name of every field, method, and typedef. Templates tend to have lots of members, many of which are unused in most instantiations.
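
To check that guess about the size distribution, something like the following could be run over a type stream. This is only a rough sketch: `SizeHistogram` and `histogramRecordSizes` are made-up names, the 20-byte and 4KB thresholds are just the ballpark figures from the list above, and it assumes the input is a bare sequence of records, each starting with a little-endian 16-bit length that does not count the length field itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical buckets matching the rough categories above.
struct SizeHistogram {
  std::size_t small = 0;  // under 20 bytes
  std::size_t medium = 0; // 20 bytes up to 4KB
  std::size_t large = 0;  // 4KB and above
};

// Walks a buffer of length-prefixed records and counts how many fall into
// each bucket. Assumes any section signature has already been skipped.
SizeHistogram histogramRecordSizes(const std::vector<uint8_t> &stream) {
  SizeHistogram h;
  std::size_t off = 0;
  while (off + 2 <= stream.size()) {
    // 16-bit little-endian length, excluding the 2-byte length field.
    uint16_t len = static_cast<uint16_t>(stream[off] | (stream[off + 1] << 8));
    std::size_t total = 2 + static_cast<std::size_t>(len);
    if (off + total > stream.size())
      break; // truncated record, stop rather than read out of bounds
    if (total < 20)
      ++h.small;
    else if (total < 4096)
      ++h.medium;
    else
      ++h.large;
    off += total;
  }
  return h;
}
```

Weighting each bucket by total bytes rather than record count would probably be more telling, since a handful of huge field lists could dominate the comparison cost.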

The other thing to keep in mind is that type hash table "hits" are common. Most types are duplicates. Given some 64KB field list record, you can expect it will be deduplicated O(#objects) times. Without the content hash, this means you have to `memcmp` the entire 64KB for every duplicate. With the hash, you do O(length of hash) work, so an 8- or 20-byte memcmp. I think that's where the real ghash gains are coming from. I'm not sure what to do with that information, but it seems like a useful insight.
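
To make the difference concrete, here is a toy comparison of the two keying strategies. It is a sketch, not what LLD does: `TypeRecord`, `BytesKeyedTable`, `HashKeyedTable`, and `makeRecord` are invented for illustration, FNV-1a stands in for the real content hash, and the hash-keyed table simply assumes that equal hashes mean equal records.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical record: raw bytes plus a content hash computed ahead of time.
struct TypeRecord {
  std::string bytes;
  uint64_t contentHash;
};

// FNV-1a, used only as a stand-in for a real content hash.
uint64_t fnv1a(const std::string &s) {
  uint64_t h = 14695981039346656037ULL;
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  return h;
}

TypeRecord makeRecord(std::string bytes) {
  uint64_t h = fnv1a(bytes);
  return TypeRecord{std::move(bytes), h};
}

// Keyed on the full bytes: every lookup hashes the whole record, and every
// hit compares the whole record against the stored key.
class BytesKeyedTable {
  std::unordered_map<std::string, uint32_t> map;

public:
  uint32_t insert(const TypeRecord &r) {
    uint32_t next = static_cast<uint32_t>(map.size());
    return map.emplace(r.bytes, next).first->second;
  }
  std::size_t size() const { return map.size(); }
};

// Keyed on the precomputed hash: a hit costs one 8-byte comparison no
// matter how long the record is. Assumes hash collisions are negligible.
class HashKeyedTable {
  std::unordered_map<uint64_t, uint32_t> map;

public:
  uint32_t insert(const TypeRecord &r) {
    uint32_t next = static_cast<uint32_t>(map.size());
    return map.emplace(r.contentHash, next).first->second;
  }
  std::size_t size() const { return map.size(); }
};
```

Both tables hand back the same index for a duplicate; the only difference is how many bytes get touched on each hit, which is the cost that scales with #objects for a record shared by every translation unit.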

One thing we talked about a while ago was trying to parallelize type merging. I think the challenge there is it requires a good concurrent hash map implementation, which we'd have to find or implement ourselves.
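
As a strawman for what "implement ourselves" might look like, the simplest thing I can think of is a sharded table with one lock per shard. This is a sketch only: `ShardedTypeTable` is a made-up name, it keys on a precomputed 64-bit content hash, and the shard count of 64 is arbitrary.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical sharded table: each shard has its own mutex, so threads
// inserting hashes that land in different shards do not contend.
class ShardedTypeTable {
  static constexpr std::size_t NumShards = 64; // arbitrary choice

  struct Shard {
    std::mutex lock;
    std::unordered_map<uint64_t, uint32_t> ids;
  };

  std::array<Shard, NumShards> shards;
  std::atomic<uint32_t> nextId{0};

public:
  // Returns the ID already assigned to this hash, or assigns a fresh one.
  // Safe to call from multiple threads at once.
  uint32_t insert(uint64_t contentHash) {
    Shard &s = shards[contentHash % NumShards];
    std::lock_guard<std::mutex> guard(s.lock);
    auto it = s.ids.find(contentHash);
    if (it != s.ids.end())
      return it->second;
    uint32_t id = nextId.fetch_add(1, std::memory_order_relaxed);
    s.ids.emplace(contentHash, id);
    return id;
  }

  // Number of unique hashes seen so far.
  uint32_t size() const { return nextId.load(); }
};
```

The obvious catch is that the IDs handed out here depend on thread scheduling, so the output would not be deterministic without a later renumbering pass. A lock-free table might also scale better than per-shard mutexes, but this is probably the least code to try the idea out.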

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D54802/new/

https://reviews.llvm.org/D54802