[PATCH] D79467: [PDB] Optimize public symbol processing

Reid Kleckner via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri May 8 07:28:51 PDT 2020


rnk added inline comments.


================
Comment at: lld/COFF/PDB.cpp:1346
+  std::vector<pdb::BulkPublic> publics;
   symtab->forEachSymbol([&publics](Symbol *s) {
+    // Only emit external, defined, live symbols that have a chunk. Static,
----------------
aganea wrote:
> Maybe reserve in advance to avoid reallocations?
> ```
> unsigned count{};
> symTab->forEachSymbol([](Symbol *s) {
>   auto *def = dyn_cast<Defined>(s);
>   count += (def && def->isLive() && def->getChunk());
> });
> publics.reserve(count);
> ```
> Do you think there would be a gain to do the creation in parallel? (and do `.resize` instead in that case)
This loop was relatively short compared to what comes next. I think that is because it doesn't touch the name strings. I think we'd lose more from checking all symbols twice than we spend copying and growing the vector.

There's more to be gained from parallelizing section contribution creation or the main object file processing, so I plan to look at that instead.


================
Comment at: llvm/include/llvm/DebugInfo/PDB/Native/GSIStreamBuilder.h:60
+  // Add public symbols in bulk.
+  void addPublicSymbols(std::vector<BulkPublic> Publics);
 
----------------
aganea wrote:
> I know the call will be optimized anyway, but shouldn't we clarify the intention of transferring the ownership? `std::vector<BulkPublic> &&Publics` (just to prevent accidents in the future, if someone removes `std::move` at the caller site during a refactoring)
Sure, that's a good idea.


================
Comment at: llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp:203
+  // field of the BulkPublic struct now that we no longer need it.
+  parallelForEachN(0, Globals.size(), [&](size_t I) {
+    Globals[I].Segment = hashStringV1(Globals[I].Name) % IPHR_HASH;
----------------
aganea wrote:
> Nice!
> 
> I am wondering if we shouldn't do `union { uint16_t Segment; uint16_t BucketIdx; };` to make things a bit more explicit. Up to you!
I'll do it. I went the other way while working on the code because the struct is declared in a header, so it affects the PDB.cpp code. We also can't use a nameless union field because I believe that is an extension. The zero initialization also becomes complicated.


================
Comment at: llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp:352
+  for (const BulkPublic &Pub : Publics)
+    PubAddrMap.push_back(ulittle32_t(Pub.SymOffset));
+
----------------
aganea wrote:
> Do you think we can avoid `.push_back` altogether, and simplify to:
> ```
>   PubAddrMap.resize(Publics.size());
>   ulittle32_t *Next = &PubAddrMap.front();
>   for (const BulkPublic &Pub : Publics)
>     *Next++ = Pub.SymOffset;
> ```
I think this is better because it avoids the zero-initialization of resize. This loop doesn't show up in the profile, FWIW.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79467/new/

https://reviews.llvm.org/D79467





More information about the llvm-commits mailing list