[PATCH] D79467: [PDB] Optimize public symbol processing
Alexandre Ganea via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu May 7 13:02:14 PDT 2020
aganea added a comment.
Seems good to me, just a few things:
================
Comment at: lld/COFF/PDB.cpp:1346
+ std::vector<pdb::BulkPublic> publics;
symtab->forEachSymbol([&publics](Symbol *s) {
+ // Only emit external, defined, live symbols that have a chunk. Static,
----------------
Maybe reserve in advance to avoid reallocations?
```
unsigned count = 0;
symtab->forEachSymbol([&count](Symbol *s) {
  auto *def = dyn_cast<Defined>(s);
  count += (def && def->isLive() && def->getChunk());
});
publics.reserve(count);
```
Do you think there would be a gain in doing the creation in parallel? (and using `.resize` instead in that case)
================
Comment at: llvm/include/llvm/DebugInfo/PDB/Native/GSIStreamBuilder.h:60
+ // Add public symbols in bulk.
+ void addPublicSymbols(std::vector<BulkPublic> Publics);
----------------
I know the call will be optimized anyway, but shouldn't we clarify the intent of transferring ownership with `std::vector<BulkPublic> &&Publics`? (just to prevent accidents in the future, if someone removes the `std::move` at the call site during a refactoring)
================
Comment at: llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp:72
+ void finalizeBuckets(uint32_t RecordZeroOffset,
+ std::vector<BulkPublic> Publics);
+
----------------
Same thing here: `std::vector<BulkPublic> &&`
================
Comment at: llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp:196
- // Compute the three tables: the hash records in bucket and chain order, the
- // bucket presence bitmap, and the bucket chain start offsets.
- HashRecords.reserve(Records.size());
+ finalizeBuckets(RecordZeroOffset, Globals);
+}
----------------
finalizeBuckets(RecordZeroOffset, **std::move(**Globals**)**)
================
Comment at: llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp:200
+void GSIHashStreamBuilder::finalizeBuckets(uint32_t RecordZeroOffset,
+ std::vector<BulkPublic> Globals) {
+ // Hash every name in parallel. Store the bucket index hash in the Segment
----------------
std::vector<BulkPublic> &&Globals
================
Comment at: llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp:203
+ // field of the BulkPublic struct now that we no longer need it.
+ parallelForEachN(0, Globals.size(), [&](size_t I) {
+ Globals[I].Segment = hashStringV1(Globals[I].Name) % IPHR_HASH;
----------------
Nice!
I am wondering if we shouldn't do `union { uint16_t Segment; uint16_t BucketIdx; };` to make things a bit more explicit. Up to you!
================
Comment at: llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp:215
+ auto BucketCmp = [](const BulkPublic &L, const BulkPublic &R) {
+ uint32_t LBucketIdx = L.Segment;
+ uint32_t RBucketIdx = R.Segment;
----------------
...and in that case, do we need these two assignments?
```
if (L.BucketIdx != R.BucketIdx)
return L.BucketIdx < R.BucketIdx;
```
================
Comment at: llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp:342
+ return L.Offset < R.Offset;
+ // parallel::sort is unstable, so we have to do name
+ // comparison to ensure that two names for the same location
----------------
// **parallelSort** is unstable, so we have to do name
================
Comment at: llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp:352
+ for (const BulkPublic &Pub : Publics)
+ PubAddrMap.push_back(ulittle32_t(Pub.SymOffset));
+
----------------
Do you think we can avoid `.push_back` altogether, and simplify to:
```
PubAddrMap.resize(Publics.size());
ulittle32_t *Next = &PubAddrMap.front();
for (const BulkPublic &Pub : Publics)
*Next++ = Pub.SymOffset;
```
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D79467/new/
https://reviews.llvm.org/D79467