[PATCH] D54554: [PDB] Add symbol records in bulk

Zachary Turner via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Nov 16 11:12:16 PST 2018


zturner added inline comments.


================
Comment at: lld/COFF/PDB.cpp:813
     // FileStaticSym::ModFileOffset
-    recordStringTableReferenceAtOffset(Contents, 4, StrTableRefs);
+    recordStringTableReferenceAtOffset(Contents, 8, StrTableRefs);
     break;
----------------
Why did this change from 4 to 8?


================
Comment at: lld/COFF/PDB.cpp:1007-1008
 
-static void addGlobalSymbol(pdb::GSIStreamBuilder &Builder, ObjFile &File,
-                            const CVSymbol &Sym) {
+static void addGlobalSymbol(pdb::GSIStreamBuilder &Builder, unsigned ModIndex,
+                            unsigned SymOffset, const CVSymbol &Sym) {
   switch (Sym.kind()) {
----------------
`ModIndex` should maybe be a `uint16_t` since that's what it is in the underlying file.  We just cast it below anyway, might as well remove any chance of truncation by having the function take a `uint16_t` directly.


================
Comment at: lld/COFF/PDB.cpp:1070
+  if (NeedsRealignment) {
+    void *AlignedData = Alloc.Allocate(RealignedSize, 4);
+    AlignedSymbolMem = makeMutableArrayRef(
----------------
Is the 4 here supposed to be the alignment?  How about `alignOf(CodeViewContainer::Pdb)`?


================
Comment at: lld/COFF/PDB.cpp:1078
+  ArrayRef<uint8_t> BulkSymbols;
+  cantFail(forEachCodeViewRecord<CVSymbol>(
+      SymsBuffer, [&](CVSymbol Sym) -> llvm::Error {
----------------
I feel like this is do-able with just less than 2 passes.

What if we kept a vector of offsets into the `SymData` buffer, and the offsets would represent the position of records which required an alignment adjustment.  If you get through the loop and the vector is empty, then that is the same as `NeedsAlignment == false`.

As you iterate the records, you remap the type indices in the original buffer, which may still require alignment, but you don't actually do any re-alignment yet.  For each record that does require realignment, you push its offset onto the vector.  Note that you can collapse adjacent alignments, so for example if you have Record 1 which is at offset 3 mod 4, and Record 2 is also offset 3 mod 4, then only 1 re-alignment is required.  Because as soon as you realign the output buffer to 0 mod 4, then both records will be re-aligned.

So you're basically pushing offsets onto the vector when offset % 4 changes.

At the end of this process, all remapping has been done, but no re-alignment has occurred.

If the offset vector is empty, you're done.  Just return, and you only made 1 pass over the records.

If the offset vector is not empty, you walk it, and for each offset you add Offset[n] - Offset[n-1] bytes of padding, then copy in bulk up to the next offset.

Now, in the best case scenario, you only have 1 pass over the inputs, but in the worst case scenario, you don't have O(2n), you have O(n+m) where m < n.


https://reviews.llvm.org/D54554





More information about the llvm-commits mailing list