[PATCH] D81296: [PDB] Defer public serialization until PDB writing

Reid "Away June-Sep" Kleckner via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 30 11:24:44 PDT 2020


rnk marked 8 inline comments as done.
rnk added inline comments.


================
Comment at: llvm/include/llvm/DebugInfo/PDB/Native/GSIStreamBuilder.h:122
+  // PublicSymFlags.
+  uint16_t Flags : 3;
 
----------------
aganea wrote:
> `PublicSymFlags` needs 4 bits, you could take one from `BucketIdx` below since you don't need more than 12 bits for that - you're doing `% IPHR_HASH` below.
Ah, true. I looked at the enum and thought it was a four value enum, not a four *flag* enum.


================
Comment at: llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp:43
   std::vector<PSHashRecord> HashRecords;
   std::array<support::ulittle32_t, (IPHR_HASH + 32) / 32> HashBitmap;
   std::vector<support::ulittle32_t> HashBuckets;
----------------
hans wrote:
> I see this was just moved, but if the goal is to compute ceil(IPHR_HASH / 32) then I think +31 instead of +32 would be enough.
For some reason, the PDB on disk has `IPHR_HASH + 1` buckets, and this needs to be reflected in the bitmap, which needs `ceil(IPHR_HASH+1/32)` words. The last bucket is always empty, since the hash is always calculated as `crc32 % IPHR_HASH` (not `%IPHR_HASH+1`).

I don't see any comments about this, but I thought they were in here somewhere, since I remember dealing with this edge case in the past. I'll add some.


================
Comment at: llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp:198
+  // bucket start offsets.
+  uint32_t BucketStarts[IPHR_HASH];
+  memset(&BucketStarts[0], 0, sizeof(BucketStarts));
----------------
aganea wrote:
> You could also do a one-liner `uint32_t BucketStarts[IPHR_HASH] = {0};`
Sure. I think I tend to shy away from this form, even though I like it, because different compilers tend to have conflicting warnings like `-Wmissing-field-initializers` `-Wmissing-braces` and others. I think they don't apply in this case, though.


================
Comment at: llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp:203
+  uint32_t Sum = 0;
+  for (uint32_t &B : BucketStarts) {
+    uint32_t Size = B;
----------------
aganea wrote:
> This also could be:
> ```
> for (unsigned I = 1; I < IPHR_HASH; ++I)
>   BucketStarts[I] += BucketStarts[I-1];
> BucketStarts[0] = 0;
> ```
I don't think that's quite the same. Suppose we have three buckets, all size 1. Your loop will produce: `0, 2, 3`, but I think the desired start offsets are `0, 1, 2`.

I think the fancy way to do this is `std::exclusive_scan`, but it's new in C++17:
https://en.cppreference.com/w/cpp/algorithm/exclusive_scan


================
Comment at: llvm/lib/DebugInfo/PDB/Native/GSIStreamBuilder.cpp:213
+  HashRecords.resize(Records.size());
+  uint32_t BucketCursors[IPHR_HASH];
+  memcpy(BucketCursors, BucketStarts, sizeof(BucketCursors));
----------------
aganea wrote:
> Same as above: `uint32_t BucketCursors[IPHR_HASH] = {0};`
The initialization stores would be made dead by the memcpy on the next line, though.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81296/new/

https://reviews.llvm.org/D81296





More information about the llvm-commits mailing list