[PATCH] D33807: [CodeView] Allow Debug Subsections to be read/written in any order

Thu Jun 1 17:25:23 PDT 2017

zturner added inline comments.

================
Comment at: llvm/lib/DebugInfo/CodeView/DebugStringTableSubsection.cpp:79

 uint32_t DebugStringTableSubsection::getStringId(StringRef S) const {
+  assert(exists(S));
----------------
inglorion wrote:
> This will call StringToId.find(S) twice when asserts are enabled. Since that returns an iterator, can we instead do
> 
>   auto It = StringToId.find(S);
>   assert(It != StringToId.end() && "getStringId called with string that does not exist in the string table");
>   return It->second;
> 
> and save the second lookup?
For a debug build I think readability should take priority over performance unless there's a good reason not to.  Which is why I used `exists()`.  That said, I also think it's not a big deal and it's a minor preference, so I don't mind changing it.

================
Comment at: llvm/lib/DebugInfo/PDB/Native/ModuleDebugStream.cpp:93
+ModuleDebugStreamRef::findChecksumsSubsection() const {
+  for (const auto &SS : subsections()) {
+    if (SS.kind() != DebugSubsectionKind::FileChecksums)
----------------
inglorion wrote:
> If I'm reading the code in ModuleDebugStreamRef::reload() correctly, we read the subsections before it makes sense to get to this code. Given that, can we store which one is the FileChecksums subsection at reload() time so that we don't have to iterate over all the subsections to find it?
This is a `ModuleDebugStreamRef`, which literally contains no information about the subsections other than a big flat buffer consisting of their data.  In order to do what you suggest, we'd have to iterate them up front during the `reload`, and then you would suffer that performance hit even in situations where you never accessed the module debug streams.  I fixed a similar performance problem in pdbdump a few weeks ago where we were iterating all types up front to compute their hashes during the `reload` operatino, and then nobody ended up looking at the hash values.  Usually this isn't noticeable, but when I tried it on a 1.5GB PDB file, it took several minutes.

I could see doing something like caching the checksums subsection in the `ModuleDebugStreamRef` class and only doing the full scan the first time this function is called, but short of that I don't think we should change it.

https://reviews.llvm.org/D33807