[PATCH] D53002: Create a new symbol file plugin for cross-platform PDB symbolization

Tue Oct 9 15:09:23 PDT 2018

zturner added a comment.

@lemo  Thanks for the detailed review!  I'll fix the suggestions and upload a new patch in a bit.

================
Comment at: lldb/lit/SymbolFile/NativePDB/simple-breakpoints.cpp:12
+// CHECK: (lldb) break set -n main
+// CHECK: Breakpoint 1: where = simple-breakpoints.cpp.tmp.exe`main + 23 at simple-breakpoints.cpp:15, address = 0x0000000140001017
+
----------------
lemo wrote:
> Is the offset and address going to be stable enough to rely on it for testing?
No, but it sounds like you're looking at an old version of the diff.  I removed this regex'ed all the address checks out.

================
Comment at: lldb/source/Plugins/SymbolFile/NativePDB/CompileUnitIndex.cpp:121
+CompileUnitIndex::GetOrCreateCompiland(PdbSymUid compiland_uid) {
+  auto result = m_comp_units.try_emplace(compiland_uid.toOpaqueId(), nullptr);
+  if (!result.second)
----------------
lemo wrote:
> is C++ 17 available? structured bindings can make this more readable:
> 
> const auto& [it, inserted] = ...;
> if (!inserted)
>   ...
> 
> 
Sadly no, LLVM is still on C++11.

================
Comment at: lldb/source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp:34
+PdbIndex::create(std::unique_ptr<llvm::pdb::PDBFile> file) {
+  assert(file);
+  auto ExpectedDBI = file->getPDBDbiStream();
----------------
lemo wrote:
> lldbassert() ? 
> 
> (same comment for the rest of assert()s in the CR)
`lldbassert` is nice to validate things that you expect should happen but want to log an error if the condition is not held.  `assert` is better when you want to guarantee invariants of your API.  In this case I use `assert` because the precondition of this function is "`file` is non-null".  `lldbassert` wouldn't help here, because the very next line of code is to just dereference the pointer and we'd crash anyway.

================
Comment at: lldb/source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp:37-38
+  if (!ExpectedDBI) {
+    llvm::consumeError(ExpectedDBI.takeError());
+    return nullptr;
+  }
----------------
lemo wrote:
> zturner wrote:
> > labath wrote:
> > > Maybe propagate this error to the caller and have the caller log the error when ignoring it?
> > These are a bit tricky.  In all cases, there is only possible thing the caller can do, which is to give up.  I don't think the caller particularly cares about the details of the error that occurred, the only way to handle it is to say "we're not going to use this plugin".  So I think it's fine to consume the error and just return a generic failure here.
> how do you feel about adding case-specific logging?
Logging is possible.  But I think there's such a thing as too verbose logging.  There's a balance between code simplicity and providing useful information in the log file.  This code is called at a stage when LLDB still hasn't even figured out that the PDB symbol file plugin is ultimately the right plugin for the job.  It's *asking* us if we can handle it.  So if we log an error and say "Invalid file, PDB doesn't have a DBI stream" it seems too verbose in my opinion.  Imagine if LLDB tried the DWARF plugin first (because it just goes down the list of looking for anyone who claims they can handle this file), and DWARF logged an error saying "Invalid DWARF debug information format", but it's not even DWARF in the first place.

I don't know what the best thing to do here is other than to say "we can't handle this", for which a single `nullptr` return value is sufficient.

================
Comment at: lldb/source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp:122
+void PdbIndex::ParseSectionContribs() {
+  struct Visitor : public ISectionContribVisitor {
+    Visitor(PdbIndex &ctx, llvm::IntervalMap<uint64_t, uint16_t> &imap)
----------------
lemo wrote:
> class? (not just for the notation clarity, but there's no reason to make the visitor members public, right?
Well it's a local class defined inside of a member function, so in a sense it's already "private".  But you're right, it could be a class.

================
Comment at: lldb/source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp:186
+  // In order to find this, we use the upper_bound of the key value which would
+  // be the first symbol whose starting address is higher than the element we're
+  // searching for.
----------------
lemo wrote:
> but is that what we really want? shouldn't we be looking for the "inner-most" range which contains the address?
> 
The purpose of this function is to find *all* symbols that occupy a given address.  Imagine a function which contains a nested block.  Just because you call this function with an address inside the block doesn't necessarily mean you want the block.  If we had to choose just 1 symbol to return, I'd rather choose the outermost symbol, because from there it's easier to determine all the inner symbols.  But going the other direction is more complicated.  This function however just returns a list of all of them and lets the caller decide what to do with that information.  We could, for example, build something on top of this to allow more granular selection, but that would still need to call into this and then filter the list down.

================
Comment at: lldb/source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp:191
+
+  for (auto iter = cci.m_symbols_by_va.begin(); iter != ub; ++iter) {
+    const PdbCuSymId &cu_sym_id = iter->second.asCuSym();
----------------
lemo wrote:
> Ok, I see what we're doing. Wouldn't make more sense to iterate backward? (efficiency wise and more importantly to locate the most "inner" range, right?)
> 
See above, but most inner range is not necessarily what we're looking for.  In this patch, for example, the goal is to find the function symbol.  So if this returned the inner most range, it might return a block symbol and we'd still need to do more work to find the function symbol.  So this function just returns every symbol that contains the address, and lets the caller filter that down further (if desired).

================
Comment at: lldb/source/Plugins/SymbolFile/NativePDB/PdbSymUid.h:162
+    uint64_t result;
+    ::memcpy(&result, &m_uid, sizeof(m_uid));
+    return result;
----------------
lemo wrote:
> should't the last argument be sizeof(result) ? 
> 
> also, what is the meaning of this opaque id?
`sizeof(result) == sizeof(m_uid)`.  This can even be static_asserted, due to the construction of the `PdbSymUid` class.    See the comment at the top of this header file for a description.

The idea is that LLDB represents symbols, types, compilands, and everything else using a 64-bit integer.  So we partition a 64-bit integer into a 1-byte tag field that says what kind of symbol it is, and then 48 bytes of type-specific data.  When LLDB asks for a 64-bit unique identifier for a particular thing, we call this function, which is basically treating the 64-bit structure as a 64-bit integer.  The 1-byte tag field guarantees uniqueness across all symbol types.

================
Comment at: lldb/source/Plugins/SymbolFile/NativePDB/PdbUtil.cpp:124
+template <typename RecordT>
+static std::pair<uint16_t, uint32_t> GetSegmentAndOffset(const CVSymbol &sym) {
+  RecordT record = createRecord<RecordT>(sym);
----------------
lemo wrote:
> I'd prefer a dedicated SegmentAndOffset struct rather than the generic pair<>, what do you think?
Reasonable, I'll fix this.

https://reviews.llvm.org/D53002