[PATCH] D53002: Create a new symbol file plugin for cross-platform PDB symbolization

Zachary Turner via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Oct 9 10:04:56 PDT 2018


zturner added inline comments.


================
Comment at: lldb/lit/SymbolFile/NativePDB/simple-breakpoints.cpp:1
+// clang-format off
+
----------------
labath wrote:
> It seems llvm has a special .clang-format file in the test folder which sets `ColumnLimit: 0`. Maybe we should do the same?
We could do the same, but it won't remove the need to turn clang-format off.  Especially when we're running CHECk statements against line numbers in the file, we don't want clang-format modifying the file.  Even if we set column limit to 0 here, clang-format will write `main` on 1 line as `int main(int argc, char **argv) { return 0; }`.  That's an interesting test case, but it's a different test case than what we have here where we want the line number to point to the first line of the function that *isn't* the declaration.  It might even be interesting to have a third test case for

```
int main(int argc, char **argv)
{
  return 0;
}
```


================
Comment at: lldb/source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp:37-38
+  if (!ExpectedDBI) {
+    llvm::consumeError(ExpectedDBI.takeError());
+    return nullptr;
+  }
----------------
labath wrote:
> Maybe propagate this error to the caller and have the caller log the error when ignoring it?
These are a bit tricky.  In all cases, there is only possible thing the caller can do, which is to give up.  I don't think the caller particularly cares about the details of the error that occurred, the only way to handle it is to say "we're not going to use this plugin".  So I think it's fine to consume the error and just return a generic failure here.


================
Comment at: lldb/source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp:153
+    lldb::addr_t va = MakeVirtualAddress(segment, offset);
+    assert(cci.m_symbols_by_va.count(va) == 0);
+
----------------
stella.stamenova wrote:
> If the assert fails, does this mean that the symbol map won't be built correctly? Should it be an actual error instead of just an assert?
I think the only way this assert would fail is if two symbols in the PDB were indicated as having the same address.  The only time we call `BuildAddrToSymbolMap` is if `cci.m_symbols_by_va.empty()` is true.  Then, we only insert items into it at the bottom of this loop.  So over the course of the loop, the same address would have had to have been added already.  

If the debug info says two symbols have the same address, there isn't much we can do other than to change this to some kind of multimap, but that just pushes the decision of how to handle the incorrect debug info to the caller, which doesn't have any better strategy for resolving this kind of ambiguity.  That said, down below we call `insert` which assumes the item is not already in the map.  We could change that to a `try_emplace`, then assert that the insert succeeded.  This would have the effect of asserting in the same case that this asserts, but the program still "working" (it would just drop anything other than the first symbol with a given address).


================
Comment at: lldb/source/Plugins/SymbolFile/NativePDB/PdbSymUid.h:131
+    // S_PROCREF symbols are 1-based
+    assert(sym.Module > 0);
+    return makeCompilandId(sym.Module - 1);
----------------
stella.stamenova wrote:
> This assert looks like it really should be an error instead of just an assert.
This is a record which comes from the debug info, which technically is user input.  On the other hand, we need some layer at which we can assume debug info has been sanitized.  Propagating errors through multiple layers of infrastructure code just complicates the control flow and doesn't really add any value.  The idea is that you should have sanitized the record before you call this function.  

As it turns out, I only call this function in one place, and I didn't sanitize the input yet, partly because there are only two existing implementations of things that can write PDB files, and neither one of them will emit buggy CodeView records here.  But if you think it's better, I can add a sanitization check to the caller.


================
Comment at: lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp:53
+
+#include <regex>
+
----------------
labath wrote:
> Is this actually used?
Nope, thanks!  Neither is some of the other stuff.  I started by copying all of `SymbolFilePDB.cpp` and then removing code.  But I missed this include.  I'll remove other dead code as well and re-upload.


================
Comment at: lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp:341
+    const lldb_private::SymbolContext &sc) {
+  // What fields should I expect to be filled out on the SymbolContext?  Is it
+  // safe to assume that `sc.comp_unit` is valid?
----------------
stella.stamenova wrote:
> Did you mean to leave this comment here? Maybe with a FIXME?
My bad.  This was from the `SymbolFilePDB.cpp` which I started with as the template for implementing this file.  And I didn't remove this comment.


================
Comment at: lldb/source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp:78-85
+static bool ShouldUseNativeReader() {
+#if !defined(_WIN32)
+  return true;
+#endif
+  llvm::StringRef use_native = ::getenv("LLDB_USE_NATIVE_PDB_READER");
+  return use_native.equals_lower("on") || use_native.equals_lower("yes") ||
+         use_native.equals_lower("1") || use_native.equals_lower("true");
----------------
labath wrote:
> I am wondering whether it wouldn't be better to use an lldb setting instead of an env var (plugin.symbol-file.pdb.native ?). For this to work, we'd need to delay to decision of which plugin to use to until `CreateInstance` time, but that shouldn't be too hard.
I thought about this but decided against it.  There are two reasons I didn't want to do this:

1) On non-Windows platforms, the setting doesn't make any sense.  There is no other option.
2) It doesn't seem like something we want to "support" opting into or out of.  It's either in experimental / development mode, like right now, in which case the only reason you have to use it is if you're working on it.  Or it's complete, in which case we just delete the old one and make everything use the new one with no alternative.

So I couldn't come up with a valid use case to justify having it be a setting.


================
Comment at: lldb/source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp:307
       std::make_shared<Function>(sc.comp_unit, pdb_func.getSymIndexId(),
-                                 func_type_uid, mangled, func_type, func_range);
+                                 func_type_uid, mangled, nullptr, func_range);
 
----------------
aleksandr.urakov wrote:
> Why shouldn't we pass a type here? Isn't it unrelated to the new plugin?
Nice catch!  We actually should pass a type here.  The reason I'm not doing it is because that would require implementing the `PDBASTParser` class.  Obviously we need to do that eventually, but for now it wasn't necessary in order for LLDB to not complain.  So I was trying to keep the patch small and just implement the minimum amount needed for *something* to work.  But yes, we will need to do this.


https://reviews.llvm.org/D53002





More information about the llvm-commits mailing list