[Lldb-commits] [PATCH] D89812: [lldb][PDB] Add ObjectFile PDB plugin

Fri Oct 23 12:55:02 PDT 2020

zequanwu added inline comments.

================
Comment at: lldb/source/Plugins/ObjectFile/PDB/ObjectFilePDB.cpp:176-197
+  switch (dbi_stream->getMachineType()) {
+  case PDB_Machine::Amd64:
+    spec.SetTriple("x86_64-pc-windows");
+    specs.Append(module_spec);
+    break;
+  case PDB_Machine::x86:
+    spec.SetTriple("i386-pc-windows");
----------------
labath wrote:
> zequanwu wrote:
> > labath wrote:
> > > zequanwu wrote:
> > > > labath wrote:
> > > > > Could this use the same algorithm as `ObjectFilePDB::GetArchitecture` ?
> > > > No. This part is the same algorithm as the part in `ObjectFilePECOFF::GetModuleSpecifications`, https://github.com/llvm/llvm-project/blob/master/lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp#L194-L215.
> > > > 
> > > > The `ObjectFilePDB::GetArchitecture` is same as `ObjectFilePECOFF::GetArchitecture`.
> > > I'm not sure this kind of consistency actually helps. For example, in `Module::MatchesModuleSpec`, module's own architecture (as provided by `ObjectFile(PECOFF)::GetArchitecture`) is compared to the one in the ModuleSpec (generally coming from `ObjectFile(PDB)::GetModuleSpecifications`.
> > > 
> > > If there's any mismatch between the two ways of computing a ArchSpec, then there will be trouble, even though they are "symmetric".
> > > 
> > > That said, I think we can keep this like it is in this patch, but I'd like to clean up both implementations in a another patch.
> > So, we want to refactor out the common part of `ObjectFilePECOFF::GetArchitecture` and `ObjectFilePDB::GetModuleSpecifications` to keep them consistent? I will do it in another patch.
> > 
> I'd actually say all four functions (ObjectFile{PECOFF,PDB}::{GetArchitecture,GetModuleSpecifications}) should use the same algorithm, but lets leave that for the other patch.
It's interesting that `GetArchitecture` and `GetModuleSpecifications` in ObjectFilePECOFF currently do not have the same algorithm, but works fine.

================
Comment at: lldb/source/Plugins/SymbolFile/NativePDB/PdbIndex.cpp:42
 llvm::Expected<std::unique_ptr<PdbIndex>>
-PdbIndex::create(std::unique_ptr<llvm::pdb::PDBFile> file) {
+PdbIndex::create(llvm::pdb::PDBFile *&file) {
   lldbassert(file);
----------------
labath wrote:
> What's up with the reference?
Removed reference.

================
Comment at: lldb/source/Plugins/SymbolFile/NativePDB/SymbolFileNativePDB.cpp:299
     std::unique_ptr<PDBFile> file_up =
         loadMatchingPDBFile(m_objfile_sp->GetFileSpec().GetPath(), m_allocator);

----------------
labath wrote:
> zequanwu wrote:
> > labath wrote:
> > > zequanwu wrote:
> > > > labath wrote:
> > > > > Instead of changing `loadMatchingPDBFile` I think we ought to change this to something like:
> > > > > ```
> > > > > if (auto *pdb = llvm::dyn_cast<ObjectFilePDB>(m_objfile_sp)) {
> > > > >   file_up = pdb->giveMeAPDBFile(); 
> > > > >   // PS: giveMeAPDBFile is a new pdb-specific public method on ObjectFilePDB
> > > > >   // PS: possibly this should not return a PDBFile, but a PDBIndex directly -- I don't know the details but the general idea is that it could/should reuse the pdb parsing state that the objectfile class has already created
> > > > > } else 
> > > > >   file_up = do_the_loadMatchingPDBFile_fallback();
> > > > > ```
> > > > > 
> > > > > The reason for that is that it avoids opening the pdb file twice (once in the object file and once here). Another reason is that I believe the entire `loadMatchingPDBFile` function should disappear. Finding the pdb file should not be the job of the symbol file class -- we have symbol vendors for that. However, we should keep the fallback for now to keep the patch small...
> > > > From what I see, it seems like PDBFile is designed to be created when opening the pdb file. It doesn't allow copying it around.
> > > > `NativeSession` holds a unique_ptr of PDBFile and `NativeSession::getPDBFile` returns the reference of object, so creating unique_ptr from it is not allowed.
> > > > 
> > > > I moved `loadPDBFile` to `ObjectFilePDB` as static function, so we can move `PDBFile` easily.
> > > You don't need to copy it -- you can just have the two objects reference the same instance (stealing the PDBFile object from under ObjectFilePDB seems suboptimal). Probably the simplest way to achieve that is to change the PdbIndex class to use a raw pointer (and ObjectFilePDB to provide one). Then SymbolFilePDB can have an additional unique_ptr<PDBFile> member, which may or may not be empty (depending on whether it fetches object from ObjectFilePDB or loads it itself). Eventually, as we transition to always fetching the PDBFile from ObjectFilePDB, that member will disappear completely.
> > > 
> > > Also, I wasn't expecting you to take the `giveMeAPDBFile` name literally. :) I was deliberatly using a funky name as a placeholder because I wasn't sure of how exactly this would work -- I was trying to illustrate the general concept, and not a precise implementation. If what, I said in the previous paragraph works, then a "normal" name like `GetPDBFile` should work just fine...
> > > 
> > > Sorry about the confusion.
> > > Then SymbolFilePDB can have an additional unique_ptr<PDBFile> member
> > I think you meant `SymbolFileNativePDB`, but why do we need this? It already has a member `unique_ptr<PDBIndex>`. I might misunderstood. 
> > 
> > I changed it to have `loadPDBFile` return raw pointer and `ObjectFilePDB` has a raw pointer member and PdbIndex uses raw pointer.
> Yeah, but who's now responsible for deleting the PDBFile object? You just leak it, right?
> 
> My proposal was the following:
> - ObjectFilePDB is responsible for freeing the PDBFile object that it creates (but it can borrow that object to the SymbolFile)
> - SymbolFileNativePDB is responsible for the object that it creates (which happens only when it's not borrowing the PDBFile). This is what the extra member is for.
> - PDBIndex is *not* responsible for freeing anything
> 
> The imagined implementation was something like:
> ```
> class ObjectFilePDB {
>   std::unique_ptr<PDBFile> m_file_up; // Always non-null
>   PDBFile &GetPDBFile() { return *m_file_up; }
> };
> class SymbolFilePDB {
>   std::unique_ptr<PDBIndex> m_pdb_index;
>   std::unique_ptr<PDBFile> m_file_up; // May be null
> };
> SymbolFilePDB::CalculateAbilities() {
>   ...
>   PDBFile *pdb_file;
>   if (loading_from_objfile)
>     pdb_file = &objfile->GetPDBFile();
>   else {
>     m_file_up = loadMatchingPDBFile(...);
>     pdb_file = m_file_up.get();
>   }
>   m_index = PdbIndex::Create(*pdb_file);
> }
> ```
> 
> Other implementations are possible as well...
Thanks for declaration.

================
Comment at: lldb/test/Shell/ObjectFile/PDB/object.test:3
+# RUN: yaml2obj %p/Inputs/pdb.yaml -o %t.obj
+# RUN: lld-link %t.obj /debug /pdb:%t.pdb /entry:main /nodefaultlib
+# RUN: lldb-test object-file %t.pdb | FileCheck %s
----------------
labath wrote:
> zequanwu wrote:
> > labath wrote:
> > > Could we generate the pdb directly from yaml (perhaps via `llvm-pdbutil yaml2pdb`)? That way we could have a hardcoded UUID and explicit test for its parsing.
> > I noticed when I use `llvm-pdbutil pdb2yaml` to generate yaml file from pdb file and then use `llvm-pdbutil yaml2pdb` to get pdb file back, the two pdb files are not the same. Also using `lldb-test symbols` with the pdb created from yaml file, got the error: `error: Module has no symbol vendor.` This doesn't happen if I use the original pdb file.
> > 
> Yeah, unfortunately it looks like pdb2yaml does not support some pdb streams (though I also wonder if lldb is not being too strict about the kinds of streams it expects). It would be nice to fix that one day.
> 
> Nonetheless, the current pdb2yaml functionality seems sufficient for the purposes of this particular test, so I think we should use that here. The hardcoded uuid here may be brittle as it depends on the exact algorithm that lld uses to compute it.
I switched to use yaml2pdb. The uuid is swapped byte order when reading from pdb.

================
Comment at: lldb/test/Shell/ObjectFile/PDB/symbol.test:4
+# RUN: lld-link %t.obj /debug /pdb:%t.pdb /entry:main /nodefaultlib
+# RUN: lldb-test symbols %t.pdb | FileCheck %s
+
----------------
labath wrote:
> zequanwu wrote:
> > labath wrote:
> > > This is SymbolFile functionality, so it'd be better placed in `Shell/SymbolFile/PDB`
> > I put it here because I saw `symbol.yaml` which uses `lldb-test symbols` exists in `Shell/ObjectFile/PECOFF`. And the folder `Shell/SymbolFile` seems to contain tests for SymbolFile plugins.
> The fact that that test uses the `symbols` function of lldb-test is not ideal, but the functionality it tests (symbol table parsing) is definitely an ObjectFile feature. OTOH, Parsing of Types, CompileUnits and Functions definitely falls into the SymbolFile purview. Given that the other test can be moved to yaml2pdb, and you don't have to worry about duplicating inputs, I'd move this test to the SymbolFile folder
> 
> That said, I'm not sure what is it that you actually wanted to test here. What's being done here that is not covered by the `load-pdb.test`? You could potentially add -o "b main" (to force symfile parsing) and -o "image dump symfile" to the other test if you wanted to check that the pdb can actually be parsed this way.
I'll put the testing of symfile parsing in `load-pdb.cpp`.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89812/new/

https://reviews.llvm.org/D89812