[lldb-dev] Making a new symbol provider

Greg Clayton via lldb-dev lldb-dev at lists.llvm.org
Fri Feb 12 09:41:13 PST 2016


> On Feb 11, 2016, at 6:56 PM, Zachary Turner <zturner at google.com> wrote:
> 
> 
> 
> On Thu, Feb 11, 2016 at 5:35 PM Greg Clayton <gclayton at apple.com> wrote:
> 
> > On Feb 11, 2016, at 3:41 PM, Zachary Turner via lldb-dev <lldb-dev at lists.llvm.org> wrote:
> >
> > Hi,
> >
> > I want to make a new symbol provider to teach LLDB to understand microsoft PDB files.  I've been looking over the various symbol APIs, and I have a few questions.
> >
> > 1. Under what circumstances do I need a custom SymbolVendor?  The way pdb works is that generally there is 1 file that contains all the debug info needed for a single binary (so or executable).  Given a list of paths, we can then determine if there is a matching PDB in one of those paths.  Is it better to do this in the CalculateAbilities() function of the symbol file plugin (by just returning 0 if we don't find a match) or do we need to do something more complicated?
> 
> I would suggest making a SymbolVendorPDB that only enables itself if you are able to find the PDB files for your COFF file. So look at your COFF file, and I presume somewhere in there is a pointer to one or more PDB files? CalculateAbilities is the correct place to check whether a COFF file has pointers to PDB files and to make sure those files exist before you say that you can provide any abilities.
> Currently we use the operating system to query the PDBs.  This could change in the future, but for now that's how we're doing it.  The operating system does all the work of finding, matching, and loading the PDB for us, and it does it all in one call.  So if we put this in the symbol vendor, there's no way to say "is there a PDB" without also saying "actually load all the data from the PDB" at the same time.  So I'm not sure if there's a solution to this in there, because obviously I don't want to load it twice.

Interesting. If you are on windows and you have a COFF file, you might just want to make a SymbolVendorCOFF. Does PDB info always and only get created for COFF files? 
> 
> One question I had about SymbolVendor, is that I looked at SymbolVendorELF.cpp and it seems to boil down to this notion of "symbol file representations".  All the logic in SymbolVendorELF exists just to add some object file representations.  What is this supposed to represent?  I've got an exe or something, what other "representation" is there other than the exe itself?

In SymbolVendorMacOSX, we have the executable and then the DWARF debug info in a stand-alone dSYM bundle. So on MacOSX you have a.out as the main ObjectFile for a Module, but the symbols are in a different ObjectFile (a.out.dSYM). For ELF I believe there is information in the ELF file that _might_ point to a separate debug info file, but it also might just contain the DWARF in the executable. So for ELF you have one file (exe ELF that contains DWARF) or two files (exe ELF with no DWARF + debug info ELF with DWARF).

A symbol vendor's only job is to take an executable and then use it, plus any other files (locating these extra debug files is part of its job), to make a single coherent view of the symbols for a lldb_private::Module. So SymbolVendor::FindTypes(...) might look into the executable file and one or more other files to get the information. The information must be retrieved from one or more SymbolFile instances. A SymbolFile uses one ObjectFile to do its job, so there is a one-to-one mapping between SymbolFile and ObjectFile instances. The SymbolFile can use the same ObjectFile as the main executable if the data is in there. The SymbolVendor is the one that figures this out.

Some mappings might help show this. The addresses before the object names are the addresses of the class instances in the LLDB address space. For a simple a.out ELF file that contains DWARF we would have:

0x1000: Module ("/tmp/a.out")
          m_obj_file = 0x2000
0x2000: ObjectFile ("/tmp/a.out")
0x3000: SymbolVendorELF
          m_sym_file = 0x4000
0x4000: SymbolFile
          m_obj_file = 0x2000


For an a.out ELF file whose debug info is in an external debug file "/var/debug/a.out":

0x1000: Module ("/tmp/a.out")
          m_obj_file = 0x2000
0x2000: ObjectFile ("/tmp/a.out")
0x2200: ObjectFile ("/var/debug/a.out")
0x3000: SymbolVendorELF
          m_sym_file = 0x4000
0x4000: SymbolFile
          m_obj_file = 0x2200

Same goes for MacOSX where we have "a.out" and "a.out.dSYM" except the SymbolVendorMacOSX is used since it knows how to locate the dSYM files.
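
The two mappings above can be modeled with a few toy types. This is only a sketch to show who points at what: the structs below are simplified stand-ins, not the real lldb_private classes, and MakeModule is a hypothetical helper invented for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy stand-ins for illustration only; NOT the real lldb_private classes.
struct ObjectFile {
  std::string path;
  bool has_debug_info;
};

struct SymbolFile {
  // A SymbolFile uses exactly one ObjectFile to do its job.
  std::shared_ptr<ObjectFile> obj_file;
};

struct SymbolVendor {
  std::unique_ptr<SymbolFile> sym_file;
};

struct Module {
  std::shared_ptr<ObjectFile> obj_file; // the main executable
  SymbolVendor vendor;
};

// Hypothetical helper. An empty debug_path means the debug info lives in the
// executable itself, so the SymbolFile reuses the Module's ObjectFile.
Module MakeModule(const std::string &exe_path, const std::string &debug_path) {
  Module m;
  m.obj_file =
      std::make_shared<ObjectFile>(ObjectFile{exe_path, debug_path.empty()});
  std::shared_ptr<ObjectFile> debug_obj =
      debug_path.empty()
          ? m.obj_file
          : std::make_shared<ObjectFile>(ObjectFile{debug_path, true});
  m.vendor.sym_file = std::make_unique<SymbolFile>(SymbolFile{debug_obj});
  return m;
}
```

Calling MakeModule("/tmp/a.out", "") gives the first layout, where the SymbolFile and the Module share one ObjectFile. Calling MakeModule("/tmp/a.out", "/var/debug/a.out") gives the second, where the SymbolFile points at a separate ObjectFile.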

If there are multiple ObjectFile objects that represent the debug info, they must share the same section list. So ObjectFiles and SymbolFiles work to make a single section list within lldb_private::Module that is used for all objects used to represent the symbol and debug info. That way the ObjectFiles at 0x2000 and 0x2200 above both use the same section for ".text", ".data", etc. If one ObjectFile has sections (like .debug_info for DWARF) that the other ObjectFile doesn't, then each ObjectFile adds sections as needed. Also, if the executable object file has no symbols, or a reduced number of symbols because it was stripped, the two ObjectFiles can combine their symbol tables to make a better symbol table. On MacOSX if we strip a.out and it has no symbols, we can get the symbols from the dSYM file (if we find one) since dSYM files always have fully unstripped symbol tables.
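
As a rough illustration of that merging, here is a toy sketch that builds one section list and one symbol table from two object files. All names (ToyObjectFile, UnifiedView, Combine) are hypothetical, and sections and symbols are reduced to plain strings and addresses; LLDB's real section and symbol table classes are considerably richer than this.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy types for illustration only; all names here are hypothetical.
struct ToyObjectFile {
  std::vector<std::string> sections;
  std::map<std::string, unsigned long> symbols; // symbol name -> address
};

struct UnifiedView {
  std::vector<std::string> sections;            // one shared section list
  std::map<std::string, unsigned long> symbols; // combined symbol table
};

// Combine the executable and the debug file into a single view.
UnifiedView Combine(const ToyObjectFile &exe, const ToyObjectFile &debug) {
  UnifiedView view;
  std::set<std::string> seen;
  for (const ToyObjectFile *obj : {&exe, &debug}) {
    // Each object file only adds the sections the shared list lacks.
    for (const std::string &name : obj->sections)
      if (seen.insert(name).second)
        view.sections.push_back(name);
    // map::insert keeps an existing entry, so symbols from the executable
    // win over same-named symbols from the debug file.
    for (const auto &entry : obj->symbols)
      view.symbols.insert(entry);
  }
  return view;
}
```

With a stripped executable (sections but no symbols) and a debug file that carries ".debug_info" and a full symbol table, the combined view ends up with three sections and every symbol from the debug file.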

So think of SymbolVendor as the class that knows how to locate the symbol file for a given executable (possibly even fetch the symbols from your build system!!!) and put together one or more files to provide a coherent view of two things: the debug info (grabbed from the executable itself or from a stand-alone file) and the object file (symbol tables combined from one or more object files, and all sections combined from all ObjectFiles used for a Module/SymbolVendor). The user doesn't ever need to worry about the underlying details; clients just ask the module for stuff and we provide it to them.
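
The "locate" part of that job might look something like the sketch below. LocateSymbolFile is a hypothetical function, not an LLDB API, and the existence check is passed in as a callback so the example does not depend on any real file system layout or search policy.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper, not an LLDB API. Returns the path of the file that
// should back the SymbolFile: the first candidate debug file that exists,
// or the executable itself when no separate debug file is found.
std::string
LocateSymbolFile(const std::string &exe_path,
                 const std::vector<std::string> &candidates,
                 const std::function<bool(const std::string &)> &exists) {
  for (const std::string &candidate : candidates)
    if (exists(candidate))
      return candidate;
  return exe_path; // fall back to whatever is in the executable
}
```

A real vendor would decide where the candidate paths come from (a pointer inside the executable, a dSYM search, a build system query) and would also verify that the file it found actually matches the executable.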

Greg

