[lldb-dev] Making a new symbol provider

Thu Feb 11 17:35:16 PST 2016

> On Feb 11, 2016, at 3:41 PM, Zachary Turner via lldb-dev <lldb-dev at lists.llvm.org> wrote:
> 
> Hi,
> 
> I want to make a new symbol provider to teach LLDB to understand Microsoft PDB files.  I've been looking over the various symbol APIs, and I have a few questions.
> 
> 1. Under what circumstances do I need a custom SymbolVendor?  The way pdb works is that generally there is 1 file that contains all the debug info needed for a single binary (so or executable).  Given a list of paths, we can then determine if there is a matching PDB in one of those paths.  Is it better to do this in the CalculateAbilities() function of the symbol file plugin (by just returning 0 if we don't find a match) or do we need to do something more complicated?

I would suggest making a SymbolVendorPDB that only enables itself if you are able to find the PDB files for your COFF file. So look at your COFF file; I presume there is a pointer somewhere in it to one or more PDB files? CalculateAbilities() is the correct place to check whether a COFF file has pointers to PDB files and to make sure those files exist before you say that you can provide any abilities.
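
As a rough sketch of that check (this is not LLDB's actual API; the ability bits, the function name, and the file_exists callback are all made up for illustration), the idea is to report no abilities at all unless the PDB named by the binary is found in one of the search paths:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical ability bits, not LLDB's real enumeration.
enum AbilitiesSketch : std::uint32_t {
  kNone = 0,
  kLineTables = 1u << 0,
  kFunctions = 1u << 1,
  kTypes = 1u << 2,
  kAll = kLineTables | kFunctions | kTypes
};

// Returns kNone unless `pdb_name` exists in one of `search_dirs`.
// `file_exists` is an injected check so the sketch does not touch the disk.
std::uint32_t CalculateAbilitiesSketch(
    const std::string &pdb_name, const std::vector<std::string> &search_dirs,
    const std::function<bool(const std::string &)> &file_exists) {
  assert(file_exists && "caller must supply an existence check");
  if (pdb_name.empty())
    return kNone; // the binary does not point at any PDB
  for (const std::string &dir : search_dirs)
    if (file_exists(dir + "/" + pdb_name))
      return kAll; // found a matching PDB, so claim abilities
  return kNone;
}
```

Returning zero abilities when nothing matches lets whichever other symbol plug-in applies take over for that binary.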

> 
> 2. Why is there a function called ParseCompileUnitLanguage?  The CompileUnit class already stores the language when ParseCompileUnit is called, and ParseCompileUnitLanguage is implemented by just getting that value out.  What is the point of this function?

If we are constructing CompileUnit instances with a valid language, we will never need to call the ParseCompileUnitLanguage function on SymbolVendor/SymbolFile, but if the language is eLanguageTypeInvalid, we will lazily populate it later.
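
A minimal sketch of that lazy pattern, assuming a made-up CompileUnitSketch class and a parser callback standing in for the symbol file (LLDB's real classes look different):

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Stand-in for lldb::LanguageType; Invalid plays the role of eLanguageTypeInvalid.
enum class Language { Invalid, C99, CPlusPlus };

class CompileUnitSketch {
public:
  using LanguageParser = std::function<Language()>;

  CompileUnitSketch(Language language, LanguageParser parser)
      : m_language(language), m_parser(std::move(parser)) {
    assert(m_parser && "a parser callback is required");
  }

  // Only asks the parser when the language was not known at construction,
  // and caches the answer so the parser is consulted at most once.
  Language GetLanguage() {
    if (m_language == Language::Invalid)
      m_language = m_parser();
    return m_language;
  }

private:
  Language m_language;
  LanguageParser m_parser;
};
```

If the compile unit is created with a valid language, the callback is never invoked.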

> 
> 3. There's a function called ParseCompileUnitDebugMacros.  Is this referring to C / C++ macros?  Like #define FOO 7?  What is that used for?  I don't believe info about preprocessor definitions are stored in PDB.  Is this going to cause problems?

Nope, just don't implement it. Hopefully there is a default implementation that does nothing; having such a default should make it clear that there is nothing wrong with not filling it in.

> 
> 4. ParseCompileUnitSupportFiles.  What are "support files"?  Given a file "foo.cpp" is this supposed to be header files etc?

This largely mirrors how DWARF structures its data, but in general a compile unit might have files that it uses for line tables and as the declaration file for things like variables.

So any files in your line table should be in here. DWARF line tables use file indexes to save space. Also, any DWARF info that says "I am declared on line 12 of file 'Foo.c'" will use an index to refer to 'Foo.c'. We use the compile unit support files for this:

    case DW_AT_decl_file:
        decl.SetFile(sc.comp_unit->GetSupportFiles().GetFileSpecAtIndex(file_idx));
        break;

The macro support you mention above also uses file indexes when referring to files.
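
To make the index idea concrete, here is a small sketch. The class and struct names are hypothetical and only loosely modeled on the snippet above; the point is that a declaration stores a small index, and the compile unit's file list turns it back into a path:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a compile unit's support file list.
class SupportFilesSketch {
public:
  // Returns the index of `path`, appending it first if it is not already listed.
  std::size_t AppendIfUnique(const std::string &path) {
    assert(!path.empty() && "an empty path is not a useful support file");
    for (std::size_t i = 0; i < m_files.size(); ++i)
      if (m_files[i] == path)
        return i;
    m_files.push_back(path);
    return m_files.size() - 1;
  }

  // Returns the path stored at `idx`, or an empty string if out of range.
  std::string GetFileAtIndex(std::size_t idx) const {
    return idx < m_files.size() ? m_files[idx] : std::string();
  }

private:
  std::vector<std::string> m_files;
};

// Hypothetical declaration record that stores only a file index and a line.
struct DeclSketch {
  std::size_t file_idx;
  unsigned line;
};

// Resolves the index through the support files to build "file:line".
std::string DescribeDecl(const SupportFilesSketch &files, const DeclSketch &decl) {
  return files.GetFileAtIndex(decl.file_idx) + ":" + std::to_string(decl.line);
}
```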

So the support files should be a list of files that make sense to your PDB parser in case your PDB uses file indexes when referring to files.

Since LLDB parses debug info partially, we only expand debug info into format-agnostic LLDB info lazily, as the information is needed. All symbol files also get to pick their own identifiers for everything. For DWARF, we use the DIE offset as the identifier. So say you parse DWARF that looks like:

0x0000000b: TAG_compile_unit [1] *
             AT_producer( "Apple LLVM version 7.0.0 (clang-700.1.72)" )
             AT_language( DW_LANG_C99 )
             AT_name( "main.c" )
             AT_stmt_list( 0x00000000 )
             AT_comp_dir( "/Volumes/work/gclayton/Documents/src/args" )
             AT_low_pc( 0x0000000100000cf0 )
             AT_high_pc( 0x0000000100000e9b )

0x0000002e:     TAG_subprogram [2] *
                 AT_low_pc( 0x0000000100000cf0 )
                 AT_high_pc( 0x0000000100000e9b )
                 AT_frame_base( rbp )
                 AT_name( "main" )
                 AT_decl_file( "main.c" )
                 AT_decl_line( 9 )
                 AT_prototyped( 0x01 )
                 AT_type( {0x000000c6} ( int ) )
                 AT_external( 0x01 )

0x0000004d:         TAG_formal_parameter [3]  
                     AT_location( fbreg -1048 )
                     AT_name( "argc" )
                     AT_decl_file( "main.c" )
                     AT_decl_line( 9 )
                     AT_type( {0x000000c6} ( int ) )

0x0000005c:         TAG_formal_parameter [3]  
                     AT_location( fbreg -1056 )
                     AT_name( "argv" )
                     AT_decl_file( "main.c" )
                     AT_decl_line( 9 )
                     AT_type( {0x000000cd} ( const char** ) )

The ID of the compile unit is 0x0000000b since that is the DIE offset for the compile unit. If we ask the compile unit any questions through the lldb_private::CompileUnit, we can always extract the ID from it, so we know how to dig up the original DWARF info and parse more of it lazily, only as needed.

Likewise, the TAG_subprogram represents a function. We might parse only the function "main" at 0x0000002e, and then later be asked to parse the blocks and variables inside of it. If we use 0x0000002e for the ID of the function, we can quickly find the DWARF for it and parse its child variables and blocks. 

So be sure to pick identifiers that make sense for PDB. Hopefully this will be easy.
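
Here is a sketch of why a stable identifier matters, using made-up types (RawRecord and SymbolFileSketch are hypothetical, and a std::map stands in for the on-disk debug info). The first pass pulls out only a name; later, the same ID is enough to find the record again and expand its children:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using UserID = std::uint64_t; // hypothetical: whatever ID the plug-in picks

// Hypothetical raw debug-info record, addressed by its ID.
struct RawRecord {
  std::string name;
  std::vector<UserID> children;
};

class SymbolFileSketch {
public:
  void AddRecord(UserID id, RawRecord record) {
    m_records[id] = std::move(record);
  }

  // Cheap first pass: only the name is pulled out of the record.
  std::string ParseName(UserID id) const {
    auto it = m_records.find(id);
    assert(it != m_records.end() && "ID must refer to a known record");
    return it->second.name;
  }

  // Later, on demand: the same ID locates the record so its children
  // (parameters, variables, blocks) can be expanded.
  std::vector<std::string> ParseChildNames(UserID id) const {
    std::vector<std::string> names;
    auto it = m_records.find(id);
    if (it == m_records.end())
      return names;
    for (UserID child : it->second.children)
      names.push_back(ParseName(child));
    return names;
  }

private:
  std::map<UserID, RawRecord> m_records;
};
```

With the DWARF above, the record at 0x2e would be "main" and its children at 0x4d and 0x5c would be "argc" and "argv". For PDB, the ID can be anything that lets you get back to the original record quickly.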

> 

> 5. ParseCompileUnitLineTable.  On the LineTable class you can add "line sequences" or individual entries.  What's the difference here?  Is there any disadvantage to adding every single line entry in the line table using the InsertLineEntry instead of building a line sequence and inserting the sequence?

The rule follows DWARF line tables: line sequences must be an array of line entries whose addresses are always increasing. You can add every line in sequence as long as the line entries are in increasing address order. We are going to put the line entries into an array that is sorted for quick lookups.
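
To illustrate the difference, here is a simplified sketch. It is not LLDB's LineTable; the names are invented and it ignores details a real line table has to handle. A sequence that is already in address order can be merged in one pass, while each individual insert has to find its own sorted position:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

struct LineEntrySketch {
  std::uint64_t address;
  std::uint32_t line;
};

class LineTableSketch {
public:
  // Inserts one entry at its sorted position (a search plus a shift each time).
  void InsertEntry(const LineEntrySketch &entry) {
    auto pos = std::upper_bound(m_entries.begin(), m_entries.end(), entry, ByAddress);
    m_entries.insert(pos, entry);
  }

  // Merges a whole sequence in one pass; its addresses must not decrease.
  void InsertSequence(const std::vector<LineEntrySketch> &sequence) {
    assert(std::is_sorted(sequence.begin(), sequence.end(), ByAddress));
    std::vector<LineEntrySketch> merged;
    merged.reserve(m_entries.size() + sequence.size());
    std::merge(m_entries.begin(), m_entries.end(), sequence.begin(),
               sequence.end(), std::back_inserter(merged), ByAddress);
    m_entries.swap(merged);
  }

  // Returns the line of the last entry at or below `addr`, or 0 if there is none.
  std::uint32_t FindLine(std::uint64_t addr) const {
    auto it = std::upper_bound(m_entries.begin(), m_entries.end(),
                               LineEntrySketch{addr, 0}, ByAddress);
    if (it == m_entries.begin())
      return 0;
    return std::prev(it)->line;
  }

  std::size_t Size() const { return m_entries.size(); }

private:
  static bool ByAddress(const LineEntrySketch &a, const LineEntrySketch &b) {
    return a.address < b.address;
  }
  std::vector<LineEntrySketch> m_entries; // always kept sorted by address
};
```

Because the array stays sorted, lookups by address can use a binary search no matter which insertion path was used.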

> 
> I will probably have some more questions as I continue down this path.  For now I'm planning to implement the minimum amount of functionality required just to make LLDB locate and open a PDB for an executable without actually returning anything useful from it.  So when I start filling out types, functions, etc I may have some more questions.

I am the person you will need to ask, as I implemented everything in the symbols so far. If you have any questions, feel free to ask and I will get back to you as quickly as I can. If I am not around, you can take a look at the DWARF spec or talk to someone who is familiar with DWARF; you can probably bet we are very similar to DWARF in many respects, since it is a very powerful and complete format.

Let me know what questions you have! I look forward to seeing the PDB plug-in make it into LLDB.

Greg Clayton