[Lldb-commits] [PATCH] D53368: [Symbol] Search symbols with name and type in a symbol file

Thu Nov 29 14:39:11 PST 2018

> On Nov 29, 2018, at 2:02 PM, Zachary Turner via Phabricator <reviews at reviews.llvm.org> wrote:
> 
> zturner added a comment.
> 
> In D53368#1313238 <https://reviews.llvm.org/D53368#1313238>, @labath wrote:
> 
>> In D53368#1313145 <https://reviews.llvm.org/D53368#1313145>, @zturner wrote:
>> 
>>> In D53368#1313124 <https://reviews.llvm.org/D53368#1313124>, @labath wrote:
>>> 
>>>> I've recently started looking at adding a new symbol file format (breakpad symbols). While researching the best way to achieve that, I started comparing the operation of PDB and DWARF symbol files. I noticed a very important difference there, and I think that is the cause of our problems here. In the DWARF implementation, a symbol file is an overlay on top of an object file - it takes the data contained by the object file and presents it in a more structured way.
>>>> 
>>>> However, that is not the case with PDB (both implementations). These take the debug information from a completely different file, which is not backed by an ObjectFile instance, and then present that. Since the SymbolFile interface requires them to be backed by an object file, they both pretend they are backed by the original EXE file, but in reality the data comes from elsewhere.
>>> 
>>> 
>>> Don't DWARF DWP files work this way as well?  How is support for this implemented in LLDB?
>> 
>> 
>> There are some similarities, but DWP is a bit different. The main difference is that the DWP file is still an ELF (or whatever) file, so we still have an ObjectFile sitting below the symbol file. The other difference is that in the case of DWP we still have a significant chunk of debug information present in the main executable (mainly various offsets that need to be applied to the unlinked debug info in the dwo/dwp files), so you can still very well say that the symbol file is reading information from the main executable. What DWARF does in this case is create a main SymbolFileDWARF for reading data from the main object file, and then a bunch of inner SymbolFileDWARFDwo/Dwp instances which read data from the other files. There are plenty of things to not like here as well, but at least this maintains the property that each symbol file sits on top of the object file from which it reads its data. (And symtab doesn't go into the dwp file, so there are no issues with that.)
>> 
>>>> I am asking this because now I am facing a choice in how to implement breakpad symbols. I could go the PDB way, and read the symbols without an intervening object file, or I could create an ObjectFileBreakpad and then (possibly) a SymbolFileBreakpad sitting on top of that.
>>> 
>>> What if the `SymbolFile` interface provided a new method such as `GetSymtab()`, while `ObjectFile` provided a method called `HasExternalSymtab()`? When you call `ObjectFilePECOFF::GetSymtab()`, it could first check whether `HasExternalSymtab()` is true, and if so it could call the SymbolFile plugin and return what the plugin provides.
>> 
>> I don't think this would be good because there's no way for the PECOFF file to know if we will have a PDB file on top of it.
> 
> 
> I'm actually starting to wonder whether `GetSymtab()` should even be part of `ObjectFile`.  The first thing it does is get the Module and then start calling a bunch of stuff on the Module interface.  Perhaps the place to start is comparing the Module and ObjectFile interfaces and seeing whether the existing APIs would make more sense moved up to Module.  If everything was on Module, then the Module would have everything it needs to go to the SymbolVendor and find a PDB file.

I would vote against moving anything into the module. Object files have their own symbol tables, and we need an object file to be able to find a symbol that it created. We really don't want to abstract this away, since at any time when we delve further into an object file we might need to dig up a symbol by its original symbol table index. So the cleanest design, in my opinion, is one where each object file can have its own symbol table and the module uses the symbol vendor to promote the best information up to the user. Symbols can come from one object file, or an external debug info object file, or from Breakpad. But each of those files should be able to have its own notion of its own symbols.
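
To make the layering concrete, here is a rough sketch of the shape I have in mind. It is not the real LLDB code: the classes below only borrow the names Symtab, ObjectFile, SymbolVendor and Module, and every method on them (Add, SymbolAtIndex, FindByName, AddSymbolSource, FindBestSymbol, FindSymbol) is a simplified, hypothetical stand-in. The "prefer the most recently added source" rule is likewise just an assumed policy for illustration.

```cpp
// Simplified sketch only. These are NOT the real LLDB classes or APIs;
// all names and signatures here are hypothetical stand-ins.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Symbol {
  std::string name;
  std::uint64_t address;
};

// A symbol table owned by exactly one object file.
class Symtab {
public:
  void Add(Symbol s) { m_symbols.push_back(std::move(s)); }

  // Look a symbol up by its index in *this* file's table, so the
  // original symbol table index always stays meaningful.
  const Symbol *SymbolAtIndex(std::size_t idx) const {
    return idx < m_symbols.size() ? &m_symbols[idx] : nullptr;
  }

  const Symbol *FindByName(const std::string &name) const {
    for (const Symbol &s : m_symbols)
      if (s.name == name)
        return &s;
    return nullptr;
  }

private:
  std::vector<Symbol> m_symbols;
};

// Each object file (main executable, external debug info file, a
// Breakpad-style file, ...) keeps its own notion of its own symbols.
class ObjectFile {
public:
  explicit ObjectFile(std::string path) : m_path(std::move(path)) {}
  Symtab &GetSymtab() { return m_symtab; }
  const std::string &GetPath() const { return m_path; }

private:
  std::string m_path;
  Symtab m_symtab;
};

// The vendor knows about every file that can supply symbols and picks
// the best answer. Assumed policy: later-added sources win.
class SymbolVendor {
public:
  explicit SymbolVendor(std::shared_ptr<ObjectFile> main_file)
      : m_files{std::move(main_file)} {
    assert(m_files.front() && "a vendor needs a main object file");
  }

  void AddSymbolSource(std::shared_ptr<ObjectFile> file) {
    m_files.push_back(std::move(file));
  }

  const Symbol *FindBestSymbol(const std::string &name) const {
    for (auto it = m_files.rbegin(); it != m_files.rend(); ++it)
      if (const Symbol *s = (*it)->GetSymtab().FindByName(name))
        return s;
    return nullptr;
  }

private:
  std::vector<std::shared_ptr<ObjectFile>> m_files;
};

// The module owns no symbols itself; it only asks the vendor.
class Module {
public:
  explicit Module(SymbolVendor vendor) : m_vendor(std::move(vendor)) {}
  const Symbol *FindSymbol(const std::string &name) const {
    return m_vendor.FindBestSymbol(name);
  }

private:
  SymbolVendor m_vendor;
};
```

With something like this, a user-level lookup goes through Module::FindSymbol() and gets whatever the vendor considers best, while code that is digging inside one particular object file can still call GetSymtab().SymbolAtIndex() on that file and get the symbol at its original index.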

Greg
