[Lldb-commits] [PATCH] D17363: Add SymbolFilePDB with basic support for line tables.

Fri Feb 19 10:43:43 PST 2016

clayborg added a comment.

General things to know about SymbolFiles:

- We use SymbolContext objects to refer to a specific symbol context: module, compile unit, function, deepest block, line entry and symbol.
- A load address should produce 1 symbol context. Never more.
- A file address can produce many symbol contexts. Think about looking up address 0x1000. On MacOSX all shared libraries have their file virtual addresses such that they start at 0x0, so many functions shortly after zero, so address 0x1000 can produce many matches since file addresses aren't unique
- file + line can match many symbol contexts. Think about what the user would expect if they typed:  (lldb) b vector:123
- SymbolFile objects produce IDs for compile units, functions, variables, types, and many other things by creating these objects with a "lldb::user_id_t" that makes sense for the SymbolFile itself such that if they are given this ID back and asked to parse further information, they should quickly be able to use that number to get back to the debug info and parse more. For DWARF we use the DIE offset. The only thing that isn't correctly abstracted right now is the file IDs, they are currently expected to be indexes into the support files. This works well for DWARF, but probably not for other formats. We can change this if needed as I mentioned before.

================
Comment at: source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp:98-100
@@ +97,5 @@
+{
+    auto global = m_session->getGlobalScope();
+    auto compilands = global->findAllChildren<llvm::PDBSymbolCompiland>();
+    return compilands->getChildCount();
+}
----------------
zturner wrote:
> clayborg wrote:
> > Do you want to cache the compile unit count, or is this really cheap to call over and over?
> Most of these methods are really cheap.  In fact, the entire process of caching stuff in `SymbolContexts` and whatnot is probably unnecessary for us in theory because while it may be cheaper performance wise to cache it the way we're doing, it's not *that* expensive to just query the file, and when you consider the memory usage of caching it a second time, it's not clear it's a good tradeoff.
> 
> That said, there's not really a way around this with the current architecture, so I only cache where necessary (can always add more later if performance becomes an issue)
I was just talking about caching the number of compile units. You will be asked to create each individual lldb_private::CompileUnit by index with:

```
lldb::CompUnitSP
SymbolFilePDB::ParseCompileUnitAtIndex(uint32_t index) 
```

And you will be asked only once for a compile unit by index since the symbol vendor will cache those for you.

================
Comment at: source/Plugins/SymbolFile/PDB/SymbolFilePDB.cpp:257
@@ +256,3 @@
+        auto compilands =
+            m_session->findCompilandsForSourceFile(file_spec.GetPath(), llvm::PDB_NameSearchFlags::NS_CaseInsensitive);
+
----------------
> Just to make sure I understand what "support files" are.
> 
> If I have this file:
> 
> // foo.cpp
> #include "foo.h"
> #include <vector>
> int main(int argc, char **argv) {
>     return 0;
> }
> and I run clang.exe foo.cpp -o foo.exe, and someone calls ResolveSymbolContext("foo.cpp", ...), then in this scenario:
> 
> The compilation unit is foo.cpp
> The support files are foo.h, vector, and any other headers indirectly included by those headers.

The support files for DWARF will be "foo.cpp" as the first file (this is just the way DWARF does it) and foo.h vector and any other header files that were used will also be in there. So your support files will need to include your compile unit source file because you will create some line entries that point to your main source file.

> If check_inlines is false, the SymbolContextList should only have one entry for foo.cpp, and all the line table should contain all entries from foo.cpp that match the line number.

SymbolContextList would have N entries inside of it where the SymbolContextList.comp_unit is always a compile unit whose name matches the "file_spec". The LineEntry in each symbol context will also have a file index that matches the SupportFiles index for the "foo.cpp" source file.

> If check_inlines is true, the SymbolContextList should still only have one entry for foo.cpp, but the line table should contain all entries from foo.cpp that match *and* all entries from all support files (as defined in #2) that match.

No there will be one SymbolContext for each line entry that matches. All of the SymbolContext objects will have LineEntry whose file index will produce a file that is "foo.cpp" when the file is extracted by index from the compile units support files. 

So a quick example in main.cpp:

```
     1	#include <vector>
     2	
     3	int main()
     4	{
     5	    std::vector<int> ints;
     6	    ints.push_back(123);
     7	    ints.push_back(234);
     8	}
     9	
    10	int foo()
    11	{
    12	    std::vector<long> longs;
    13	    longs.push_back(123);
    14	    longs.push_back(234);
    15	}
```

If vector<T>::push_back is on vector:234, and the user asks to set a breakpoint:

```
(lldb) b vector:234
```

Lets says our support files look like:

```
support_files[0] = main.cpp
support_files[1] = /usr/include/.../vector
```

You would have 4 symbol contexts in your SymbolContextList:

```
SymbolContext[0]:
   module = a.out
   compile_unit = main.cpp
   function = main
   block = std::vector<int>::push_back() with call_file = 0 and call_line = 6 (main.cpp:6)
   line_entry = file_idx = 1, line = 234, address = 0x1000

SymbolContext[1]:
   module = a.out
   compile_unit = main.cpp
   function = main
   block = std::vector<int>::push_back() with call_file = 0 and call_line = 7 (main.cpp:7)
   line_entry = file_idx = 1, line = 234, address = 0x1010

SymbolContext[2]:
   module = a.out
   compile_unit = main.cpp
   function = foo
   block = std::vector<long>::push_back() with call_file = 0 and call_line = 13 (main.cpp:13)
   line_entry = file_idx = 1, line = 234, address = 0x2000

SymbolContext[3]:
   module = a.out
   compile_unit = main.cpp
   function = foo
   block = std::vector<long>::push_back() with call_file = 0 and call_line = 14 (main.cpp:14)
   line_entry = file_idx = 1, line = 234, address = 0x2010
```

Note the compile unit always is the compile unit that the code exists in. The block is a lldb_private::Block that has inlined function info that tells us about std::vector inlined instantiation and also tells us the inline call stack. In LLDB inlined functions are just lexical blocks that have inline function info (inlined function name + call file + call line). There could be many inlined blocks above your current block as well..

> Also, the code as written should already handle the case where only basename is specified (the SDK handles this case transparently)
> 
> (BTW, it would be nice if there were a way to write unit tests for all the things you're describing. All the different ways of calling it with different combinations of arguments, and verifying the output is as expected. It's a lot to wrap your head around trying to turn words into code, but if you have a test that just fails, then it's easy to look at it and see what it expects).

We should be covering this using API tests that make an example like the one above and setting breakpoints in the inline header file. We have many tests for this. It is hard to get compilers to generate things that you can put in unit tests since they often change. 

http://reviews.llvm.org/D17363