[llvm] r203821 - MCDwarf: Refactor line table handling into a single data structure

Thu Mar 20 12:56:49 PDT 2014

On Mar 20, 2014, at 12:00 PM, David Blaikie <dblaikie at gmail.com> wrote:

>> LLDB will properly reduce this to a single location (since the address is the same), but others might not. Debuggers would also need to know to say "hey, I found 'main.c:12' in my line table, but now I need to know to check my address ranges for my compile unit to know that this line entry doesn't really belong to me". I am not sure how many debuggers or symbolicators will know to do that, so they might attribute the 'main.c:12' to a compile unit 'foo.c' because that is where it was found in the DWARF.
> 
> Hmm - I don't really understand why the debugger would need to ascribe
> this location to a particular CU. Could you explain the reasoning
> there? (if someone says "break main.c:12" they'd break on all
> instances of that in all CUs, presumably)

Most debuggers, including GDB, will make some sort of class that backs a compile unit. Then, when looking up a file + line, the debugger will start with the first of the compile unit classes and ask that compile unit if it contains the specified file and line. So if we have the following DWARF:

compile unit 0: foo.c (with shared line table)
compile unit 1: bar.c (with shared line table)
compile unit 2: main.c (with shared line table)
compile unit 3: baz.c (with shared line table)

We might find 4 duplicate line entries for main.c:12, one in each compile unit, and then reduce them to 1 location. Since "foo.c" is the first compile unit, the debugger might just decide that the line table entry that wins the uniquing is the one that was found first.
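A rough sketch of that first-match behavior, in C++. All of the types and function names here (LineRow, CompileUnitModel, FirstUnitWithLine, MakeSharedTableUnits) and the addresses are made up for illustration; this is not how GDB or LLDB is actually implemented, only a model of the lookup order described above:

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model types, not real GDB or LLDB classes.
struct LineRow {
  std::string file;       // source file the row refers to
  unsigned line;          // source line number
  std::uint64_t address;  // code address of the row
};

struct CompileUnitModel {
  std::string name;           // primary source file of the compile unit
  std::vector<LineRow> rows;  // rows visible through this unit's line table
};

// Naive lookup: walk the compile units in order and return the name of the
// first one whose line table contains file:line.
std::optional<std::string> FirstUnitWithLine(
    const std::vector<CompileUnitModel>& units, const std::string& file,
    unsigned line) {
  for (const auto& cu : units)
    for (const auto& row : cu.rows)
      if (row.file == file && row.line == line)
        return cu.name;
  return std::nullopt;
}

// Builds the four compile units from the example, all seeing the same
// (shared) line table. The addresses are invented.
std::vector<CompileUnitModel> MakeSharedTableUnits() {
  const std::vector<LineRow> shared = {
      {"foo.c", 3, 0x1000},
      {"bar.c", 7, 0x2000},
      {"main.c", 12, 0x3000},
      {"baz.c", 5, 0x4000},
  };
  return {{"foo.c", shared},
          {"bar.c", shared},
          {"main.c", shared},
          {"baz.c", shared}};
}
```

With these inputs, FirstUnitWithLine(MakeSharedTableUnits(), "main.c", 12) yields "foo.c", because foo.c is asked first and the shared table contains the row.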

So it might say that the debugger class for the compile unit "foo.c" is the compile unit for "main.c:12". In LLDB we have various ways to resolve addresses, but generally an address should resolve to a single "symbol context". In LLDB we have classes for executables and shared libraries (Module), a compile unit within that module (CompileUnit), a function in the compile unit (Function) and a lexical block within the function (Block), so our address lookups will result in a struct that contains:

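struct SymbolContext {
  Module *module;          // executable or shared library
  CompileUnit *comp_unit;  // compile unit within that module
  Function *function;      // function within the compile unit
  Block *block;            // lexical block within the function
  LineEntry *line_entry;   // line table entry for the address
};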

So the SymbolContext for "main.c:12" could end up having a "comp_unit" that says "foo.c". Again, we can modify LLDB to deal with the shared line tables and "do the right thing".
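One way "the right thing" could look is to pick the owning compile unit from the address of the line entry rather than from whichever unit was searched first. The sketch below assumes each compile unit carries its address ranges (for example from DW_AT_low_pc/DW_AT_high_pc or DW_AT_ranges). The names AddressRange, UnitWithRanges and OwningUnitForAddress are hypothetical and are not LLDB APIs:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model types for illustration only; not LLDB's classes.
struct AddressRange {
  std::uint64_t low;   // first address in the range (inclusive)
  std::uint64_t high;  // one past the last address (exclusive)
  bool Contains(std::uint64_t addr) const {
    return low <= addr && addr < high;
  }
};

struct UnitWithRanges {
  std::string name;                  // primary source file of the unit
  std::vector<AddressRange> ranges;  // code ranges owned by the unit
};

// Returns the compile unit whose address ranges contain addr, or nullptr
// if no unit claims the address. With a shared line table, this is the
// check that says whether a row really belongs to the unit it was found in.
const UnitWithRanges* OwningUnitForAddress(
    const std::vector<UnitWithRanges>& units, std::uint64_t addr) {
  for (const auto& cu : units)
    for (const auto& range : cu.ranges)
      if (range.Contains(addr))
        return &cu;
  return nullptr;
}
```

Given the address of the "main.c:12" row, a lookup like this would return the unit for main.c regardless of the order in which the units are visited, provided the unit ranges do not overlap.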

> 
>> Just a few things to think about if we go this route. We can and will fix LLDB to do what is right, I just worry about other debuggers and DWARF consumers that might break as a result.
> 
> Yeah - reading over the spec, and while I was able to work around the
> comp_dir issue, I am a little less satisfied with the single-table
> solution. Maybe we'll have a chat with the GCC (an off-hand comment
> seemed to indicate that they probably do the same thing we do (except
> they'd do it in all cases, since they have no integrated assembler)
> and share a line table under LTO) and GDB folks and see if they think
> the gas extension would be useful/appropriate/good - with both
> compilers behind that I'd be more comfortable that we'll get the
> assembly syntax extension correct and consistent between GNU as and
> LLVM's integrated assembler (or that we consistently come to some
> other conclusion, including shared line tables, and work to get that
> clarified in DWARF if need be, but at least between this handful of
> producers and consumers).

Sounds good, let me know what you come up with after you have spoken to the GCC/GNU folks.

Greg