[llvm] r203821 - MCDwarf: Refactor line table handling into a single data structure

Thu Mar 20 12:00:01 PDT 2014

>>> You recently started emitting ranges in a different way (DW_AT_ranges on the DW_TAG_compile_unit are new),
>>
>> aranges can be enabled by a flag if that's the preference for your
>> platform, I believe
>
> Yes, we could flip the flag for Darwin, but I hope to minimize these differences as much as possible.
>
>>> and yes we will need to modify LLDB to deal with this as well. LLDB currently will work, but it requires changes to make it efficient again.
>>
>> Presumably LLDB will be better off (ie: faster) using the pre-made
>> dwarf ranges rather than trying to build the range from the line table
>> anyway - yes?
>
> Yes. This change will be very easy. I mostly worry about the other DWARF parsers we have in house (dtrace and CoreSymbolication to name at least 2), and also any other debuggers out there in the world (besides GDB).

Sure enough - hard to say what other tools might be doing/assuming
about this information.

> My point is that historically it has been one way, and we know that all of the DWARF parsers out there might have issues with any of these changes, so history == what people expect == how the DWARF parser was built using the spec and the translations people made from the spec, and I can't answer that one easily.

Well we've had this state for a while (if the integrated assembler is
turned off) and so would, I imagine, GCC and any other producer that
uses GNU assembly and LTO. But, yes, that's not exactly pervasive
enough to have reached all consumers.

>>> But do what you will and we will modify LLDB to deal with what is generated.
>>
>> Sure - I'm (I think we all are) trying to weigh the costs (as
>> objectively as I can) by understanding the impact to LLDB, though.
>
> And thanks for thinking about this for us consumers!
>
> After reading the spec a bit more I found something that is possibly troublesome if we do emit one line table for multiple compile units:
>
> "The primary source file is described by an entry whose path name exactly matches that given in the DW_AT_name attribute in the compilation unit, and whose directory is understood to be given by the implicit entry with index 0."

Fair point - yes, we'd violate the somewhat curious wording
("'understood' to be given by..." ) and the standard doesn't really
say much about what the "primary source file" is (why it's a concept
that needs to exist, etc).

> The "directory index 0" could mean different things for each compile unit and you would need to make sure no line table entries use a file index whose directory index is 0.

That's a really good point and LLVM was broken in this regard. I've
fixed that in (
http://llvm.org/viewvc/llvm-project?rev=204094&view=rev ) - thanks for
the catch!
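
To make the directory-index-0 hazard concrete, here is a minimal
Python sketch (not LLVM code; FileEntry, resolve and the sample paths
are all made up for illustration) of how a consumer following the
DWARF 4 rule might resolve a file entry. Directory index 0 means the
CU's own DW_AT_comp_dir, so the same entry in a shared table can name
a different file depending on which CU you arrive from:

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    name: str       # file name as stored in the line table header
    dir_index: int  # 0 = the CU's compilation directory (DWARF 4)


def resolve(entry, include_dirs, comp_dir):
    """Resolve a file entry to a path the way a DWARF 4 consumer might.

    include_dirs holds the header's include_directories; DWARF index 1
    is include_dirs[0], and index 0 is the implicit comp_dir.
    """
    if posixpath.isabs(entry.name):
        return entry.name
    if entry.dir_index == 0:
        base = comp_dir
    else:
        # A relative include directory is itself relative to comp_dir;
        # an absolute one replaces it.
        base = posixpath.join(comp_dir, include_dirs[entry.dir_index - 1])
    return posixpath.join(base, entry.name)


# One shared line table header, referenced by two CUs (hypothetical paths).
include_dirs = ["/usr/include"]
main_c = FileEntry("main.c", 0)
stdio_h = FileEntry("stdio.h", 1)

for comp_dir in ("/src/a", "/src/b"):
    print(comp_dir, "->", resolve(main_c, include_dirs, comp_dir),
          resolve(stdio_h, include_dirs, comp_dir))
```

The entry with an absolute directory resolves identically from either
CU; only the one relying on directory index 0 changes meaning.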

> This was in the "file_names" description for the line table prologue.
>
> The state machine for the line table is initialized as:
>
> At the beginning of each sequence within a line number program, the state of the registers is:
>
> address         0
> op_index        0
> file            1
> line            1
> column          0
> is_stmt determined by default_is_stmt in the line number program header
> basic_block     false
> end_sequence    false
> prologue_end    false
> epilogue_begin  false
> isa             0
> discriminator   0
>
>
> So the "file" being 1 usually means that this is the main source file for the line table. Though there aren't explicit rules about that in DWARF, some DWARF parsers might make that assumption and assume any entry with a "file" index that is not 1 is an inlined line entry.
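
For anyone following along, the register defaults quoted above can be
written down as a small Python sketch. LineState mirrors the quoted
table; looks_inlined is a hypothetical consumer heuristic of the kind
being worried about here, not something DWARF specifies or that any
particular tool is known to implement:

```python
from dataclasses import dataclass


@dataclass
class LineState:
    """Line number state machine registers at the start of a sequence."""
    is_stmt: bool            # taken from default_is_stmt in the header
    address: int = 0
    op_index: int = 0
    file: int = 1
    line: int = 1
    column: int = 0
    basic_block: bool = False
    end_sequence: bool = False
    prologue_end: bool = False
    epilogue_begin: bool = False
    isa: int = 0
    discriminator: int = 0


def looks_inlined(row_file):
    # Hypothetical heuristic, NOT a DWARF rule: treat any row whose
    # file register is not 1 as coming from inlined code.
    return row_file != 1


state = LineState(is_stmt=True)
print(state.file, state.line, looks_inlined(state.file), looks_inlined(2))
```

With one table shared by several CUs, file 1 can be the primary source
of at most one of them, so a heuristic like this would misjudge rows
belonging to every other CU.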

They could also be functions defined in headers - but, yes, I see your
general concern (& can't really judge how important it is) that tools
might make all sorts of assumptions based on the structure. While I'm
generally sympathetic to the "DWARF is whatever consumers/producers
agree upon" view, at some point we do have to take some latitude or
we'll be really restricted in the way we improve/modify the system.

> Also when setting file and line breakpoints, someone is going to type in "main.c" line 12. Now many debuggers might iterate across all line tables they have and find all line entries whose file and line matches and you might end up with some debuggers finding multiple locations.

Not sure I quite follow here - if there are multiple things on
main.c:12 (perhaps a more useful example would be an inline function
in a header - in which case (regardless of whether the function is
actually inlined into the callers, or left with linkonce-odr) there
may be multiple distinct line table entries once the code is linked
together) this would be true whether or not there's a shared line
table - there would be one in one line table and another in another
line table. It'd still be ambiguous.

But, yes, a DWARF consumer assuming line tables per CU would find
duplicate duplicates (instead of one from each CU, they'd find two in
the line table of each CU, because it was actually the same line
table)
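
A rough Python sketch of that doubling, assuming a naive consumer that
scans "each CU's" line table independently (Row, naive_hits, the file
names and the addresses are all invented for illustration):

```python
from collections import namedtuple

Row = namedtuple("Row", "address file line")


def naive_hits(cu_tables, file, line):
    """Scan every CU's line table, assuming each table is private to its CU."""
    return [(cu, row.address)
            for cu, table in cu_tables.items()
            for row in table
            if row.file == file and row.line == line]


# inline.h:12 emitted once in each of two CUs (hypothetical addresses).
row_a = Row(0x1000, "inline.h", 12)
row_b = Row(0x2000, "inline.h", 12)

separate = {"a.c": [row_a], "b.c": [row_b]}
shared_table = [row_a, row_b]
shared = {"a.c": shared_table, "b.c": shared_table}

print(len(naive_hits(separate, "inline.h", 12)))  # one hit per CU
print(len(naive_hits(shared, "inline.h", 12)))    # each CU sees both rows

# Collapsing by address brings the shared case back to two locations.
unique = sorted({addr for _, addr in naive_hits(shared, "inline.h", 12)})
print([hex(a) for a in unique])
```

Either way there are two real locations; the shared table only changes
how many times a per-CU scan reports them.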

> LLDB will properly reduce this to a single location (since the address is the same), but others might not. Debuggers would also need to know to say "hey, I found 'main.c:12' in my line table, but now I need to know to check my address ranges for my compile unit to know that this line entry doesn't really belong to me". I am not sure how many debuggers or symbolicators will know to do that, so they might attribute the 'main.c:12' to a compile unit 'foo.c' because that is where it was found in the DWARF.
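
As I read it, the extra check being described would look something
like this Python sketch (owning_cu, the ranges and the addresses are
hypothetical, standing in for whatever a consumer derives from each
CU's DW_AT_ranges): a row found via foo.c's line table is attributed
by its address, not by the table it was found in.

```python
def owning_cu(address, cu_ranges):
    """Return the CU whose [low, high) ranges contain address, or None."""
    for cu, ranges in cu_ranges.items():
        if any(low <= address < high for low, high in ranges):
            return cu
    return None


# Hypothetical per-CU address ranges and one line table shared by both.
cu_ranges = {"foo.c": [(0x1000, 0x1800)], "main.c": [(0x2000, 0x2400)]}
shared_table = [(0x1010, "foo.c", 3), (0x2040, "main.c", 12)]

# A consumer walking foo.c's (shared) table still sees main.c:12.
for address, file, line in shared_table:
    print(f"{file}:{line} at {address:#x} belongs to",
          owning_cu(address, cu_ranges))
```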

Hmm - I don't really understand why the debugger would need to ascribe
this location to a particular CU. Could you explain the reasoning
there? (if someone says "break main.c:12" they'd break on all
instances of that in all CUs, presumably)

> Just a few things to think about if we go this route. We can and will fix LLDB to do what is right, I just worry about other debuggers and DWARF consumers that might break as a result.

Yeah - reading over the spec, and while I was able to work around the
comp_dir issue, I am a little less satisfied with the single-table
solution. Maybe we'll have a chat with the GCC and GDB folks and see
if they think the gas extension would be useful/appropriate/good. (An
off-hand comment seemed to indicate that GCC probably does the same
thing we do and shares a line table under LTO, except that they'd do
it in all cases, since they have no integrated assembler.) With both
compilers behind that I'd be more comfortable that we'll get the
assembly syntax extension correct and consistent between GNU as and
LLVM's integrated assembler (or that we consistently come to some
other conclusion, including shared line tables, and work to get that
clarified in DWARF if need be, but at least between this handful of
producers and consumers).

Thanks,
- David