[llvm] r203821 - MCDwarf: Refactor line table handling into a single data structure

Fri Mar 14 15:21:43 PDT 2014

On Mar 14, 2014, at 1:56 PM, David Blaikie <dblaikie at gmail.com> wrote:

> (Sorry Greg (& everyone else) - didn't mean to drop the conversation to private)
> 
> On Fri, Mar 14, 2014 at 1:41 PM, Greg Clayton <gclayton at apple.com> wrote:
>> 
>> On Mar 14, 2014, at 1:29 PM, David Blaikie <dblaikie at gmail.com> wrote:
>> 
>>> On Fri, Mar 14, 2014 at 1:24 PM, Greg Clayton <gclayton at apple.com> wrote:
>>>> Yes we can make this work however the compiler chooses to output things and I know things are harder in the assembler world as you only have simple directives.
>>> 
>>> Right - but more than that it sounds like as long as we generate
>>> ranges (debug_aranges or debug_ranges) there's no benefit to lldb to
>>> produce separate line tables per CU. If that's the case, it doesn't
>>> seem worth the hassle to maintain/enhance the multi-line-table
>>> codepath when we already support and emit ranges.
>> 
>> You recently started emitting ranges in a different way (DW_AT_ranges on the DW_TAG_compile_unit is new),
> 
> aranges can be enabled by a flag if that's the preference for your
> platform, I believe

Yes, we could flip the flag for Darwin, but I hope to minimize these differences as much as possible.

> 
>> and yes we will need to modify LLDB to deal with this as well. LLDB currently will work, but it requires changes to make it efficient again.
> 
> Presumably LLDB will be better off (ie: faster) using the pre-made
> dwarf ranges rather than trying to build the range from the line table
> anyway - yes?

Yes. This change will be very easy. I mostly worry about the other DWARF parsers we have in house (dtrace and CoreSymbolication, to name at least two), and also any other debuggers out there in the world (besides GDB).

> 
>> Right now LLDB assumes the line table entries are per compile unit, and we aren't handing out line tables that are shared, as that is how all compilers to date have generated things. The line table will currently be duplicated for each compile unit. LLDB can be made to work efficiently with one line table for all CUs; it will just take a few hours of work.
>> 
>>> 
>>>> The DWARF spec doesn't state anything along the lines that only statements for a compile unit must exist within a compile unit, so what you guys want to generate is "legal".
>>>> 
>>>> If we can limit this kind of info only to the cases where we can't do it any other way (like only in the assembler), I would prefer that just from an organizational standpoint.
>>> 
>>> That's what we do today, but it's not where we prefer to be. It's
>>> strongly desired that what we output when performing direct object
>>> emission is equivalent to what we output when generating assembly.
>>> We're talking about how to get us back from the currently broken state
>>> of having a major difference there to not having that difference -
>>> either by emitting one line table or getting an extension to the
>>> assembly directives to support multiple line tables in the
>>> assembler(s) (or possibly other solutions - like emitting line tables
>>> directly rather than using directives, which means pre-relaxing asm
>>> output... which seems a bit extreme).
>> 
>> I would rather have them correctly organized
> 
> Again - correctness is a tad subjective here. Though, I agree, the
> spec doesn't exactly imply that line tables can/should be shared.

Yep. My point is that historically it has been one way, and we know that all of the DWARF parsers out there might have issues with any of these changes. So history == what people expect == how the DWARF parser was built using the spec and the translations people made from the spec, and I can't answer that one easily.

> 
>> and not say "since this is hard in the assembler, we changed the normal way line tables are emitted to make them the same as the assembler".
> 
> I'm not sure it's even a matter of "hard" - it's a change to not just
> the LLVM assembler, but gas as well - which seems a bit extreme for
> something where the data is already provided in a more convenient form
> for the task it's being used for.
> 
> We can do it (add an extension to the assembly syntax) - it just seems
> hard to justify that scope of change given the marginal benefit (given
> there's already a DWARF feature that exposes the information
> required).

Gotcha.

> 
>> Granted I don't understand the intricate nature of the issues you are running into in the compiler or assembler, I am just looking at it from the consumer of debug info that has been emitted in a certain way until recently.
> 
> It's not so intricate as just dealing with/paying off the debt of the
> violation of the "assembly and object output are equivalent" ethos
> that occurred when this feature of LTO line tables was implemented.
> 
> The choice seems to be:
> 
> 1) change the assembler to support a more expressive syntax for
> .loc/.file directives that include a disambiguator (and a new
> directive to refer to a specific line table by that disambiguator)
> or
> 2) change the debugger to use an efficient representation of ranges
> ready-prepared in the DWARF
> 
> The latter seems like the better option to me (because I assume it'll
> be better for LLDB anyway - rather than building the ranges from the
> line table), but it's quite possible that I misunderstand the nature
> of the change to LLDB.

I don't really worry about LLDB, I mostly worry about others consuming the DWARF. 

The modifications to LLDB will take at most an hour or two and LLDB will be back to near peak efficiency, so I am not against these changes; they will just take some work.

> 
>> But do what you will and we will modify LLDB to deal with what is generated.
> 
> Sure - I'm (I think we all are) trying to weigh the costs (as
> objectively as I can) by understanding the impact to LLDB, though.

And thanks for thinking about this for us consumers!

After reading the spec a bit more I found something that is possibly troublesome if we do emit one line table for multiple compile units:

"The primary source file is described by an entry whose path name exactly matches that given in the DW_AT_name attribute in the compilation unit, and whose directory is understood to be given by the implicit entry with index 0." 

The "directory index 0" could mean different things for each compile unit and you would need to make sure no line table entries use a file index whose directory index is 0.

This was in the "file_names" description for the line table prologue.
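To make the "directory index 0" concern concrete, here is a minimal Python sketch. The names FileEntry and resolve_path and the example paths are hypothetical and not taken from any real DWARF parser. It assumes the implicit directory 0 is the compilation directory of whichever compile unit is reading the table, so the same shared entry resolves to a different path per CU:

```python
import posixpath
from dataclasses import dataclass


@dataclass
class FileEntry:
    name: str        # path name as stored in the file_names table
    dir_index: int   # 0 means "the compilation directory of the CU"


def resolve_path(entry, include_directories, comp_dir):
    """Resolve a file_names entry to a full path for one compile unit.

    include_directories holds the explicit entries (index 1 and up);
    index 0 is implicit and comes from the CU, not from the line table.
    """
    if posixpath.isabs(entry.name):
        return entry.name
    if entry.dir_index == 0:
        base = comp_dir
    else:
        base = include_directories[entry.dir_index - 1]
    return posixpath.join(base, entry.name)


# One shared line table, two compile units with different comp dirs
# (all paths here are made up for illustration).
shared_dirs = ["/usr/include"]
main_c = FileEntry("main.c", 0)
stdio_h = FileEntry("stdio.h", 1)

path_from_cu_a = resolve_path(main_c, shared_dirs, "/build/a")
path_from_cu_b = resolve_path(main_c, shared_dirs, "/build/b")
header_from_cu_a = resolve_path(stdio_h, shared_dirs, "/build/a")
header_from_cu_b = resolve_path(stdio_h, shared_dirs, "/build/b")
```

Entries with an explicit directory index resolve the same way from every CU; only the entries that lean on index 0 become ambiguous.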

The state machine for the line table is initialized as:

At the beginning of each sequence within a line number program, the state of the registers is:

address         0
op_index        0
file            1
line            1
column          0
is_stmt         determined by default_is_stmt in the line number program header
basic_block     false
end_sequence    false
prologue_end    false
epilogue_begin  false
isa             0
discriminator   0

So the "file" being 1 usually means that this is the main source file for the line table. There aren't explicit rules about that in DWARF, but some DWARF parsers might make that assumption and treat any entry whose "file" index is not 1 as an inlined line entry.
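As a rough Python sketch of the initial register state listed above, and of the kind of assumption a consumer might make about it: LineState and new_sequence simply mirror the register list, while looks_inlined is a hypothetical heuristic that I am not claiming any particular parser uses.

```python
from dataclasses import dataclass


@dataclass
class LineState:
    """Registers of the line number state machine (names as in the spec)."""
    address: int = 0
    op_index: int = 0
    file: int = 1
    line: int = 1
    column: int = 0
    is_stmt: bool = False  # overwritten from the header in new_sequence()
    basic_block: bool = False
    end_sequence: bool = False
    prologue_end: bool = False
    epilogue_begin: bool = False
    isa: int = 0
    discriminator: int = 0


def new_sequence(default_is_stmt):
    """Return the register state at the start of a sequence."""
    return LineState(is_stmt=bool(default_is_stmt))


def looks_inlined(state):
    """Hypothetical (and fragile) heuristic: file != 1 means inlined code."""
    return state.file != 1


state = new_sequence(default_is_stmt=1)
```

With one table shared by several CUs, at most one CU's primary source file can sit at index 1, so a heuristic like looks_inlined would misclassify ordinary line entries from every other CU.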

Also, when setting file and line breakpoints, someone is going to type in "main.c" line 12. Many debuggers might iterate across all the line tables they have and find all line entries whose file and line match, so some debuggers might end up finding multiple locations. LLDB will properly reduce this to a single location (since the address is the same), but others might not. Debuggers would also need to know to say "hey, I found 'main.c:12' in my line table, but now I need to check the address ranges for my compile unit to know that this line entry doesn't really belong to me". I am not sure how many debuggers or symbolicators will know to do that, so they might attribute the 'main.c:12' to a compile unit 'foo.c' because that is where it was found in the DWARF.
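Here is a minimal Python sketch of the extra check a consumer would need. All of the names (LineEntry, CompileUnit, resolve_breakpoint) and the addresses are hypothetical; this is not how LLDB or any other debugger is actually structured, just the idea of filtering by the compile unit's address ranges and reducing to one location per address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineEntry:
    address: int
    file: str
    line: int


@dataclass
class CompileUnit:
    name: str
    ranges: list  # list of (low_pc, high_pc) pairs, high_pc exclusive

    def contains(self, address):
        return any(lo <= address < hi for lo, hi in self.ranges)


def resolve_breakpoint(file, line, compile_units, shared_table):
    """Map file:line to {address: owning CU name}, one entry per address."""
    locations = {}
    for cu in compile_units:
        # Every CU sees the same shared table, so every CU finds the match.
        for entry in shared_table:
            if entry.file != file or entry.line != line:
                continue
            # Only claim the entry if the address is in this CU's ranges.
            if cu.contains(entry.address):
                locations[entry.address] = cu.name
    return locations


shared_table = [
    LineEntry(0x1000, "main.c", 12),
    LineEntry(0x1010, "main.c", 13),
    LineEntry(0x2000, "foo.c", 5),
]
units = [
    CompileUnit("foo.c", [(0x2000, 0x2100)]),
    CompileUnit("main.c", [(0x1000, 0x1100)]),
]
hits = resolve_breakpoint("main.c", 12, units, shared_table)
```

Without the contains() check, the 'foo.c' unit would also claim the 'main.c:12' entry simply because it walked the shared table first.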

Just a few things to think about if we go this route. We can and will fix LLDB to do what is right, I just worry about other debuggers and DWARF consumers that might break as a result.

Greg