[llvm] r203821 - MCDwarf: Refactor line table handling into a single data structure

Fri Mar 14 13:13:41 PDT 2014

On Fri, Mar 14, 2014 at 11:43 AM, Greg Clayton <gclayton at apple.com> wrote:
> Though having a single line table for everything in LTO might seem nice, it ruins the organization of how this information is designed to be consumed by debuggers.
>
> Consider DWARF with 3 compile units:
>
> (offsets below are in .debug_info )
> 0x00000000: /tmp/a.cpp
> 0x00000100: /tmp/b.cpp
> 0x00000200: /tmp/c.cpp
>
> And now we produce one line table with all lines from all files:
>
> 0x00000000: line table offset zero contains the following line table:
> 0x1000 /tmp/a.cpp line 10
> 0x1001 /tmp/a.cpp line 11
> 0x1002 /tmp/a.cpp line 12
> 0x1003 /tmp/a.cpp line 13
> 0x2000 /tmp/b.cpp line 10
> 0x2001 /tmp/b.cpp line 11
> 0x2002 /tmp/b.cpp line 12
> 0x2003 /tmp/b.cpp line 13
> 0x3000 /tmp/c.cpp line 10
> 0x3001 /tmp/c.cpp line 11
> 0x3002 /tmp/c.cpp line 12
> 0x3003 /tmp/c.cpp line 13
>
> Now when we parse the debug info for each compile unit we have:
>
> compile unit at 0x00000000 has a line table at .debug_line[0x00000000]
> compile unit at 0x00000100 has a line table at .debug_line[0x00000000]
> compile unit at 0x00000200 has a line table at .debug_line[0x00000000]
>
> So all three compile units claim to have a line table that contains entries that have nothing to do with the addresses within each compile unit?

Yes - the interpretation here is "DW_AT_stmt_list refers to a line
table that contains the information for this CU (and possibly other
stuff, but it has that much at least)" - this isn't an unprecedented
interpretation of debug info - indeed we do the same thing for
abbreviations and pubnames/pubtypes, perhaps other things too.
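
A minimal sketch of that interpretation, using the example table above. The
Row type, the CU_RANGES values and rows_for_cu are hypothetical names made up
for illustration, not anything in LLVM or LLDB: a consumer that knows a CU's
address ranges keeps only the rows of the shared table that fall inside them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    address: int
    file: str
    line: int


# The single shared line table from the example above (12 rows).
SHARED_TABLE = [
    Row(base + i, path, 10 + i)
    for base, path in [
        (0x1000, "/tmp/a.cpp"),
        (0x2000, "/tmp/b.cpp"),
        (0x3000, "/tmp/c.cpp"),
    ]
    for i in range(4)
]

# Assumed half-open address ranges per CU, keyed by .debug_info offset.
CU_RANGES = {
    0x000: [(0x1000, 0x1004)],
    0x100: [(0x2000, 0x2004)],
    0x200: [(0x3000, 0x3004)],
}


def rows_for_cu(table, ranges):
    """Return the rows of a shared table that belong to one CU's ranges."""
    return [r for r in table if any(lo <= r.address < hi for lo, hi in ranges)]
```

All three CUs point at the same table, and each one only sees its own four
rows once the ranges are applied.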

> I agree it can be done, but I also think it would be nice to have a single line table that applies only to the files in question.

Sure, it might be nice - but we're weighing up costs here. It's
important to have assembler parity - that what we output in object
files is the same as what we can output to assembly, and there's no
way to describe this in the assembly language we have today.

Extending the assembly language isn't something I'd try to do lightly;
I want to understand how valuable this is before we resort to
that.

> In addition, since there is this "line tables only" mode in clang these days, the only thing we had to go on (prior to Eric Christopher's patch that adds DW_AT_ranges to the DW_TAG_compile_unit) was the address ranges in the line table. So we currently have code that will take all the addresses in a line table (in line-tables-only mode) and produce the address range for a compile unit (only when .debug_aranges isn't produced, which again is now disabled on top of tree).
>
> So I would vote to keep the debug info in as good of shape as possible and organize it as cleanly as possible.

"Good" and "clean" are still somewhat subjective here.

It sounds like this boils down to: "In the absence of ranges
(debug_aranges or debug_ranges) we can build ranges anyway, it's just
slow - and it's somewhat less slow (but not as good as having ranges)
if we have one line table per CU"

I don't think that really meets the bar here, for my own tastes anyway
- the answer seems to be: use/enable ranges. If you don't have them,
do the slow thing.
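
For reference, the slow thing can be sketched roughly as follows. This is an
illustration only, not LLDB's actual code; ranges_from_line_rows and the
(address, is_end_sequence) row shape are assumptions. It relies on the DWARF
rule that each line-table sequence ends with an end_sequence row whose address
is one past the last byte covered.

```python
def ranges_from_line_rows(rows):
    """Derive half-open address ranges from line-table rows.

    rows: iterable of (address, is_end_sequence) pairs in table order.
    Each sequence contributes [first address, end_sequence address).
    """
    ranges = []
    start = None
    for address, is_end in rows:
        if start is None:
            start = address  # first row of a new sequence
        if is_end:
            if address > start:
                ranges.append((start, address))
            start = None

    # Merge overlapping or touching ranges so the result is minimal.
    ranges.sort()
    merged = []
    for low, high in ranges:
        if merged and low <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged
```

The walk is linear in the number of rows scanned, which is why it costs more
per CU when every CU points at one large shared table.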

Is there any other way to weigh these subjective criteria in a more
concrete way that could lead us to a well founded conclusion?

- David

>
> On Mar 13, 2014, at 4:50 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>
>>> It would produce a smaller assembly file, I can't see how it would
>>> produce smaller line tables though.
>>
>> Without it codegen will output a full address for each line. With .loc
>> MC can produce deltas.
>>
>> It would be possible to have an implementation similar to what we do
>> with .cfi: The Streamer interface always uses cfi inspired directives
>> and the asm streamer converts it (when cfi is disabled).
>>
>> In this case, it would mean that the streamer would get a call for
>> .file, .loc and ".offset_of_cu_in_debug_lines 0", but would process
>> that internally and print a .debug_line in the end. This would avoid
>> extending the assembly but:
>>
>> * We would produce uglier assembly.
>> * We would need to do a lot more in the asm streamer. In particular,
>> we would have to relax instructions to find their final size. We would
>> also have to make sure that instructions are printed in a completely
>> unambiguous way. If we think a branch is 16 bits and gas relaxation
>> produces a larger one, we have a problem.
>>
>> Cheers,
>> Rafael
>
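
Rafael's point about full addresses versus deltas can be illustrated with a
rough size model. This is a simplified sketch, not MC's encoder: the helper
names are made up, it assumes 8-byte addresses, ascending addresses and a
minimum instruction length of 1, and it counts only the address-setting
opcodes (DW_LNE_set_address versus DW_LNS_advance_pc), ignoring line advances
and special opcodes.

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as ULEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def size_with_full_addresses(addresses, address_size=8):
    """Bytes spent if every row uses DW_LNE_set_address.

    Encoding: 0x00, ULEB128 length, opcode 0x02, then the address.
    """
    per_row = 1 + len(uleb128(1 + address_size)) + 1 + address_size
    return per_row * len(addresses)


def size_with_deltas(addresses, address_size=8):
    """Bytes spent if only the first row sets a full address.

    Later rows use DW_LNS_advance_pc: one opcode byte plus a ULEB128 delta.
    """
    if not addresses:
        return 0
    total = size_with_full_addresses(addresses[:1], address_size)
    for prev, cur in zip(addresses, addresses[1:]):
        total += 1 + len(uleb128(cur - prev))
    return total
```

For the four /tmp/a.cpp rows in Greg's example (0x1000 through 0x1003), this
model gives 44 bytes with full addresses and 17 bytes with deltas.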