[LLVMdev] Minimizing -gmlt

David Blaikie dblaikie at gmail.com
Thu Aug 28 13:51:02 PDT 2014


On Thu, Aug 28, 2014 at 11:51 AM, Alexey Samsonov <vonosmas at gmail.com>
wrote:

> This sounds great. Teaching the backend about -gmlt might help us in
> another way: we might enforce full debug info generation in the frontend
> for -fsanitize= flags, then rely on some parts of this debug info in
> instrumentation passes and prune it before the actual object file
> generation. This would be somewhat similar to what -Rpass does, only it
> kills all the debug info, while we would need to turn full debug info into
> gmlt-like.
>

Yep, this crossed my mind (removing most of the extra codepaths from Clang
would be nice) but I figured we'd probably keep it this way for now, since
it reduces the amount of metadata we have to build when we don't need it.

But if sanitizers end up needing more of that information for whatever
reason (while not wanting to emit more debug info) this will provide a
basis for such a state of affairs in the future.


> Anyway, to backtracing:
>
> On Wed, Aug 27, 2014 at 4:40 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>> In an effort to fix inlined information for backtraces under DWARF
>> Fission in the absence of the split DWARF (.dwo) files, I'm planning on
>> adding -gmlt-like data to the .o file, alongside the skeleton CU.
>>
>> Since that will involve teaching LLVM about -gmlt (more so than it
>> already has - the debug info LLVM metadata already describes -gmlt for the
>> purposes of omitting pubnames in that case) I figured I'd take the
>> opportunity to move the existing -gmlt functionality to the backend to
>> begin with, and, in doing so, minimize it a little further since we
>> wouldn't need to emit debug info for every function - possibly just those
>> that have functions inlined into them.
>>
>
> Right. Currently, if the symbolizer is unable to find a subprogram DIE
> corresponding to a PC, it tries to at least fetch the file/line info from
> the line table, and assumes that the function name might be available in
> the symbol table.
>
>>
>> So here's an example of some of my ideas about minimized debug info. I'm
>> wondering if I'm right about what's needed for backtracing.
>>
>> I've removed uninteresting things, like DW_AT_accessibility (which is a
>> bug anyway), DW_AT_external (there's no reason symbolication needs that, is
>> there?), but also less obviously uninteresting things like DW_AT_frame_base
>> (the location of the frame pointer - is that needed for symbolication?)
>>
>
> We don't use DW_AT_accessibility and DW_AT_external.
>

Great


> As Chandler suggests, DW_AT_frame_base might be required for unwinders,
> but I don't really know that.
>

>
>>
>> Also I've made a frontend (for now) change (see mgmlt_clang.diff) to omit
>> the data that causes DW_AT_decl_file/DW_AT_decl_line to be emitted - are
>> those needed? I don't think so.
>>
>
> We don't use them.
>

Excellent


>
>
>>
>> But importantly: the only DW_TAG_subprograms are either functions that
>> have been inlined, or functions that have been inlined into. Is that enough?
>>
>> Is it OK that I haven't included debug info for out of line definitions
>> of inline functions?
>>
>> I'm assuming all that information can be retrieved from the symbol table.
>>
>
>
> See above. Looks like this information is not necessary.
>

Perfect.
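
As a rough illustration of that rule (emit a DW_TAG_subprogram only for
functions that have been inlined, or that have had something inlined into
them), here is a small Python sketch. It is not the LLVM change itself:
`subprograms_to_emit` and the (caller, callee) input format are made up for
the example.

```python
def subprograms_to_emit(inline_sites):
    """Return the set of function names that need a DW_TAG_subprogram.

    inline_sites: iterable of (caller, callee) pairs, one per call site
    where `callee` was inlined into `caller` (hypothetical input format).
    """
    needed = set()
    for caller, callee in inline_sites:
        needed.add(caller)  # inlined into: parent of the inlined_subroutine
        needed.add(callee)  # inlined: what the inlined_subroutine refers to
    return needed


# `standalone` never takes part in inlining, so it gets no subprogram DIE
# and the symbolizer would fall back to the symbol table for it.
sites = [("main", "helper"), ("helper", "leaf")]
print(sorted(subprograms_to_emit(sites)))
```

Anything not in the returned set is assumed to be recoverable from the
symbol table plus the line table.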


>
>
>>
>> (one other thing I noticed is that we don't use the mangled names for
>> functions in -gmlt - how on earth does that work?
>>
>
> Yeah, IIRC currently -gmlt doesn't produce DW_AT_linkage_name entries,
> only DW_AT_name (DW_AT_linkage_name significantly increases the binary
> size for heavily templated code). So, instead of Foo::Bar<double>::Baz we
> have only "Baz". And we live with that - we fetch just "Baz" from
> subprogram entries. If a function is not inlined, then we're able to fetch
> its fully-qualified name from the symbol table, if it is inlined and
> there's no symbol table entry - fine then, we print just the short name.
> Generally this is enough for readable stack traces, as we still have
> file/line info (stored in DW_AT_call_file / DW_AT_call_line). The function
> names fetched from DW_AT_linkage_name and/or symbol table are demangled
> with a call to __cxa_demangle (we assume that it's just available on the
> system, and 95% of the time we are right).
>

OK - if that's the tradeoff you guys have made, I'm happy not to meddle
with it.
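
To spell out the tradeoff as I understand it from the description above, a
minimal Python sketch of the name choice for a single frame. The function
name `frame_name`, its parameters, and the "??" placeholder are assumptions
made for the example; the real symbolizer would also demangle the symbol
table name, which is left out here.

```python
def frame_name(is_inlined, die_short_name, symtab_name):
    """Pick the name to print for one stack frame.

    is_inlined:     True if the frame comes from an inlined_subroutine DIE.
    die_short_name: DW_AT_name from the debug info (e.g. "Baz"), or None.
    symtab_name:    mangled name from the symbol table, or None.
    """
    # Out-of-line frame: the symbol table gives the fully qualified name.
    if not is_inlined and symtab_name is not None:
        return symtab_name
    # Inlined frame (or no symbol): settle for the short name.
    if die_short_name is not None:
        return die_short_name
    return "??"  # hypothetical placeholder when nothing is known


print(frame_name(False, "Baz", "_ZN3Foo3BarIdE3BazEv"))
print(frame_name(True, "Baz", None))
```

The first call prints the mangled symbol, the second prints just "Baz".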

(did you do a comparison with compression enabled for the strings section?
At Google I know we don't compress the linked debug info, but we could -
this might help in general, and make it not so costly to go from short
names to fully mangled names)


>
>
>> The backtrace would look really strange if it included the unmangled
>> names of functions - or does the symbolizer use the address range of the
>> out of line definition (if there is one?) of the inlined function (in which
>> case I'd need to provide it... ) to find it in the symbol table, get the
>> mangled name, and use that?)
>>
>> One thing I was thinking of doing as well, is that since the
>> DW_AT_abstract_origin just points to a trivial subprogram with a name and
>> DW_AT_inline - perhaps instead of an abstract origin, we could just use
>> DW_AT_name directly? (with the mangled name, probably) That'd save us
>> emitting the extra indirection and the name is uniqued already anyway. (and
>> DW_FORM_strp is the same size as DW_FORM_ref4 (though DW_FORM_strp would
>> mean extra relocations...) - and perhaps in the near future, DW_FORM_strp
>> could be replaced by DW_FORM_str_index to reduce relocations)
>>
>
> Yes, this might work. Generally, when we find a
> subprogram/inlined_subroutine DIE we calculate its name by following the
> DW_AT_specification/DW_AT_abstract_origin links until we find a DIE with
> DW_AT_name provided. If we're able to get the name directly things will
> only be better.
>

So long as you look for the name on the inlined_subroutine first, before
walking DW_AT_specification/DW_AT_abstract_origin links, that'll work
perfectly if/when we do this.

(might have to teach it about DW_FORM_str_index, at some point, though)
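
A sketch of that lookup order, in Python for brevity. DIEs are modelled as
plain dicts here, and `resolve_name` and `max_depth` are names invented for
the example rather than anything in the symbolizer.

```python
NAME = "DW_AT_name"
LINKS = ("DW_AT_specification", "DW_AT_abstract_origin")


def resolve_name(die, max_depth=16):
    """Return the first DW_AT_name found, starting at `die` itself.

    die: dict mapping attribute names to values; a link attribute maps
    to the dict of the DIE it references (a toy model, not real DWARF).
    """
    depth = 0
    while die is not None and depth < max_depth:
        if NAME in die:
            return die[NAME]  # name directly on this DIE wins
        # Otherwise follow the first link attribute present, if any.
        die = next((die[attr] for attr in LINKS if attr in die), None)
        depth += 1
    return None


origin = {"DW_AT_name": "Baz", "DW_AT_inline": "DW_INL_inlined"}
today = {"DW_AT_abstract_origin": origin}  # current -gmlt shape
proposed = {"DW_AT_name": "_ZN3Foo3BarIdE3BazEv"}  # name on the DIE itself
print(resolve_name(today))
print(resolve_name(proposed))
```

Both shapes resolve with the same code, so the consumer would not need to
change when the producer switches.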


>
>
>>
>> So... yes/no/maybe?
>>
>
> Speaking of testing, we have some nontrivial amount of sanitizer tests in
> compiler-rt that match the expected symbolized stack trace. Currently the
> sources are built with "-g", but I think we can detect if the compiler we
> test supports -gmlt and/or fission and use the strictest debug info flag
> settings we still want to provide nice reports for.
>

Right, that sounds like a thing to do - I'd rather not make my changes
until we've got that in place (& once it's in place I'll try a few obvious
"break this and see if the tests fail" sort of things to check that my
changes are being properly validated).

Can you let me know if you need help or want me to do that work (not that
I'm terribly well versed in CMake, but I guess that's true of most of us),
and/or when it's done? Then I'll see about getting this work committed and
moving on to the gmlt-esque+fission stuff.

(side note, just to write it down: the gmlt+fission part of this (after
this patch that minimizes gmlt by using backend knowledge) will require a
fair bit of refactoring, but it'll be good to have the minimized-gmlt work
in first and actively tested, so I have a good baseline for checking that
my refactorings make sense)


>
>
>>
>
>
> --
> Alexey Samsonov
> vonosmas at gmail.com
>