[LLVMdev] Minimizing -gmlt

Fri Aug 29 10:49:56 PDT 2014

On Thu, Aug 28, 2014 at 1:51 PM, David Blaikie <dblaikie at gmail.com> wrote:

>
>
>
> On Thu, Aug 28, 2014 at 11:51 AM, Alexey Samsonov <vonosmas at gmail.com>
> wrote:
>
>> This sounds great. Teaching the backend about -gmlt might help us in
>> another way: we might enforce full debug info generation in the frontend
>> for -fsanitize= flags, then rely on some parts of this debug info in
>> instrumentation passes and prune it before the actual object file
>> generation. This would be somewhat similar to what -Rpass does, only it
>> kills all the debug info, while we would need to turn full debug info into
>> gmlt-like.
>>
>
> Yep, this crossed my mind (removing most of the extra codepaths from Clang
> would be nice) but I figured we'd probably keep it this way for now, since
> it reduces the amount of metadata we have to build when we don't need it.
>
> But if sanitizers end up needing more of that information for whatever
> reason (while not wanting to emit more debug info) this will provide a
> basis for such a state of affairs in the future.
>

Sounds good.

>
>
>> Anyway, to backtracing:
>>
>> On Wed, Aug 27, 2014 at 4:40 PM, David Blaikie <dblaikie at gmail.com>
>> wrote:
>>
>>> In an effort to fix inlined information for backtraces under DWARF
>>> Fission in the absence of the split DWARF (.dwo) files, I'm planning on
>>> adding -gmlt-like data to the .o file, alongside the skeleton CU.
>>>
>>> Since that will involve teaching LLVM about -gmlt (more so than it
>>> already has - the debug info LLVM metadata already describes -gmlt for the
>>> purposes of omitting pubnames in that case) I figured I'd take the
>>> opportunity to move the existing -gmlt functionality to the backend to
>>> begin with, and, in doing so, minimize it a little further since we
>>> wouldn't need to emit debug info for every function - possibly just those
>>> that have functions inlined into them.
>>>
>>
>> Right. Currently, if the symbolizer is unable to find a subprogram DIE
>> corresponding to a PC, it tries to at least fetch the file/line info from
>> the line table, and assumes that function name might be available in the
>> symbol table.
>>
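
A rough sketch of the fallback described above, in Python for brevity. The
table layout (sorted lists of (start, end, value) ranges) and the names
lookup_range, symbolize and Frame are assumptions made for illustration, not
the symbolizer's actual data structures; a real line table is a sequence of
rows rather than explicit ranges.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    function: Optional[str]
    file: Optional[str]
    line: Optional[int]


def lookup_range(table, pc):
    """Return the value of the (start, end, value) entry containing pc, or None.

    `table` must be sorted by start address; ranges are half-open [start, end).
    """
    starts = [start for start, _end, _value in table]
    i = bisect_right(starts, pc) - 1
    if i >= 0 and table[i][0] <= pc < table[i][1]:
        return table[i][2]
    return None


def symbolize(pc, subprogram_dies, line_table, symbol_table):
    """Prefer the subprogram DIE name; fall back to the symbol table."""
    file_line = lookup_range(line_table, pc) or (None, None)
    name = lookup_range(subprogram_dies, pc)
    if name is None:
        # No DIE covers this PC (e.g. minimized -gmlt): use the symbol table.
        name = lookup_range(symbol_table, pc)
    return Frame(name, *file_line)
```

With this shape, a function that has no subprogram DIE still gets file/line
from the line table and a (mangled) name from the symbol table.
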
>>>
>>> So here's an example of some of my ideas about minimized debug info. I'm
>>> wondering if I'm right about what's needed for backtracing.
>>>
>>> I've removed uninteresting things, like DW_AT_accessibility (which is a
>>> bug anyway), DW_AT_external (there's no reason symbolication needs that, is
>>> there?), but also less obviously uninteresting things like DW_AT_frame_base
>>> (the location of the frame pointer - is that needed for symbolication?)
>>>
>>
>> We don't use DW_AT_accessibility and DW_AT_external.
>>
>
> Great
>
>
>> As Chandler suggests, DW_AT_frame_base might be required for unwinders,
>> but I don't really know that.
>>
>
>>
>>>
>>> Also I've made a frontend (for now) change (see mgmlt_clang.diff) to
>>> omit the data that causes DW_AT_decl_file/DW_AT_decl_line to be emitted -
>>> are those needed? I don't think so.
>>>
>>
>> We don't use them.
>>
>
> Excellent
>
>
>>
>>
>>>
>>> But importantly: the only DW_TAG_subprograms are either functions that
>>> have been inlined, or functions that have been inlined into. Is that enough?
>>>
>>> Is it OK that I haven't included debug info for out of line definitions
>>> of inline functions?
>>>
>>> I'm assuming all that information can be retrieved from the symbol table.
>>>
>>
>>
>> See above. Looks like this information is not necessary.
>>
>
> Perfect.
>
>
>>
>>
>>>
>>> (one other thing I noticed is that we don't use the mangled names for
>>> functions in -gmlt - how on earth does that work?
>>>
>>
>> Yeah, IIRC currently -gmlt doesn't produce DW_AT_linkage_name entries,
>> only DW_AT_name (DW_AT_linkage_name significantly increases the binary
>> size for heavily templated code). So, instead of Foo::Bar<double>::Baz we
>> have only "Baz". And we live with that - we fetch just "Baz" from
>> subprogram entries. If a function is not inlined, then we're able to fetch
>> its fully-qualified name from the symbol table, if it is inlined and
>> there's no symbol table entry - fine then, we print just the short name.
>> Generally this is enough for readable stack traces, as we still have
>> file/line info (stored in DW_AT_call_file / DW_AT_call_line). The function
>> names fetched from DW_AT_linkage_name and/or symbol table are demangled
>> with a call to __cxa_demangle (we assume that it's just available on the
>> system, and 95% of the time we are right).
>>
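
The name choice described above could be sketched roughly as follows. The
function frame_name, its parameters, and the order of preference are
assumptions for illustration; the demangle argument stands in for a call to
__cxa_demangle and defaults to returning the name unchanged.

```python
def frame_name(short_name, linkage_name=None, symbol_name=None,
               inlined=False, demangle=lambda mangled: mangled):
    """Choose the function name to print for one stack frame."""
    if linkage_name:
        # DW_AT_linkage_name is present (not the case with -gmlt today).
        return demangle(linkage_name)
    if not inlined and symbol_name:
        # Out-of-line function: the symbol table has the mangled name.
        return demangle(symbol_name)
    # Inlined frame with no symbol: only the short DW_AT_name is known.
    return short_name
```

So an inlined Foo::Bar<double>::Baz is reported as just "Baz", while an
out-of-line copy gets its full name via the symbol table.
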
>
> OK - if that's the tradeoff you guys have made, I'm happy not to meddle
> with it.
>
> (did you do a comparison with compression enabled for the strings section?
> At Google I know we don't compress the linked debug info, but we could -
> this might help in general, and make it not so costly to go from short
> names to fully mangled names)
>

In fact, we do use -Wl,--compress-debug-sections=zlib in
ASan builds... I haven't measured the difference linkage names would cause
for compressed sections, though. I don't remember any user complaints about
missing names for inlined functions, but, sure, we might want to add them
later.
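
One way such a measurement could be approximated offline is sketched below.
It is not a measurement: string_section and sizes are hypothetical helpers,
and the input would have to be the real names from a real build. It only
compares the raw and zlib-compressed size of a .debug_str-like blob.

```python
import zlib


def string_section(names):
    """Build a .debug_str-like blob: each unique name once, NUL-terminated."""
    unique = dict.fromkeys(names)  # keeps first-seen order, drops duplicates
    return b"".join(name.encode() + b"\0" for name in unique)


def sizes(names):
    """Return (raw_size, zlib_compressed_size) in bytes for the given names."""
    raw = string_section(names)
    return len(raw), len(zlib.compress(raw))


# Synthetic input: one short name versus five template instantiations.
short_names = ["Baz"] * 5
mangled_names = ["_ZN3Foo3BarI%sE3BazEv" % t for t in "dfilc"]
print("short:  ", sizes(short_names))
print("mangled:", sizes(mangled_names))
```

Mangled names of template instantiations share long prefixes, which is the
kind of redundancy zlib handles well, but only numbers from an actual
binary would settle the question.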

>
>
>>
>>
>>> The backtrace would look really strange if it included the unmangled
>>> names of functions - or does the symbolizer use the address range of the
>>> out of line definition (if there is one?) of the inlined function (in which
>>> case I'd need to provide it... ) to find it in the symbol table, get the
>>> mangled name, and use that?)
>>>
>>> One thing I was thinking of doing as well, is that since the
>>> DW_AT_abstract_origin just points to a trivial subprogram with a name and
>>> DW_AT_inline - perhaps instead of an abstract origin, we could just use
>>> DW_AT_name directly? (with the mangled name, probably) That'd save us
>>> emitting the extra indirection and the name is uniqued already anyway. (and
>>> DW_FORM_strp is the same size as DW_FORM_ref4 (though DW_FORM_strp would
>>> mean extra relocations...) - and perhaps in the near future, DW_FORM_strp
>>> could be replaced by DW_FORM_str_index to reduce relocations)
>>>
>>
>> Yes, this might work. Generally, when we find a
>> subprogram/inlined_subroutine DIE we calculate its name by following the
>> DW_AT_specification/DW_AT_abstract_origin links until we find a DIE with
>> DW_AT_name provided. If we're able to get the name directly things will
>> only be better.
>>
>
> So long as you look for the name on the inlined_subroutine first, before
> walking DW_AT_specification/DW_AT_abstract_origin links, that'll work
> perfectly if/when we do this.
>
> (might have to teach it about DW_FORM_str_index, at some point, though)
>
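
A minimal sketch of that lookup order: check the DIE itself for a name
first, and only then walk the links. DIEs are modelled here as plain dicts
keyed by offset, which is an assumption for illustration (as is the helper
name resolve_name), not how the symbolizer stores them.

```python
LINK_ATTRS = ("DW_AT_abstract_origin", "DW_AT_specification")


def resolve_name(offset, dies):
    """Return the first DW_AT_name found starting at `offset`, or None.

    `dies` maps a DIE offset to a dict of its attributes. The DIE's own
    DW_AT_name wins; otherwise follow abstract_origin/specification links.
    """
    visited = set()
    while offset is not None and offset not in visited:
        visited.add(offset)  # guard against malformed, cyclic references
        die = dies.get(offset)
        if die is None:
            return None
        name = die.get("DW_AT_name")
        if name is not None:
            return name
        offset = next((die[attr] for attr in LINK_ATTRS if attr in die), None)
    return None
```

If the name is later emitted directly on the inlined_subroutine, the loop
returns on its first iteration and the links are never followed.
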
>
>>
>>
>>>
>>> So... yes/no/maybe?
>>>
>>
>> Speaking of testing, we have some nontrivial amount of sanitizer tests in
>> compiler-rt that match the expected symbolized stack trace. Currently the
>> sources are built with "-g", but I think we can detect if the compiler we
>> test supports -gmlt and/or fission and use the strictest debug info flag
>> settings we still want to provide nice reports for.
>>
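
The flag selection could look roughly like the sketch below. The function
strictest_supported and the compiler_accepts callback are hypothetical; in
the real test suite the probe would more likely be a CMake compiler-flag
check than Python.

```python
def strictest_supported(candidates, compiler_accepts):
    """Return the first flag set the compiler accepts, or None.

    `candidates` is ordered from the most minimal debug info to the
    fullest; `compiler_accepts` is a callable that probes one flag set.
    """
    for flags in candidates:
        if compiler_accepts(flags):
            return flags
    return None


# Prefer -gmlt when available, otherwise fall back to plain -g.
CANDIDATES = [("-gmlt",), ("-g",)]
```

For example, strictest_supported(CANDIDATES, probe) yields ("-gmlt",) for a
compiler whose probe accepts it and ("-g",) otherwise.
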
>
> Right, that sounds like a thing to do - I'd rather not make my changes
> until we've got that in place (& once it's in place I'll try a few obvious
> "break this and see if the tests fail" sort of things to check that my
> changes are being properly validated).
>

OK, I'll let you know once we use -gmlt for sanitizers' test suite.

>
> Can you let me know if you need help/want me to do that work (not that I'm
> terribly well versed in CMake, but I guess that's true of most of us)
> and/or when it's done and then I'll see about getting this work committed
> and moving onto the gmlt-esque+fission stuff.
>
> (side note, just to write it down: the gmlt+fission part of this (after
> this patch that minimizes gmlt by using backend knowledge) will require a
> fair bit of refactoring, but it'll be good to have the minimized-gmlt work
> in first and actively tested so I have that as a good baseline that my
> refactorings are making sense)
>
>
>>
>>
>>>
>>>
>>>
>>
>>
>> --
>> Alexey Samsonov
>> vonosmas at gmail.com
>>
>
>

-- 
Alexey Samsonov
vonosmas at gmail.com