[LLVMdev] Minimizing -gmlt

Fri Sep 5 15:16:36 PDT 2014

On Fri, Aug 29, 2014 at 10:49 AM, Alexey Samsonov <vonosmas at gmail.com>
wrote:

>
>
>
> On Thu, Aug 28, 2014 at 1:51 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>>
>>
>>
>> On Thu, Aug 28, 2014 at 11:51 AM, Alexey Samsonov <vonosmas at gmail.com>
>> wrote:
>>
>>> This sounds great. Teaching the backend about -gmlt might help us in
>>> another way: we might enforce full debug info generation in the frontend
>>> for -fsanitize= flags, then rely on some parts of this debug info in
>>> instrumentation passes and prune it before the actual object file
>>> generation. This would be somewhat similar to what -Rpass does, only it
>>> kills all the debug info, while we would need to reduce full debug info to
>>> gmlt-like debug info.
>>>
>>
>> Yep, this crossed my mind (removing most of the extra codepaths from
>> Clang would be nice) but I figured we'd probably keep it this way for now,
>> since it reduces the amount of metadata we have to build when we don't need
>> it.
>>
>> But if sanitizers end up needing more of that information for whatever
>> reason (while not wanting to emit more debug info) this will provide a
>> basis for such a state of affairs in the future.
>>
>
> Sounds good.
>
>
>>
>>
>>> Anyway, to backtracing:
>>>
>>> On Wed, Aug 27, 2014 at 4:40 PM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>>
>>>> In an effort to fix inlined information for backtraces under DWARF
>>>> Fission in the absence of the split DWARF (.dwo) files, I'm planning on
>>>> adding -gmlt-like data to the .o file, alongside the skeleton CU.
>>>>
>>>> Since that will involve teaching LLVM about -gmlt (more so than it
>>>> already has - the debug info LLVM metadata already describes -gmlt for the
>>>> purposes of omitting pubnames in that case) I figured I'd take the
>>>> opportunity to move the existing -gmlt functionality to the backend to
>>>> begin with, and, in doing so, minimize it a little further since we
>>>> wouldn't need to emit debug info for every function - possibly just those
>>>> that have functions inlined into them.
>>>>
>>>
>>> Right. Currently, if the symbolizer is unable to find a subprogram DIE
>>> corresponding to a PC, it tries to at least fetch the file/line info from
>>> the line table, and assumes that function name might be available in the
>>> symbol table.
>>>
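A minimal sketch of the fallback described above, assuming a simplified in-memory model rather than real DWARF parsing. The names `lookup_line`, `lookup_symbol`, and `symbolize`, and the tuple/dict layouts, are hypothetical and are not the symbolizer's actual API; real line tables also carry end-of-sequence markers that this sketch ignores.

```python
from bisect import bisect_right


def lookup_line(pc, line_table):
    """Return (file, line) for the last row at or before pc, else (None, None).

    line_table: list of (start_address, file, line) sorted by address.
    """
    addresses = [row[0] for row in line_table]
    idx = bisect_right(addresses, pc) - 1
    if idx < 0:
        return None, None
    _, file, line = line_table[idx]
    return file, line


def lookup_symbol(pc, symbols):
    """Return the symbol whose [start, start + size) range covers pc."""
    for start, size, name in symbols:
        if start <= pc < start + size:
            return name
    return None


def symbolize(pc, subprogram_dies, line_table, symbols):
    """Prefer a subprogram DIE covering pc; otherwise fall back to the
    symbol table for the name. File/line always come from the line table."""
    file, line = lookup_line(pc, line_table)
    for die in subprogram_dies:
        if die["low_pc"] <= pc < die["high_pc"]:
            return die["name"], file, line
    return lookup_symbol(pc, symbols), file, line


# Illustrative data only: "foo" has a DIE, "bar" has only a symbol.
LINE_TABLE = [(0x1000, "a.cc", 10), (0x1010, "a.cc", 12), (0x2000, "b.cc", 3)]
SYMBOLS = [(0x1000, 0x20, "_Z3foov"), (0x2000, 0x10, "_Z3barv")]
DIES = [{"name": "foo", "low_pc": 0x1000, "high_pc": 0x1020}]
```

With this model, a PC inside a function that has no subprogram DIE still gets a file, a line, and a (mangled) symbol name, which is why dropping DIEs for functions with no inlining should not hurt the reports.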
>>>>
>>>> So here's an example of some of my ideas about minimized debug info.
>>>> I'm wondering if I'm right about what's needed for backtracing.
>>>>
>>>> I've removed uninteresting things, like DW_AT_accessibility (which is a
>>>> bug anyway), DW_AT_external (there's no reason symbolication needs that, is
>>>> there?), but also less obviously uninteresting things like DW_AT_frame_base
>>>> (the location of the frame pointer - is that needed for symbolication?)
>>>>
>>>
>>> We don't use DW_AT_accessibility and DW_AT_external.
>>>
>>
>> Great
>>
>>
>>> As Chandler suggests, DW_AT_frame_base might be required for unwinders,
>>> but I don't really know that.
>>>
>>
>>>
>>>>
>>>> Also I've made a frontend (for now) change (see mgmlt_clang.diff) to
>>>> omit the data that causes DW_AT_decl_file/DW_AT_decl_line to be emitted -
>>>> are those needed? I don't think so.
>>>>
>>>
>>> We don't use them.
>>>
>>
>> Excellent
>>
>>
>>>
>>>
>>>>
>>>> But importantly: the only DW_TAG_subprograms are either functions that
>>>> have been inlined, or functions that have been inlined into. Is that enough?
>>>>
>>>> Is it OK that I haven't included debug info for out of line definitions
>>>> of inline functions?
>>>>
>>>> I'm assuming all that information can be retrieved from the symbol
>>>> table.
>>>>
>>>
>>>
>>> See above. Looks like this information is not necessary.
>>>
>>
>> Perfect.
>>
>>
>>>
>>>
>>>>
>>>> (one other thing I noticed is that we don't use the mangled names for
>>>> functions in -gmlt - how on earth does that work?
>>>>
>>>
>>> Yeah, IIRC currently -gmlt doesn't produce DW_AT_linkage_name entries,
>>> only DW_AT_name (DW_AT_linkage_name significantly increases the binary
>>> size for heavily templated code). So, instead of Foo::Bar<double>::Baz we
>>> have only "Baz". And we live with that - we fetch just "Baz" from
>>> subprogram entries. If a function is not inlined, then we're able to fetch
>>> its fully-qualified name from the symbol table; if it is inlined and
>>> there's no symbol table entry - fine then, we print just the short name.
>>> Generally this is enough for readable stack traces, as we still have
>>> file/line info (stored in DW_AT_call_file / DW_AT_call_line). The function
>>> names fetched from DW_AT_linkage_name and/or the symbol table are demangled
>>> with a call to __cxa_demangle (we assume that it's just available on the
>>> system, and 95% of the time we are right).
>>>
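The name selection described above can be sketched roughly as follows. This is an illustration only: `display_name` and its parameters are hypothetical, the priority given to a linkage name over the symbol table is an assumption, and the `demangle` callable stands in for __cxa_demangle (the table below is a fake demangler for the example).

```python
def display_name(short_name, linkage_name=None, symbol_name=None,
                 inlined=False, demangle=lambda mangled: mangled):
    """Pick the name to print for one stack frame.

    short_name:   DW_AT_name from the DIE (e.g. "Baz").
    linkage_name: DW_AT_linkage_name, if the producer emitted one.
    symbol_name:  mangled name from the symbol table, if any.
    inlined:      True for frames that come from an inlined call.
    """
    if linkage_name is not None:
        return demangle(linkage_name)
    # An inlined frame has no symbol of its own at this PC, so the symbol
    # table can only help for out-of-line functions.
    if not inlined and symbol_name is not None:
        return demangle(symbol_name)
    return short_name


# Fake demangler for illustration; a real tool would call __cxa_demangle.
FAKE_DEMANGLE = {"_ZN3Foo3BarIdE3BazEv": "Foo::Bar<double>::Baz()"}.get
```

For an out-of-line frame the symbol table supplies the fully qualified name; for an inlined frame under current -gmlt output only the short name is available.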
>>
>> OK - if that's the tradeoff you guys have made, I'm happy not to meddle
>> with it.
>>
>> (did you do a comparison with compression enabled for the strings
>> section? At Google I know we don't compress the linked debug info, but we
>> could - this might help in general, and make it not so costly to go from
>> short names to fully mangled names)
>>
>
> In fact, we do use -Wl,--compress-debug-sections=zlib in
> ASan builds... I haven't measured the difference linkage names would cause
> for compressed sections, though. I don't remember any user complaints about
> missing names for inlined functions, but, sure, we might want to add them
> later.
>
>
>>
>>
>>>
>>>
>>>> The backtrace would look really strange if it included the unmangled
>>>> names of functions - or does the symbolizer use the address range of the
>>>> out of line definition (if there is one?) of the inlined function (in which
>>>> case I'd need to provide it... ) to find it in the symbol table, get the
>>>> mangled name, and use that?)
>>>>
>>>> One thing I was thinking of doing as well, is that since the
>>>> DW_AT_abstract_origin just points to a trivial subprogram with a name and
>>>> DW_AT_inline - perhaps instead of an abstract origin, we could just use
>>>> DW_AT_name directly? (with the mangled name, probably) That'd save us
>>>> emitting the extra indirection and the name is uniqued already anyway. (and
>>>> DW_FORM_strp is the same size as DW_FORM_ref4 (though DW_FORM_strp would
>>>> mean extra relocations...) - and perhaps in the near future, DW_FORM_strp
>>>> could be replaced by DW_FORM_str_index to reduce relocations)
>>>>
>>>
>>> Yes, this might work. Generally, when we find a
>>> subprogram/inlined_subroutine DIE we calculate its name by following the
>>> DW_AT_specification/DW_AT_abstract_origin links until we find a DIE with
>>> DW_AT_name provided. If we're able to get the name directly things will
>>> only be better.
>>>
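A small sketch of the name lookup just described, assuming DIEs are modelled as plain dicts keyed by offset. `resolve_name` and the dict layout are hypothetical and not the symbolizer's real data structures. The DIE's own DW_AT_name is checked before any link is followed, so a name placed directly on an inlined_subroutine would be picked up without an abstract origin.

```python
def resolve_name(offset, dies):
    """Follow DW_AT_abstract_origin / DW_AT_specification links from the
    DIE at `offset` until a DIE with DW_AT_name is found.

    dies: dict mapping DIE offset -> dict of attribute name -> value.
    Returns the name, or None if the chain ends or loops without one.
    """
    visited = set()
    while offset is not None and offset not in visited:
        visited.add(offset)  # guard against malformed, cyclic references
        die = dies.get(offset)
        if die is None:
            return None
        name = die.get("DW_AT_name")
        if name is not None:
            return name
        offset = die.get("DW_AT_abstract_origin",
                         die.get("DW_AT_specification"))
    return None


# Illustrative DIEs: 0x20 names itself only through its abstract origin,
# 0x30 carries the name directly, 0x40 is a malformed self-reference.
EXAMPLE_DIES = {
    0x10: {"DW_AT_name": "Baz"},
    0x20: {"DW_AT_abstract_origin": 0x10},
    0x30: {"DW_AT_name": "direct", "DW_AT_abstract_origin": 0x10},
    0x40: {"DW_AT_abstract_origin": 0x40},
}
```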
>>
>> So long as you look for the name on the inlined_subroutine first, before
>> walking DW_AT_specification/DW_AT_abstract_origin links, that'll work
>> perfectly if/when we do this.
>>
>> (might have to teach it about DW_FORM_str_index, at some point, though)
>>
>>
>>>
>>>
>>>>
>>>> So... yes/no/maybe?
>>>>
>>>
>>> Speaking of testing, we have a nontrivial number of sanitizer tests
>>> in compiler-rt that match the expected symbolized stack trace. Currently
>>> the sources are built with "-g", but I think we can detect whether the
>>> compiler under test supports -gmlt and/or fission and use the most minimal
>>> debug info flags for which we still want to provide nice reports.
>>>
>>
>> Right, that sounds like a thing to do - I'd rather not make my changes
>> until we've got that in place (& once it's in place I'll try a few obvious
>> "break this and see if the tests fail" sort of things to check that my
>> changes are being properly validated).
>>
>
> OK, I'll let you know once we use -gmlt for sanitizers' test suite.
>

I've switched sanitizers' test suites to -gmlt in r217284.

>
>
>>
>> Can you let me know if you need help/want me to do that work (not that
>> I'm terribly well versed in CMake, but I guess that's true of most of us)
>> and/or when it's done and then I'll see about getting this work committed
>> and moving onto the gmlt-esque+fission stuff.
>>
>> (side note, just to write it down: the gmlt+fission part of this (after
>> this patch that minimizes gmlt by using backend knowledge) will require a
>> fair bit of refactoring, but it'll be good to have the minimized-gmlt work
>> in first and actively tested so I have that as a good baseline that my
>> refactorings are making sense)
>>
>>
>>>
>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Alexey Samsonov
>>> vonosmas at gmail.com
>>>
>>
>>
>
>
> --
> Alexey Samsonov
> vonosmas at gmail.com
>

-- 
Alexey Samsonov
vonosmas at gmail.com