[LLVMdev] Minimizing -gmlt

Fri Sep 5 15:45:23 PDT 2014

Awesome - thanks a bunch!

On Fri, Sep 5, 2014 at 3:16 PM, Alexey Samsonov <vonosmas at gmail.com> wrote:

>
> On Fri, Aug 29, 2014 at 10:49 AM, Alexey Samsonov <vonosmas at gmail.com>
> wrote:
>
>>
>>
>>
>> On Thu, Aug 28, 2014 at 1:51 PM, David Blaikie <dblaikie at gmail.com>
>> wrote:
>>
>>>
>>>
>>>
>>> On Thu, Aug 28, 2014 at 11:51 AM, Alexey Samsonov <vonosmas at gmail.com>
>>> wrote:
>>>
>>>> This sounds great. Teaching the backend about -gmlt might help us in
>>>> another way: we might enforce full debug info generation in the frontend
>>>> for -fsanitize= flags, then rely on some parts of this debug info in
>>>> instrumentation passes and prune it before the actual object file
>>>> generation. This would be somewhat similar to what -Rpass does, except
>>>> that -Rpass discards all the debug info, while we would need to turn full
>>>> debug info into gmlt-like debug info.
>>>>
>>>
>>> Yep, this crossed my mind (removing most of the extra codepaths from
>>> Clang would be nice) but I figured we'd probably keep it this way for now,
>>> since it reduces the amount of metadata we have to build when we don't need
>>> it.
>>>
>>> But if sanitizers end up needing more of that information for whatever
>>> reason (while not wanting to emit more debug info) this will provide a
>>> basis for such a state of affairs in the future.
>>>
>>
>> Sounds good.
>>
>>
>>>
>>>
>>>> Anyway, to backtracing:
>>>>
>>>> On Wed, Aug 27, 2014 at 4:40 PM, David Blaikie <dblaikie at gmail.com>
>>>> wrote:
>>>>
>>>>> In an effort to fix inlined information for backtraces under DWARF
>>>>> Fission in the absence of the split DWARF (.dwo) files, I'm planning on
>>>>> adding -gmlt-like data to the .o file, alongside the skeleton CU.
>>>>>
>>>>> Since that will involve teaching LLVM about -gmlt (more so than it
>>>>> already knows - the debug info LLVM metadata already describes -gmlt for the
>>>>> purposes of omitting pubnames in that case) I figured I'd take the
>>>>> opportunity to move the existing -gmlt functionality to the backend to
>>>>> begin with, and, in doing so, minimize it a little further since we
>>>>> wouldn't need to emit debug info for every function - possibly just those
>>>>> that have functions inlined into them.
>>>>>
>>>>
>>>> Right. Currently, if the symbolizer is unable to find a subprogram DIE
>>>> corresponding to a PC, it tries to at least fetch the file/line info from
>>>> the line table, and assumes that the function name might be available in
>>>> the symbol table.
>>>>
>>>>>
>>>>> So here's an example of some of my ideas about minimized debug info.
>>>>> I'm wondering if I'm right about what's needed for backtracing.
>>>>>
>>>>> I've removed uninteresting things, like DW_AT_accessibility (which is
>>>>> a bug anyway), DW_AT_external (there's no reason symbolication needs that,
>>>>> is there?), but also less obviously uninteresting things like
>>>>> DW_AT_frame_base (the location of the frame pointer - is that needed for
>>>>> symbolication?)
>>>>>
>>>>
>>>> We don't use DW_AT_accessibility and DW_AT_external.
>>>>
>>>
>>> Great
>>>
>>>
>>>> As Chandler suggests, DW_AT_frame_base might be required for unwinders,
>>>> but I don't really know that.
>>>>
>>>
>>>>
>>>>>
>>>>> Also I've made a frontend (for now) change (see mgmlt_clang.diff) to
>>>>> omit the data that causes DW_AT_decl_file/DW_AT_decl_line to be emitted -
>>>>> are those needed? I don't think so.
>>>>>
>>>>
>>>> We don't use them.
>>>>
>>>
>>> Excellent
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> But importantly: the only DW_TAG_subprograms are either functions that
>>>>> have been inlined, or functions that have been inlined into. Is that enough?
>>>>>
>>>>> Is it OK that I haven't included debug info for out of line
>>>>> definitions of inline functions?
>>>>>
>>>>> I'm assuming all that information can be retrieved from the symbol
>>>>> table.
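The pruning rule proposed above (emit a DW_TAG_subprogram only for functions that were inlined somewhere or that had something inlined into them) can be sketched as a simple set computation. This is a hypothetical model for illustration, not LLVM's DwarfDebug code: `Function` records the names of every function inlined, directly or transitively, into it.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Function:
    name: str
    # hypothetical: every function inlined (directly or transitively) into this one
    inlined_callees: List[str] = field(default_factory=list)


def subprograms_to_emit(functions: Iterable[Function]) -> Set[str]:
    keep: Set[str] = set()
    for fn in functions:
        if fn.inlined_callees:
            # The caller needs a concrete DW_TAG_subprogram to hold its
            # DW_TAG_inlined_subroutine children...
            keep.add(fn.name)
            # ...and each callee needs an abstract DW_TAG_subprogram for those
            # children to point at via DW_AT_abstract_origin.
            keep.update(fn.inlined_callees)
    # Everything else is left to the symbol table.
    return keep
```

For example, with `main` inlining `foo` and `bar`, an out-of-line `foo` inlining `bar`, and a leaf `baz` with no inlining, only `baz` gets no subprogram DIE.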
>>>>>
>>>>
>>>>
>>>> See above. Looks like this information is not necessary.
>>>>
>>>
>>> Perfect.
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> (one other thing I noticed is that we don't use the mangled names for
>>>>> functions in -gmlt - how on earth does that work?
>>>>>
>>>>
>>>> Yeah, IIRC currently -gmlt doesn't produce DW_AT_linkage_name entries,
>>>> only DW_AT_name (DW_AT_linkage_name significantly increases the binary
>>>> size for heavily templated code). So, instead of Foo::Bar<double>::Baz we
>>>> have only "Baz". And we live with that - we fetch just "Baz" from
>>>> subprogram entries. If a function is not inlined, then we're able to fetch
>>>> its fully-qualified name from the symbol table; if it is inlined and
>>>> there's no symbol table entry - fine then, we print just the short name.
>>>> Generally this is enough for readable stack traces, as we still have
>>>> file/line info (stored in DW_AT_call_file / DW_AT_call_line). The function
>>>> names fetched from DW_AT_linkage_name and/or the symbol table are demangled
>>>> with a call to __cxa_demangle (we assume that it's just available on the
>>>> system, and 95% of the time we are right).
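The stack-trace behaviour described above can be sketched as follows, assuming the symbolizer has already found the chain of DW_TAG_inlined_subroutine DIEs covering the PC (outermost first) and has already demangled the symbol-table name. `InlinedScope` and `expand_frames` are hypothetical names for illustration. The point it shows: only the innermost frame takes its file/line from the line table; each outer frame takes it from the DW_AT_call_file/DW_AT_call_line of the inlined_subroutine nested inside it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class InlinedScope:
    name: str       # short DW_AT_name only, e.g. "Baz" (no linkage name in -gmlt)
    call_file: str  # DW_AT_call_file: where this scope was inlined
    call_line: int  # DW_AT_call_line


Frame = Tuple[str, str, int]


def expand_frames(pc_file: str, pc_line: int,
                  symbol_name: Optional[str], subprogram_name: str,
                  inlined_chain: List[InlinedScope]) -> List[Frame]:
    frames: List[Frame] = []
    file, line = pc_file, pc_line  # innermost location comes from the line table
    for scope in reversed(inlined_chain):  # innermost inlined frame first
        frames.append((scope.name, file, line))
        file, line = scope.call_file, scope.call_line
    # The out-of-line function: prefer the fully-qualified symbol-table name,
    # falling back to the short DW_AT_name.
    frames.append((symbol_name or subprogram_name, file, line))
    return frames
```

With `Bar` inlined into `Foo::Run()` at a.cc:20 and `Baz` inlined into `Bar` at b.h:5, a PC at c.h:7 expands to three frames: `Baz` (c.h:7), `Bar` (b.h:5), `Foo::Run()` (a.cc:20).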
>>>>
>>>
>>> OK - if that's the tradeoff you guys have made, I'm happy not to meddle
>>> with it.
>>>
>>> (did you do a comparison with compression enabled for the strings
>>> section? At Google I know we don't compress the linked debug info, but we
>>> could - this might help in general, and make it not so costly to go from
>>> short names to fully mangled names)
>>>
>>
>> In fact, we do use -Wl,--compress-debug-sections=zlib in ASan builds...
>> I haven't measured the difference linkage names would make for
>> compressed sections, though. I don't remember any user complaints about
>> missing names for inlined functions, but, sure, we might want to add them
>> later.
>>
>>
>>>
>>>
>>>>
>>>>
>>>>> The backtrace would look really strange if it included the unmangled
>>>>> names of functions - or does the symbolizer use the address range of the
>>>>> out of line definition (if there is one?) of the inlined function (in which
>>>>> case I'd need to provide it... ) to find it in the symbol table, get the
>>>>> mangled name, and use that?)
>>>>>
>>>>> One thing I was thinking of doing as well, is that since the
>>>>> DW_AT_abstract_origin just points to a trivial subprogram with a name and
>>>>> DW_AT_inline - perhaps instead of an abstract origin, we could just use
>>>>> DW_AT_name directly? (with the mangled name, probably) That'd save us
>>>>> emitting the extra indirection and the name is uniqued already anyway. (and
>>>>> DW_FORM_strp is the same size as DW_FORM_ref4 (though DW_FORM_strp would
>>>>> mean extra relocations...) - and perhaps in the near future, DW_FORM_strp
>>>>> could be replaced by DW_FORM_str_index to reduce relocations)
>>>>>
>>>>
>>>> Yes, this might work. Generally, when we find a
>>>> subprogram/inlined_subroutine DIE we calculate its name by following the
>>>> DW_AT_specification/DW_AT_abstract_origin links until we find a DIE with
>>>> DW_AT_name provided. If we're able to get the name directly things will
>>>> only be better.
>>>>
>>>
>>> So long as you look for the name on the inlined_subroutine first, before
>>> walking DW_AT_specification/DW_AT_abstract_origin links, that'll work
>>> perfectly if/when we do this.
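A minimal sketch of the lookup order agreed on above: check the DIE itself for DW_AT_name before following DW_AT_abstract_origin/DW_AT_specification links. The `DIE` class and `die_name` function are a hypothetical in-memory model, not a real DWARF parser API; the hop limit is an added safeguard against malformed reference cycles.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DIE:
    tag: str
    name: Optional[str] = None               # DW_AT_name
    specification: Optional["DIE"] = None    # DW_AT_specification
    abstract_origin: Optional["DIE"] = None  # DW_AT_abstract_origin


def die_name(die: Optional[DIE], max_hops: int = 16) -> Optional[str]:
    # Check the current DIE first, so a name placed directly on an
    # inlined_subroutine wins over anything reachable through links.
    for _ in range(max_hops):  # bound the walk in case of malformed cycles
        if die is None:
            return None
        if die.name is not None:
            return die.name
        die = die.abstract_origin if die.abstract_origin is not None else die.specification
    return None
```

If the producer later puts the (mangled) name straight on the inlined_subroutine, this loop returns it on the first iteration without touching the abstract origin.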
>>>
>>> (might have to teach it about DW_FORM_str_index, at some point, though)
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> So... yes/no/maybe?
>>>>>
>>>>
>>>> Speaking of testing, we have a nontrivial number of sanitizer tests
>>>> in compiler-rt that match the expected symbolized stack traces. Currently
>>>> the sources are built with "-g", but I think we can detect whether the
>>>> compiler under test supports -gmlt and/or fission and use the strictest
>>>> debug info flag settings for which we still want to provide nice reports.
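One way such detection could look, as a minimal Python sketch; this is not the actual compiler-rt lit/CMake logic, and `compiler_accepts`/`pick_debug_flags` are hypothetical helpers. It tries a trivial compile with each candidate flag set, strictest first, and returns the first set the compiler accepts. Candidate order is an assumption supplied by the caller.

```python
import os
import subprocess
from typing import Callable, List, Optional


def compiler_accepts(cc: str, flags: List[str]) -> bool:
    """Return True if `cc` compiles a trivial C file (read from stdin) with `flags`."""
    try:
        result = subprocess.run(
            [cc, *flags, "-x", "c", "-c", "-", "-o", os.devnull],
            input=b"int main(void) { return 0; }\n",
            capture_output=True,
        )
    except OSError:  # compiler not found or not executable
        return False
    return result.returncode == 0


def pick_debug_flags(candidates: List[List[str]],
                     accepts: Callable[[List[str]], bool]) -> Optional[List[str]]:
    # candidates are ordered strictest (least debug info) first
    for flags in candidates:
        if accepts(flags):
            return flags
    return None
```

Usage might be `pick_debug_flags([["-gmlt"], ["-g"]], lambda f: compiler_accepts("clang", f))`; passing the probe as a callable keeps the selection logic testable without a compiler installed.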
>>>>
>>>
>>> Right, that sounds like a thing to do - I'd rather not make my changes
>>> until we've got that in place (& once it's in place I'll try a few obvious
>>> "break this and see if the tests fail" sort of things to check that my
>>> changes are being properly validated).
>>>
>>
>> OK, I'll let you know once we use -gmlt for sanitizers' test suite.
>>
>
> I've switched sanitizers' test suites to -gmlt in r217284.
>
>
>>
>>
>>>
>>> Can you let me know if you need help/want me to do that work (not that
>>> I'm terribly well versed in CMake, but I guess that's true of most of us)
>>> and/or when it's done and then I'll see about getting this work committed
>>> and moving onto the gmlt-esque+fission stuff.
>>>
>>> (side note, just to write it down: the gmlt+fission part of this (after
>>> this patch that minimizes gmlt by using backend knowledge) will require a
>>> fair bit of refactoring, but it'll be good to have the minimized-gmlt work
>>> in first and actively tested so I have that as a good baseline that my
>>> refactorings are making sense)
>>>
>>>
>>>>
>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Alexey Samsonov
>>>> vonosmas at gmail.com
>>>>
>>>
>>>
>>
>>
>>
>
>
>
>