[LLVMdev] Minimizing -gmlt

Alexey Samsonov vonosmas at gmail.com
Thu Aug 28 11:51:20 PDT 2014


This sounds great. Teaching the backend about -gmlt might help us in
another way: we could enforce full debug info generation in the frontend
for -fsanitize= flags, rely on parts of this debug info in the
instrumentation passes, and prune it before the actual object file
generation. This would be somewhat similar to what -Rpass does, only -Rpass
kills all the debug info, while we would need to reduce full debug info to
gmlt-like. Anyway, on to backtracing:

On Wed, Aug 27, 2014 at 4:40 PM, David Blaikie <dblaikie at gmail.com> wrote:

> In an effort to fix inlined information for backtraces under DWARF Fission
> in the absence of the split DWARF (.dwo) files, I'm planning on adding
> -gmlt-like data to the .o file, alongside the skeleton CU.
>
> Since that will involve teaching LLVM about -gmlt (more so than it
> already has - the debug info LLVM metadata already describes -gmlt for the
> purposes of omitting pubnames in that case) I figured I'd take the
> opportunity to move the existing -gmlt functionality to the backend to
> begin with, and, in doing so, minimize it a little further since we
> wouldn't need to emit debug info for every function - possibly just those
> that have functions inlined into them.
>

Right. Currently, if the symbolizer is unable to find a subprogram DIE
corresponding to a PC, it tries to at least fetch the file/line info from
the line table, and assumes that the function name might be available in
the symbol table.
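
A minimal sketch of that fallback, assuming toy in-memory tables instead of
real DWARF/ELF parsing. The table layouts and the names symbolize,
lookup_line and lookup_symbol are hypothetical and are not the symbolizer's
actual API; the line lookup also ignores end-of-sequence markers.

```python
from bisect import bisect_right

# Hypothetical, simplified inputs; a real symbolizer reads these from the
# DWARF sections and the object file's symbol table.
SUBPROGRAMS = [(0x1000, 0x1040, "main")]        # (low_pc, high_pc, DW_AT_name)
LINE_TABLE = [(0x1000, "a.cc", 3),              # (address, file, line),
              (0x2000, "b.cc", 10),             # sorted by address
              (0x2010, "b.cc", 12)]
SYMBOLS = [(0x2000, 0x20, "_Z3foov")]           # (address, size, mangled name)


def lookup_line(pc):
    """Return (file, line) of the last line-table row at or below pc."""
    addresses = [row[0] for row in LINE_TABLE]
    i = bisect_right(addresses, pc) - 1
    if i < 0:
        return None
    return LINE_TABLE[i][1], LINE_TABLE[i][2]


def lookup_symbol(pc):
    """Return the name of the symbol whose address range covers pc."""
    for address, size, name in SYMBOLS:
        if address <= pc < address + size:
            return name
    return None


def symbolize(pc):
    """Prefer a subprogram DIE; otherwise fall back to the symbol table.

    File/line info always comes from the line table.
    """
    function = None
    for low_pc, high_pc, name in SUBPROGRAMS:
        if low_pc <= pc < high_pc:
            function = name
            break
    if function is None:
        function = lookup_symbol(pc)
    return function or "??", lookup_line(pc)
```

With these tables, a PC inside main is named from its DIE, while a PC
inside the function at 0x2000 (which has no DIE) still gets a name from the
symbol table and a file/line from the line table.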

>
> So here's an example of some of my ideas about minimized debug info. I'm
> wondering if I'm right about what's needed for backtracing.
>
> I've removed uninteresting things, like DW_AT_accessibility (which is a
> bug anyway), DW_AT_external (there's no reason symbolication needs that, is
> there?), but also less obviously uninteresting things like DW_AT_frame_base
> (the location of the frame pointer - is that needed for symbolication?)
>

We don't use DW_AT_accessibility and DW_AT_external. As Chandler suggests,
DW_AT_frame_base might be required for unwinders, but I don't know that for
sure.


>
> Also I've made a frontend (for now) change (see mgmlt_clang.diff) to omit
> the data that causes DW_AT_decl_file/DW_AT_decl_line to be emitted - are
> those needed? I don't think so.
>

We don't use them.


>
> But importantly: the only DW_TAG_subprograms are either functions that
> have been inlined, or functions that have been inlined into. Is that enough?
>
> Is it OK that I haven't included debug info for out of line definitions of
> inline functions?
>
> I'm assuming all that information can be retrieved from the symbol table.
>


See above. Looks like this information is not necessary.


>
> (one other thing I noticed is that we don't use the mangled names for
> functions in -gmlt - how on earth does that work?
>

Yeah, IIRC currently -gmlt doesn't produce DW_AT_linkage_name entries, only
DW_AT_name (DW_AT_linkage_name significantly increases the binary size for
heavily templated code). So, instead of Foo::Bar<double>::Baz we have only
"Baz". And we live with that - we fetch just "Baz" from subprogram entries.
If a function is not inlined, we're able to fetch its fully-qualified name
from the symbol table; if it is inlined and there's no symbol table entry,
fine, we print just the short name. Generally this is enough for readable
stack traces, as we still have file/line info (stored in
DW_AT_call_file / DW_AT_call_line). The function names fetched from
DW_AT_linkage_name and/or the symbol table are demangled with a call to
__cxa_demangle (we assume that it's just available on the system, and 95%
of the time we are right).
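
The name selection described above can be sketched roughly as follows. This
is an illustration only: frame_name and toy_demangle are hypothetical names,
and toy_demangle is a lookup-table stand-in for __cxa_demangle rather than a
real demangler.

```python
# Toy stand-in for __cxa_demangle: a lookup table (assumption), not a real
# demangler. Unknown names are returned unchanged.
TOY_DEMANGLE = {"_ZN3Foo3BarIdE3BazEv": "Foo::Bar<double>::Baz()"}


def toy_demangle(mangled):
    return TOY_DEMANGLE.get(mangled, mangled)


def frame_name(die_name, symbol_name, demangle=toy_demangle):
    """Pick the function name to print for one stack frame.

    die_name:    short DW_AT_name from the (inlined) subprogram DIE, or None
    symbol_name: mangled name from the symbol table, or None when the frame
                 is inlined and has no symbol table entry of its own
    """
    if symbol_name is not None:
        # Out-of-line function: the symbol table gives the qualified name.
        return demangle(symbol_name)
    if die_name is not None:
        # Inlined frame without a symbol: only the short name is available.
        return die_name
    return "??"
```

For example, frame_name("Baz", "_ZN3Foo3BarIdE3BazEv") yields the
fully-qualified name, while frame_name("Baz", None) yields just "Baz".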


> The backtrace would look really strange if it included the unmangled names
> of functions - or does the symbolizer use the address range of the out of
> line definition (if there is one?) of the inlined function (in which case
> I'd need to provide it... ) to find it in the symbol table, get the mangled
> name, and use that?)
>
> One thing I was thinking of doing as well, is that since the
> DW_AT_abstract_origin just points to a trivial subprogram with a name and
> DW_AT_inline - perhaps instead of an abstract origin, we could just use
> DW_AT_name directly? (with the mangled name, probably) That'd save us
> emitting the extra indirection and the name is uniqued already anyway. (and
> DW_FORM_strp is the same size as DW_FORM_ref4 (though DW_FORM_strp would
> mean extra relocations...) - and perhaps in the near future, DW_FORM_strp
> could be replaced by DW_FORM_str_index to reduce relocations)
>

Yes, this might work. Generally, when we find a
subprogram/inlined_subroutine DIE, we calculate its name by following the
DW_AT_specification/DW_AT_abstract_origin links until we find a DIE with
DW_AT_name provided. If we're able to get the name directly, things will
only be better.
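
A rough sketch of that link-following, assuming DIEs are modeled as plain
dicts keyed by offset. The dict layout, the resolve_name helper, the cycle
guard, and checking DW_AT_specification before DW_AT_abstract_origin are all
assumptions made for illustration.

```python
def resolve_name(dies, offset):
    """Follow specification/abstract_origin links until a DIE has a name.

    dies maps a DIE offset to a dict of its attributes; reference
    attributes hold the offset of the DIE they point to.
    """
    seen = set()  # guards against malformed, cyclic references
    while offset is not None and offset not in seen:
        seen.add(offset)
        die = dies.get(offset)
        if die is None:
            return None  # dangling reference
        if "DW_AT_name" in die:
            return die["DW_AT_name"]
        offset = die.get("DW_AT_specification",
                         die.get("DW_AT_abstract_origin"))
    return None


# Toy DIE tree: an inlined_subroutine pointing at an abstract subprogram.
DIES = {
    0x10: {"DW_AT_name": "Baz", "DW_AT_inline": 1},
    0x20: {"DW_AT_abstract_origin": 0x10},
    0x30: {"DW_AT_specification": 0x20},
    0x40: {},
    0x50: {"DW_AT_abstract_origin": 0x50},
}
```

If the inlined_subroutine DIE carried DW_AT_name itself, as proposed, the
loop would return on its first iteration and no link would be followed.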


>
> So... yes/no/maybe?
>

Speaking of testing, we have a nontrivial number of sanitizer tests in
compiler-rt that match the expected symbolized stack trace. Currently the
sources are built with "-g", but I think we can detect whether the compiler
under test supports -gmlt and/or fission and use the strictest debug info
flag settings for which we still want to provide nice reports.
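
One possible shape for that detection, as a sketch only. The helper name
pick_debug_flag, the supports_flag probe, and the candidate order are
assumptions; a real test setup would implement the probe by trying to
compile a trivial file with the flag, and could add a fission candidate in
the same way.

```python
def pick_debug_flag(supports_flag, candidates=("-gmlt", "-g")):
    """Return the first candidate flag the compiler under test accepts.

    supports_flag is a caller-supplied probe (hypothetical here) that
    returns True if the compiler accepts the given flag. Candidates are
    ordered from most minimal debug info to least minimal.
    """
    for flag in candidates:
        if supports_flag(flag):
            return flag
    return None
```

For instance, with a compiler that only accepts "-g", the call
pick_debug_flag(lambda flag: flag == "-g") falls back to "-g".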




-- 
Alexey Samsonov
vonosmas at gmail.com