<div dir="ltr"><div class="gmail_extra">This sounds great. Teaching the backend about -gmlt might help us in another way: we could force full debug info generation in the frontend for -fsanitize= flags, rely on parts of that debug info in the instrumentation passes, and prune it before the actual object file is generated. This would be somewhat similar to what -Rpass does, except that -Rpass drops all the debug info, while we would need to reduce full debug info to something gmlt-like. Anyway, on to backtracing:<br>
<br><div class="gmail_quote">On Wed, Aug 27, 2014 at 4:40 PM, David Blaikie <span dir="ltr">&lt;<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">In an effort to fix inlined information for backtraces under DWARF Fission in the absence of the split DWARF (.dwo) files, I'm planning on adding -gmlt-like data to the .o file, alongside the skeleton CU.<br>
<br>Since that will involve teaching the LLVM about -gmlt (moreso than it already has - the debug info LLVM metadata already describes -gmlt for the purposes of omitting pubnames in that case) I figured I'd take the opportunity to move the existing -gmlt functionality to the backend to begin with, and, in doing so, minimize it a little further since we wouldn't need to emit debug info for every function - possibly just those that have functions inlined into them.<br>
</div></blockquote><div><br></div><div>Right. Currently, if the symbolizer is unable to find a subprogram DIE corresponding to a PC, it tries to at least fetch the file/line info from the line table, and assumes that the function name might be available in the symbol table.</div>
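<div>The fallback described above can be sketched roughly as follows. This is a simplified model and not the actual symbolizer code: AddrRange, SubprogramDie, LineRow, Symbol, Frame, findCovering and symbolize are all hypothetical names, and the line table is assumed to be a flat list of rows sorted by address.</div>

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the real debug info structures.
struct AddrRange { uint64_t low, high; };                  // half-open [low, high)
struct SubprogramDie { AddrRange range; std::string name; };
struct LineRow { uint64_t addr; std::string file; unsigned line; };
struct Symbol { AddrRange range; std::string name; };

struct Frame {
  std::string function = "??";
  std::string file = "??";
  unsigned line = 0;
};

// Returns the first entry whose address range contains pc, or nullptr.
template <typename T>
const T *findCovering(const std::vector<T> &items, uint64_t pc) {
  for (const T &item : items)
    if (item.range.low <= pc && pc < item.range.high)
      return &item;
  return nullptr;
}

Frame symbolize(uint64_t pc, const std::vector<SubprogramDie> &dies,
                const std::vector<LineRow> &rows,   // assumed sorted by addr
                const std::vector<Symbol> &symtab) {
  Frame frame;
  // File/line: take the last line table row at or below pc.
  for (const LineRow &row : rows) {
    if (row.addr > pc)
      break;
    frame.file = row.file;
    frame.line = row.line;
  }
  // Function name: prefer a subprogram DIE, fall back to the symbol table.
  if (const SubprogramDie *die = findCovering(dies, pc))
    frame.function = die->name;
  else if (const Symbol *sym = findCovering(symtab, pc))
    frame.function = sym->name;
  return frame;
}
```

<div>With minimized debug info, most PCs would take the second branch, so the symbol table would need to cover every function that has no subprogram DIE.</div>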
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">
<br>So here's an example of some of my ideas about minimized debug info. I'm wondering if I'm right about what's needed for backtracing.<br><br>I've removed uninteresting things, like DW_AT_accessibility (which is a bug anyway), DW_AT_external (there's no reason symbolication needs that, is there?), but also less obviously uninteresting things like DW_AT_frame_base (the location of the frame pointer - is that needed for symbolication?)<br>
</div></blockquote><div><br></div><div>We don't use DW_AT_accessibility and DW_AT_external. As Chandler suggests, DW_AT_frame_base might be required for unwinders, but I don't really know that.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">
<br>Also I've made a frontend (for now) change (see mgmlt_clang.diff) to omit the data that causes DW_AT_decl_file/DW_AT_decl_line to be emitted - are those needed? I don't think so.<br></div></blockquote><div><br>
</div><div>We don't use them.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br>But importantly: the only DW_TAG_subprograms are either functions that have been inlined, or functions that have been inlined into. Is that enough?<br>
<br>Is it OK that I haven't included debug info for out of line definitions of inline functions?<br><br>I'm assuming all that information can be retrieved from the symbol table.<br></div></blockquote><div><br></div>
<div><br></div><div>See above. Looks like this information is not necessary.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br>(one other thing I noticed is that we don't use the mangled names for functions in -gmlt - how on earth does that work?</div>
</blockquote><div><br></div><div>Yeah, IIRC currently -gmlt doesn't produce DW_AT_linkage_name entries, only DW_AT_name (DW_AT_linkage_name significantly increases the binary size for heavily templated code). So, instead of Foo::Bar&lt;double&gt;::Baz we have only "Baz". And we live with that: we fetch just "Baz" from subprogram entries. If a function is not inlined, we're able to fetch its fully-qualified name from the symbol table; if it is inlined and there's no symbol table entry, fine, we print just the short name. Generally this is enough for readable stack traces, as we still have file/line info (stored in DW_AT_call_file / DW_AT_call_line). The function names fetched from DW_AT_linkage_name and/or the symbol table are demangled with a call to __cxa_demangle (we assume that it's just available on the system, and 95% of the time we are right).</div>
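<div>A minimal sketch of that name selection, assuming the Itanium C++ ABI's abi::__cxa_demangle from cxxabi.h is available. The helpers demangleOrKeep and frameName are hypothetical names for illustration, and the "_Z" prefix check is an added assumption to avoid misreading plain C names as mangled type encodings.</div>

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangles an Itanium-mangled name; returns the input unchanged if it does
// not look mangled or if demangling fails.
std::string demangleOrKeep(const std::string &name) {
  if (name.rfind("_Z", 0) != 0)  // not an Itanium-mangled symbol
    return name;
  int status = 0;
  char *demangled = abi::__cxa_demangle(name.c_str(), nullptr, nullptr, &status);
  if (status != 0 || demangled == nullptr)
    return name;
  std::string result(demangled);
  std::free(demangled);  // __cxa_demangle allocates the result with malloc
  return result;
}

// Prefer the (mangled, fully qualified) symbol table name when there is one;
// otherwise fall back to the short DW_AT_name from the subprogram DIE.
std::string frameName(const std::string &symtabName,
                      const std::string &dwarfShortName) {
  if (!symtabName.empty())
    return demangleOrKeep(symtabName);
  return dwarfShortName;
}
```

<div>For an inlined function with no symbol table entry, frameName("", "Baz") just returns the short name "Baz", which matches the behavior described above.</div>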
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"> The backtrace would look really strange if it included the unmangled names of functions - or does the symbolizer use the address range of the out of line definition (if there is one?) of the inlined function (in which case I'd need to provide it... ) to find it in the symbol table, get the mangled name, and use that?)<br>
<br>One thing I was thinking of doing as well, is that since the DW_AT_abstract_origin just points to a trivial subprogram with a name and DW_AT_inline - perhaps instead of an abstract origin, we could just use DW_AT_name directly? (with the mangled name, probably) That'd save us emitting the extra indirection and the name is uniqued already anyway. (and DW_FORM_strp is the same size as DW_FORM_ref4 (though DW_FORM_strp would mean extra relocations...) - and perhaps in the near future, DW_FORM_strp could be replaced by DW_FORM_str_index to reduce relocations)<br>
</div></blockquote><div><br></div><div>Yes, this might work. Generally, when we find a subprogram/inlined_subroutine DIE, we calculate its name by following the DW_AT_specification/DW_AT_abstract_origin links until we find a DIE that provides DW_AT_name. If we're able to get the name directly, things will only get better.</div>
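<div>The link-following step can be sketched like this. It is a toy model under stated assumptions: DIEs live in a flat vector and references are vector indices (-1 meaning "attribute absent"), whereas real DWARF references are section or CU offsets. Die and resolveName are hypothetical names.</div>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical flattened DIE: an empty name means DW_AT_name is absent,
// and -1 means the reference attribute is absent.
struct Die {
  std::string name;      // DW_AT_name
  int abstractOrigin;    // DW_AT_abstract_origin, as an index into the vector
  int specification;     // DW_AT_specification, as an index into the vector
};

// Follows abstract_origin/specification links until a DIE with a name is
// found. Returns an empty string if no name is reachable.
std::string resolveName(const std::vector<Die> &dies, int index) {
  // Bounding the number of steps guards against cycles in malformed input.
  for (std::size_t steps = 0; steps < dies.size(); ++steps) {
    if (index < 0 || static_cast<std::size_t>(index) >= dies.size())
      return "";
    const Die &die = dies[index];
    if (!die.name.empty())
      return die.name;
    index = die.abstractOrigin >= 0 ? die.abstractOrigin : die.specification;
  }
  return "";
}
```

<div>If the name were attached directly to the inlined_subroutine DIE, as proposed, the loop would return on its first iteration.</div>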
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">
<br>So... yes/no/maybe?</div></blockquote><div><br></div><div>Speaking of testing, we have a nontrivial number of sanitizer tests in compiler-rt that match the expected symbolized stack trace. Currently the sources are built with "-g", but I think we can detect whether the compiler under test supports -gmlt and/or fission, and build with the most minimal debug info settings for which we still want to provide nice reports.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
<br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr">Alexey Samsonov<br><a href="mailto:vonosmas@gmail.com" target="_blank">vonosmas@gmail.com</a></div>
</div></div>