[llvm-dev] [DWARF] using simplified template names

Fri Oct 8 17:48:44 PDT 2021

I think I'm down to one of the last pieces for rebuilding the names & being
able to round-trip them through llvm-dwarfdump --verify:
https://reviews.llvm.org/D111477 - in case anyone's got opinions on what we
should do with integer type suffixes on non-type template parameters.

A few other remaining pieces:
1) make the "operator" detection a bit better:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/DebugInfo/DWARF/DWARFDie.cpp#L308-L311
- Don't think we can rely on there being a space after the word "operator"
(because it might be "operator<" for instance) so maybe I just need a full
regex/exhaustive list of valid identifier characters, so if it's "operator"
followed by an identifier character, then it doesn't trigger this special
case? Or the inverse - special case all the whitespace+first characters in
operator overloads. That's probably a smaller set.
2) Integrate this into llvm-symbolizer so it rebuilds the names
automatically
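
For item 1, the first option (treat "operator" as the keyword only when the
next character cannot continue an identifier) might look roughly like the
sketch below. startsWithOperatorKeyword is a hypothetical helper, not the code
currently in DWARFDie.cpp, and it assumes identifier characters are ASCII
alphanumerics, '_', '$', or any non-ASCII byte:

```cpp
#include <cctype>
#include <string_view>

// Hypothetical helper (not the code in DWARFDie.cpp): returns true when Name
// begins with the keyword "operator" rather than with an ordinary identifier
// that merely starts with those letters (e.g. "operator_table", "operators").
bool startsWithOperatorKeyword(std::string_view Name) {
  constexpr std::string_view Keyword = "operator";
  if (Name.substr(0, Keyword.size()) != Keyword)
    return false;
  if (Name.size() == Keyword.size())
    return true; // Bare "operator": nothing follows that could extend it.
  unsigned char Next = static_cast<unsigned char>(Name[Keyword.size()]);
  // Assumption: these are the characters that can continue an identifier.
  bool IsIdentifierChar =
      std::isalnum(Next) || Next == '_' || Next == '$' || Next >= 0x80;
  return !IsIdentifierChar;
}
```

With this, "operator<", "operator new" and "operator()" take the special case,
while "operator_table<int>" and "operators" do not.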

Then there are some lldb bugs to fix, etc.

On Wed, Jun 23, 2021 at 3:06 PM David Blaikie <dblaikie at gmail.com> wrote:

> On Wed, Jun 23, 2021 at 1:14 PM <paul.robinson at sony.com> wrote:
> >
> > >> Oh, is there any consequence for deduplication in LTO?  Isn’t that
> > >> name-based?
> >
> > > Should be OK - that's based on the fully mangled/linkage name of the
> > > type, which would be untouched by this.
> >
> > I’ve recently been reminded that type-unit signatures are hashes of the
> > name, not using the standard-recommended algorithm of hashing the content;
> > I tried to work out which name is actually used, but it’s buried deeper
> > than I am comfortable excavating.  Can we make sure that hash is using
> > either the name-with-parameters, or the linkage name, as the input string?
> > We don’t want “foo<int>” and “foo<float>” using the same type-unit
> > signature!
>
> Worth checking, but yeah, not a problem - we don't emit class linkage
> names, so the only reason we carry the linkage name on types is for
> ODR deduplicating during LTO linking, and also using it for type units
> when those are enabled - the linkage name is stored in the
> DICompositeType's "identifier" field - not something readily confused
> with being guaranteed to be the linkage name nor used for
> DW_AT_linkage_name, etc. Only used as a unique identifier. That won't
> be touched.
>
>
> As an aside: I do have another direction I'm interested in pursuing
> that's related to linkage names, rather than the pretty names: We
> could reduce the number of DW_AT_linkage_names we emit by
> reconstituting linkage names in symbolizers instead (eg: if we see a
> function called "f3" with a single "int" formal parameter and void
> return type - we can reconstruct the linkage name for that function as
> _Z2f3i or whatever it is).
>
> On one particularly pathological case I'm looking at, the simplified
> pretty template names are worth a 43% reduction in the final dwp
> .debug_str.dwo, and a rough estimate for the linkage names (omitting
> linkage names from most cases when Clang's building the IR - there are
> certain kinds of template cases that are hard to reconstruct, but
> others that are easy/do-able with our current DWARF) is 52%, for a
> combined 95% reduction in debug string size. (In a less pathological
> case, one of Google's largest binaries, it was 26%/56% for an 82% total
> reduction.)
>
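
To make the linkage-name aside in the quoted message concrete: under the
Itanium C++ ABI, a global, non-template function named f3 that takes a single
int mangles as _Z2f3i (the name is length-prefixed and the return type is not
encoded). A minimal sketch of reconstituting that from a name plus parameter
types follows. reconstructLinkageName and builtinTypeCode are hypothetical
helpers, not anything in LLVM, and the sketch deliberately gives up on anything
other than a handful of builtin parameter types:

```cpp
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Itanium C++ ABI codes for a few builtin types. Anything else (pointers,
// classes, templates, ...) is out of scope for this sketch.
std::optional<char> builtinTypeCode(std::string_view Type) {
  if (Type == "void") return 'v';
  if (Type == "bool") return 'b';
  if (Type == "char") return 'c';
  if (Type == "int") return 'i';
  if (Type == "unsigned int") return 'j';
  if (Type == "long") return 'l';
  if (Type == "unsigned long") return 'm';
  if (Type == "float") return 'f';
  if (Type == "double") return 'd';
  return std::nullopt;
}

// Hypothetical helper: rebuilds the linkage name of a global-namespace,
// non-template, non-member function. Returns std::nullopt when a parameter
// type is not one this sketch knows how to encode.
std::optional<std::string>
reconstructLinkageName(std::string_view Name,
                       const std::vector<std::string_view> &Params) {
  std::string Mangled = "_Z" + std::to_string(Name.size());
  Mangled += Name;
  if (Params.empty()) {
    Mangled += 'v'; // An empty parameter list is encoded as a lone 'v'.
    return Mangled;
  }
  for (std::string_view Param : Params) {
    std::optional<char> Code = builtinTypeCode(Param);
    if (!Code || *Code == 'v')
      return std::nullopt; // Unsupported type, or 'void' used as a parameter.
    Mangled += *Code;
  }
  return Mangled;
}
```

For example, reconstructLinkageName("f3", {"int"}) yields "_Z2f3i", and a
parameter type such as "foo<int>" makes it return std::nullopt, which is
roughly where a real consumer would still need the emitted DW_AT_linkage_name.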