[llvm-dev] DWARF: Reconstituting mangled names (& skipping DW_AT_linkage_name)

Thu Jul 1 21:23:22 PDT 2021

One possibility is to make the reference to the linkage name an indirection
into .strtab proper rather than .debug_str. There are issues with
stripping and the like when that is done, but you then have only one copy
shared between the two uses.
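
To make the single-copy idea concrete, here is a minimal Python sketch. It
assumes the linkage-name reference is simply an offset into an ELF-style
.strtab. No existing DWARF form refers to .strtab this way, and
`StringTable` and the offsets below are hypothetical illustrations, not an
LLVM API.

```python
# Hypothetical sketch: one string table shared by the symbol table and the
# debug info. The "linkage name offset" is not a real DWARF form; it only
# illustrates the single-copy idea.


class StringTable:
    """ELF-style string table: NUL-terminated strings addressed by offset."""

    def __init__(self):
        self.data = bytearray(b"\0")  # offset 0 is the empty string, as in ELF
        self.offsets = {}

    def add(self, s):
        # Deduplicate: a string that is already present reuses its offset.
        if s not in self.offsets:
            self.offsets[s] = len(self.data)
            self.data += s.encode() + b"\0"
        return self.offsets[s]

    def get(self, offset):
        end = self.data.index(b"\0", offset)
        return self.data[offset:end].decode()


strtab = StringTable()
name = "_ZN10tensorflow6kernel3runEv"
symbol_name_offset = strtab.add(name)   # used by the symbol table entry
linkage_name_offset = strtab.add(name)  # used by the hypothetical DIE attribute

print(strtab.get(linkage_name_offset))
print(len(strtab.data))  # the name is stored once, not twice
```

Both references resolve to the same bytes. The flip side is the stripping
problem: once .strtab is removed, the offset held by the debug info has
nothing left to point at.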

On Thu, Jul 1, 2021 at 8:22 PM Reid Kleckner via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> It could work, but the long linkage names will still be present in
> .strtab, so I wonder if it would make more sense to pursue a solution that
> addresses both issues. I happen to know you were considering a separate
> proposal for that, and I wonder if it could be used to solve this problem
> as well. Either way, the debug info consumer must be taught to look up or
> reconstitute the long mangled name.
>
> I was thinking something like, "if a symbol name is longer than threshold X,
> replace it with _H${contenthash}, place the long name in a side table
> section". Tools that are aware of the new convention can do the lookup in
> the side table. Tools that are unaware will just produce funny names. The
> DWARF linkage name would use the _H symbol, and consumers that care beyond
> just having a unique linkage identifier can do the lookup.
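
A rough Python sketch of that convention follows. The threshold, the use of
truncated SHA-256 as the content hash, and the names `shorten` and `resolve`
are all assumptions for illustration; the proposal does not pin any of them
down.

```python
import hashlib

# Assumed parameters: the proposal leaves the threshold X and the hash
# function open. SHA-256 truncated to 16 hex digits is an arbitrary choice.
THRESHOLD = 64
HASH_HEX_DIGITS = 16


def shorten(name, side_table, threshold=THRESHOLD):
    """Return the symbol name to emit, recording long names in side_table."""
    if len(name) <= threshold:
        return name
    digest = hashlib.sha256(name.encode()).hexdigest()[:HASH_HEX_DIGITS]
    short = "_H" + digest
    # A real scheme would need a policy for (unlikely) hash collisions.
    side_table[short] = name
    return short


def resolve(symbol, side_table):
    """What a convention-aware tool does: map an _H name back to the long
    name. Unaware tools just show the _H name."""
    return side_table.get(symbol, symbol)


table = {}
long_name = "_ZN5Eigen" + "x" * 100  # stand-in for a long mangled name
emitted = shorten(long_name, table)   # also what DW_AT_linkage_name would hold
print(emitted)
print(resolve(emitted, table) == long_name)
```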
>
> There is prior art for this. MSVC caps linkage names at 4096 characters, I
> believe, and hashes longer names down with MD5:
>
> https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/MicrosoftMangle.cpp#L53
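
For comparison, a Python sketch of the MSVC-style scheme as I read the
linked code: names of 4096 characters or more are replaced with `??@`, the
32 hex digits of their MD5 digest, and `@`. Treat the exact format as an
assumption and check the linked source. Unlike the side-table idea, the
original name cannot be recovered from the result.

```python
import hashlib

MSVC_MAX_LENGTH = 4096  # cap as read from the linked MicrosoftMangle.cpp


def msvc_style_cap(mangled):
    """Pass short names through; replace long ones with a one-way MD5 hash."""
    if len(mangled) < MSVC_MAX_LENGTH:
        return mangled
    return "??@" + hashlib.md5(mangled.encode()).hexdigest() + "@"


print(msvc_style_cap("?f@@YAXXZ"))
print(len(msvc_style_cap("?" + "a" * 5000)))
```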
>
> On Thu, Jun 24, 2021 at 5:32 PM David Blaikie via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> In addition to simplifying template names (
>> https://groups.google.com/g/llvm-dev/c/ekLMllbLIZg ), another big
>> contributor I've found in my use case is the large number of mangled names
>> (in part because we build with -fdebug-info-for-profiling, which turns on
>> function linkage names even at -g1/-gmlt).
>>
>> So I was wondering if we could recreate linkage names from DWARF rather
>> than encoding them directly, and I have a prototype that seems to show
>> this is possible (at least for some simple cases, including some template
>> cases).
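
To illustrate what reconstitution involves in the very simplest case, here
is a Python sketch that rebuilds an Itanium-ABI mangled name from a toy
description of a subprogram DIE. `FunctionDIE` is hypothetical, not an LLVM
or DWARF API, and the sketch covers only free functions in namespaces with a
few builtin parameter types. Real reconstruction also has to handle
substitutions, templates (including their encoded return types), member
function qualifiers, and much more.

```python
from dataclasses import dataclass, field
from typing import List

# Itanium C++ ABI <builtin-type> codes (small subset).
BUILTIN_CODES = {
    "void": "v", "bool": "b", "char": "c", "int": "i",
    "unsigned int": "j", "long": "l", "float": "f", "double": "d",
}


@dataclass
class FunctionDIE:
    """Hypothetical stand-in for a DW_TAG_subprogram: enclosing namespace
    names (from DW_TAG_namespace parents), DW_AT_name, and the builtin type
    names of its formal parameters."""
    scopes: List[str]
    name: str
    params: List[str] = field(default_factory=list)


def source_name(identifier: str) -> str:
    # <source-name> ::= <length> <identifier>
    return f"{len(identifier)}{identifier}"


def mangle(die: FunctionDIE) -> str:
    if die.scopes:
        # <nested-name> ::= N <prefix> <unqualified-name> E
        prefix = "".join(source_name(s) for s in die.scopes)
        name = "N" + prefix + source_name(die.name) + "E"
    else:
        name = source_name(die.name)
    # A function with no parameters is encoded as taking "void".
    params = [BUILTIN_CODES[p] for p in die.params] or ["v"]
    return "_Z" + name + "".join(params)


print(mangle(FunctionDIE([], "foo")))
print(mangle(FunctionDIE(["ns"], "bar", ["int", "double"])))
```

For example, `ns::bar(int, double)` comes out as `_ZN2ns3barEid` and
`foo()` as `_Z3foov`, matching what the Itanium ABI prescribes for those
declarations.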
>>
>> In the pathological case I'm looking at (lots of expression templates in
>> TensorFlow), skipping linkage names in the cases I think we can
>> reconstitute (though I haven't implemented the full logic or verified that
>> everything can be reconstituted) reduced .debug_str.dwo by 52%. That
>> composes (stacks) with the 43% reduction from the simplified template
>> names, for a 95% reduction in total. In a large but less pathological
>> binary it was 56% (in addition to 25% from the template names, still an
>> 80% reduction overall).
>>
>> Wondering if anyone's interested in this? Does anyone have
>> thoughts/feelings/concerns/etc?
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>