[PATCH] D123534: [dwarf] Emit a DIGlobalVariable for constant strings.

David Blaikie via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 27 09:09:43 PDT 2022


dblaikie added a comment.

In D123534#3475790 <https://reviews.llvm.org/D123534#3475790>, @hctim wrote:

>> summary of DWARF:
>> & how many of these descriptions get added to the debug info?
>
> afaict, there is now:
>  1x .debug_addr entry for each string
>  1x .debug_info DW_TAG_variable for each string
>  1x DW_TAG_array_type + DW_TAG_subrange_type for each unique sizeof(string)

& you ran an experiment where the type was omitted, but it didn't save much space?

> i tried to measure if there's other bits lying around that could be optimised. i thought briefly about diffing the llvm-dwarfdump output before/after for clang, but as the dump files reached 20 GB, rethought that decision. the dwarfdump for the clang/test/CodeGen/debug-info-variables.c dwo is below.

Yeah, diffing DWARF isn't super practical with all the offsets, etc.

>> Numbers for Split DWARF may be helpful too - given this'll add an extra address/relocation for every string literal, it might make object size (specifically unlinked object size where relocations are expensive/plentiful) significantly larger in problematic ways.
>
> sorry, i don't understand why split-dwarf means this requires an additional relocation (i'm not really sure what split-dwarf is outside of just putting the dwarf in a separate file, but don't see why that would change relocations).

Sorry, to expound: Split DWARF is intended to reduce the size of the object files the linker processes, and of the resulting linked binary. Adding more pure type information, which goes only in the .dwo file, has no impact on .o/executable size. This change, though, adds another relocation for every string literal in the binary, which grows the debug info that remains in the .o file (the .debug_addr table in particular). Relocations are quite expensive: they aren't compressed by -gz, and each takes 3x the bytes of the address entry it applies to.
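
To put rough numbers on that 3x figure, here is a back-of-the-envelope sketch. It assumes a 64-bit ELF target that uses RELA relocations (e.g. x86-64), where an address is 8 bytes and an Elf64_Rela entry is 24 bytes, and it assumes each string literal gets exactly one new .debug_addr entry with no sharing. The function name and the literal count in the usage note are hypothetical, not measurements.

```python
# Rough estimate of the extra unlinked-object bytes per string literal.
# Assumptions (not taken from the patch): ELF64 with RELA relocations and
# one new .debug_addr entry per string literal.

ADDR_BYTES = 8         # one 64-bit address in .debug_addr
RELA_ENTRY_BYTES = 24  # sizeof(Elf64_Rela): r_offset, r_info, r_addend (8 bytes each)


def extra_object_bytes(num_string_literals: int) -> dict:
    """Return the estimated growth of the .o file, split by section."""
    addr = num_string_literals * ADDR_BYTES
    rela = num_string_literals * RELA_ENTRY_BYTES
    return {"debug_addr": addr, "rela_debug_addr": rela, "total": addr + rela}
```

For a hypothetical translation unit with 1000 string literals, `extra_object_bytes(1000)` gives 8000 bytes of addresses plus 24000 bytes of relocations, so the relocations dominate the growth.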

We should probably do some measurements inside Google on these things, since the object size metrics are quite important to linkability in bazel/blaze (go/binary-size-analysis internally has some info, and I'm happy to help with measurements, etc).

> i made a quick dwarfdump diff on clang/test/CodeGen/debug-info-variables.c (with split-dwarf):
>
> sections old:
>
>   [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
>   [ 2] .debug_str.dwo    PROGBITS        0000000000000000 000040 0000eb 01 MSE  0   0  1
>   [ 3] .debug_str_offsets.dwo PROGBITS   0000000000000000 00012b 00002c 00   E  0   0  1
>   [ 4] .debug_info.dwo   PROGBITS        0000000000000000 000157 000077 00   E  0   0  1
>   [ 5] .debug_abbrev.dwo PROGBITS        0000000000000000 0001ce 000091 00   E  0   0  1
>
> sections new:
>
>   [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
>   [ 3] .debug_str.dwo    PROGBITS        0000000000000000 000078 0000ff 01 MSE  0   0  1
>   [ 2] .debug_str_offsets.dwo PROGBITS   0000000000000000 000040 000038 00   E  0   0  1
>   [ 4] .debug_info.dwo   PROGBITS        0000000000000000 000177 000092 00   E  0   0  1
>   [ 5] .debug_abbrev.dwo PROGBITS        0000000000000000 000209 0000aa 00   E  0   0  1
>
> so `.debug_str += 0x14`, `.debug_str_offsets += 0xc`, `.debug_info += 0x1b` and `.debug_abbrev += 0x19`.

This data, but for Clang, might be informative. For this small sample it's hard to separate the constant overheads from the linear ones: e.g. I'd expect the debug_str growth to be constant (just the name of the char type and the name of the size type), but the debug_info growth is presumably not constant.
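
For reference, the per-section deltas quoted above can be recomputed from the two section listings with a few lines of Python. This is only a sketch: the rows are copied from the listings in this thread, and the helper name `section_sizes` is made up for illustration. The same approach would apply to the section headers of a Clang build, which is the data that would actually help here.

```python
# Recompute per-section size deltas from the two section listings above.
# The rows are copied from this thread; the helper name is hypothetical.

OLD = """\
[ 2] .debug_str.dwo    PROGBITS        0000000000000000 000040 0000eb 01 MSE  0   0  1
[ 3] .debug_str_offsets.dwo PROGBITS   0000000000000000 00012b 00002c 00   E  0   0  1
[ 4] .debug_info.dwo   PROGBITS        0000000000000000 000157 000077 00   E  0   0  1
[ 5] .debug_abbrev.dwo PROGBITS        0000000000000000 0001ce 000091 00   E  0   0  1
"""

NEW = """\
[ 3] .debug_str.dwo    PROGBITS        0000000000000000 000078 0000ff 01 MSE  0   0  1
[ 2] .debug_str_offsets.dwo PROGBITS   0000000000000000 000040 000038 00   E  0   0  1
[ 4] .debug_info.dwo   PROGBITS        0000000000000000 000177 000092 00   E  0   0  1
[ 5] .debug_abbrev.dwo PROGBITS        0000000000000000 000209 0000aa 00   E  0   0  1
"""


def section_sizes(listing: str) -> dict:
    """Map section name -> size in bytes for each row of a section listing."""
    sizes = {}
    for line in listing.splitlines():
        # Drop the "[Nr]" column, then split into Name, Type, Address, Off, Size, ...
        fields = line.split("]", 1)[1].split()
        name, size_hex = fields[0], fields[4]
        sizes[name] = int(size_hex, 16)
    return sizes


old_sizes = section_sizes(OLD)
new_sizes = section_sizes(NEW)
deltas = {name: new_sizes[name] - old_sizes[name] for name in old_sizes}

for name, delta in deltas.items():
    print(f"{name} += {delta:#x}")
print(f"total growth: {sum(deltas.values())} bytes")
```

With only one small input, these deltas cannot say which part of the growth is a one-time cost and which part scales with the number of string literals.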


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123534/new/

https://reviews.llvm.org/D123534


