[PATCH] D117853: [ELF] Parallelize --compress-debug-sections=zlib

David Blaikie via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jan 24 17:44:27 PST 2022


dblaikie added a comment.

In D117853#3268012 <https://reviews.llvm.org/D117853#3268012>, @MaskRay wrote:

> In D117853#3267965 <https://reviews.llvm.org/D117853#3267965>, @dblaikie wrote:
>
>> In D117853#3261870 <https://reviews.llvm.org/D117853#3261870>, @MaskRay wrote:
>>
>>> In D117853#3261856 <https://reviews.llvm.org/D117853#3261856>, @dblaikie wrote:
>>>
>>>> Is there any chance to avoid buffering the compressed output? (I guess probably not, because you need to know how large it is before you write it to the output file (if you want to parallelize writing sections, which is important no doubt))
>>>
>> I have asked myself this question... Unfortunately, no. To get an accurate size estimate, we have to buffer all compressed output.
>>> The size is needed to compute the sh_offset and sh_size fields of a .debug_* section. To know the size we need to compress the section first (or estimate it, but the compression ratio is not easy to estimate).
>>>
>> I think pigz takes an approach that keeps only `concurrency` shards buffered at a time, but it does not have the requirement to know the output size beforehand.
>>
>> Yeah, I guess out of scope for this change - but maybe another time. It'd break parallelism, but you could stream out a section at a time (at least for the compressed sections) and then seek back to write the sh* offset fields based on how the compression actually worked out.
>>
> I guess for Split DWARF the memory savings wouldn't be that significant, though? Do you have a sense of how much memory it'd take?
>
> The saving is still large because of .debug_line.

I mostly meant the memory savings that might be available if we could avoid caching compressed debug info output sections. Looking at the numbers you posted - assuming lld's internal data structures don't use much memory compared to the output size, and assuming you're writing to tmpfs so the output counts as memory usage - the compressed output section buffers add about half the output file size again as memory usage, so avoiding them could be roughly a 30% reduction in memory usage... which seems pretty valuable, but hard to achieve for sure.
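For what it's worth, the pigz-style idea of keeping only `concurrency` shards buffered can be sketched roughly like this (a hypothetical Python illustration, not pigz's or lld's actual code; `compress_shard` and `compress_bounded` are made-up names). Each shard is deflated independently as a raw-deflate stream ended with a sync flush, so the compressed shards concatenate into one valid stream, and only a bounded number of compressed shards are held in memory at once:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_shard(data):
    # Each shard becomes an independent raw-deflate stream ended with a
    # sync flush, so compressed shards concatenate into one valid stream.
    c = zlib.compressobj(wbits=-15)
    return c.compress(data) + c.flush(zlib.Z_SYNC_FLUSH)

def compress_bounded(shards, concurrency=4):
    out = []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        pending = []
        for shard in shards:
            pending.append(pool.submit(compress_shard, shard))
            # Bound the buffered output: drain the oldest shard, in order,
            # once `concurrency` compressions are in flight. (A real
            # pipeline would write it straight to the output file here,
            # which is why pigz never needs the total size up front.)
            if len(pending) >= concurrency:
                out.append(pending.pop(0).result())
        out.extend(f.result() for f in pending)
    # Terminate the combined stream with an empty final deflate block.
    out.append(zlib.compressobj(wbits=-15).flush(zlib.Z_FINISH))
    return b"".join(out)
```

The catch for a linker, as noted above, is exactly that last point: writing shards as they complete only works if you don't need sh_size before writing.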

>> Another direction to go could be to do compressed data concatenation - if the compression algorithm supports concatenation, you could lose some size benefits and gain speed (like lld's sliding scale of string deduplication) by just concatenating the compressed sections together - predictable size and you could write the updated compressed section header based on the input sections headers.
>
> The concatenation approach is what is used here :)

Ah, sorry, I meant concatenation of the compressed input sections - no need to decompress or recompress - but that only applies if there are no relocations or other changes to apply to the data.
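To illustrate the idea: *if* compressed input sections were encoded as raw-deflate streams each ended with a sync flush (a hypothetical encoding - real SHF_COMPRESSED sections are whole zlib streams with their own headers and checksums, which is part of why this doesn't just work today), a linker could concatenate them byte-for-byte without inflating anything, and the output size would be known up front:

```python
import zlib

def precompressed(data):
    # Stand-in for a compressed input section as a producer might
    # hypothetically emit it: a raw-deflate stream ended with a sync flush.
    c = zlib.compressobj(wbits=-15)
    return c.compress(data) + c.flush(zlib.Z_SYNC_FLUSH)

def concat_sections(sections):
    # The "linker" never inflates anything: the output size is just the
    # sum of the input sizes plus a constant-size end-of-stream marker,
    # so sh_size is computable before any data is written.
    end = zlib.compressobj(wbits=-15).flush(zlib.Z_FINISH)
    return b"".join(sections) + end
```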

>> Though I guess most of the DWARF sections remaining in the objects/linked binary when using Split DWARF require relocations to be applied, so that requires decompressing/recompressing anyway... :/
>
> The end of https://maskray.me/blog/2022-01-23-compressed-debug-sections#linkers discusses why not allocating a buffer is tricky and is not generic enough.
> Updating section headers afterwards has the issue that the output file size is unknown up front, so the output cannot be mmapped in a read-write way.

Ah - I think gold's dwp does it by using a pwrite stream instead - streaming out the section contents and then seeking back to modify the header, rather than using memory-mapped copies. Not sure what the performance tradeoffs are like for that, whether you could go back that way after streaming out the compressed data, and then I guess maybe reopen the file memory-mapped to write out the rest of the contents.
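The pwrite-style approach can be sketched as follows (a minimal, POSIX-only Python illustration; the 8-byte size field and the `write_with_patched_header` helper are made up for the example and are not gold's or lld's code):

```python
import os
import struct
import tempfile

def write_with_patched_header(path, payload):
    # Write a placeholder size field, stream the payload out, then
    # pwrite() the real size back into the header afterwards - no need
    # to mmap the output or know its total size up front.
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", 0))   # placeholder for the size field
        f.write(payload)                # stream the section contents
        f.flush()
        os.pwrite(f.fileno(), struct.pack("<Q", len(payload)), 0)

# Usage: after writing, the header holds the actual payload length.
fd, path = tempfile.mkstemp()
os.close(fd)
write_with_patched_header(path, b"compressed section bytes")
with open(path, "rb") as f:
    header = struct.unpack("<Q", f.read(8))[0]
os.remove(path)
```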


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D117853/new/

https://reviews.llvm.org/D117853



More information about the llvm-commits mailing list