[PATCH] D20211: [ELF] - Support of compressed debug sections creation(draft).

Thu May 12 14:10:47 PDT 2016

On Thu, May 12, 2016 at 2:02 PM, David Blaikie <dblaikie at gmail.com> wrote:

>
>
> On Thu, May 12, 2016 at 1:02 PM, Rui Ueyama <ruiu at google.com> wrote:
>
>> On Thu, May 12, 2016 at 11:27 AM, David Blaikie <dblaikie at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Thu, May 12, 2016 at 10:07 AM, Rui Ueyama via llvm-commits <
>>> llvm-commits at lists.llvm.org> wrote:
>>>
>>>> ruiu added a comment.
>>>>
>>>> Please add tests to verify that you are producing compressed sections
>>>> that can be read by other tools.
>>>>
>>>>
>>>> ================
>>>> Comment at: ELF/Options.td:33
>>>> @@ +32,3 @@
>>>> +def compress_debug_sections : Joined<["--"],
>>>> "compress-debug-sections=">,
>>>> +  HelpText<"Force undefined symbol during linking">;
>>>> +
>>>> ----------------
>>>> This help text is out of context.
>>>>
>>>> ================
>>>> Comment at: ELF/OutputSections.cpp:833-837
>>>> @@ +832,7 @@
>>>> +  // compression.
>>>> +  if (Config->CompressDebugSections == "zlib-gabi" ||
>>>> +    Config->CompressDebugSections == "zlib") {
>>>> +    Header.sh_flags |= SHF_COMPRESSED;
>>>> +  }
>>>> +  else {
>>>> +    // Compress all candidate sections using ZLIB compression, using
>>>> the GNU
>>>> ----------------
>>>> Clang-format.
>>>>
>>>> ================
>>>> Comment at: ELF/OutputSections.cpp:842
>>>> @@ +841,3 @@
>>>> +    // renamed to start with.zdebug to identify the use of compression.
>>>> +    StringSaver Saver(ScriptConfig->Alloc);
>>>> +    this->Name = Saver.save(Twine(".z") + Name.drop_front(1));
>>>> ----------------
>>>> Please don't use an unrelated object's members.
>>>>
>>>> ================
>>>> Comment at: ELF/OutputSections.cpp:859-862
>>>> @@ +858,6 @@
>>>> +template <class ELFT> void OutputSection<ELFT>::finalizeCompressed() {
>>>> +  std::vector<uint8_t> Uncompressed(getSize());
>>>> +  uint8_t *Data = Uncompressed.data();
>>>> +  for (InputSection<ELFT> *C : Sections)
>>>> +    C->writeTo(Data);
>>>> +
>>>> ----------------
>>>> This code creates an uncompressed a debug section in memory and then
>>>> compress it at once. It consumes as much memory as the total size of the
>>>> debug section. It is probably unacceptable.
>>>>
>>>> zlib supports "streaming" compression, right? I think you can pass
>>>> pieces of data to zlib object and let it construct compressed data
>>>> incrementally. Please do that way so that the maximum memory is bounded to
>>>> the largest input section size instead of the largest output section size.
>>>>
>>>
>>> Would the input be decompressed in a streaming manner as well? So even a
>>> large input section wouldn't be entirely in-memory at once?
>>>
>>> (presumably you need to compute the size of the section before writing
>>> it? So you can write other sections in parallel, etc - in which case you
>>> might do a null compression (compressing into /dev/null, of sorts) to
>>> compute the size of the compressed data, then repeating the work to
>>> actually write it out when needed - not sure whether that'll be faster
>>> overall... )
>>>
>>
>> Yes, if we want to write to the file while compressing sections, we need
>> to know each section size in compressed format beforehand.
>>
>
> Ah, I understand your comments earlier better then - you just want to
> stream in from the various inputs but still expect to end up with the whole
> compressed contents in memory (so the size can be determined before writing
> it out to the file).
>

Correct. I just want to avoid creating a whole uncompressed contents and
compress it, if possible.

>
>
>> Compressing sections twice is probably too costly.
>>
>
> I wonder if it's possible to more cheaply compute the compressed size than
> the compressed contents - probably not much I would guess (without much
> basis for that guess).
>
>
>> We at least need to have an input section in memory.
>>
>
> What do you mean by 'in memory' - memory mapping is potentially different
> from actually allocating memory for it, right? (one might just look like
> memory, the other is going to be actual memory - at least that appears to
> be how our internal memory accounting works - the input files are put on
> tmpfs, memory mapped in but that doesn't "cost" any of the process's memory
> because it's allocated before/not by the process)
>

By in memory, I mean the input section and output section contents need to
be represented as a byte array whether they are mmap'ed or not.

>
>
>> We are using mmap'ed IO, and in the first step we copy input sections to
>> output sections and then apply relocations to the copied data. That process
>> treats section contents as a byte array.
>>
>> FWIW, I think Google (using gold) mostly just compresses the /input/, but
>>> doesn't compress the output on the link. So that's a scenario we care about
>>> - I don't know if that particular combination affects the design in any
>>> particular way, though.
>>>
>>
>> Interesting. I think we don't want to support a feature if it has no
>> user. Maybe we should support decompressing debug sections, at least as the
>> first step?
>>
>
> Seems like a good first step to me, but I've no idea what users the author
> of this patch is trying to support, or what kind of users are out there,
> etc.
>
>
>>
>>
>>> ================
>>>> Comment at: ELF/OutputSections.cpp:869-874
>>>> @@ +868,8 @@
>>>> +
>>>> +  // If not successfully compressed or total size is greater than
>>>> +  // size before applying compression, then cancel compressed output.
>>>> +  size_t TotalSize = Compressed.size() +
>>>> getCompressionHeaderSize<ELFT>();
>>>> +  Compress = Success == zlib::Status::StatusOK && TotalSize <
>>>> getSize();
>>>> +  if (!Compress)
>>>> +    return;
>>>> +
>>>> ----------------
>>>> I think we don't have to care about this edge case. Unless you are
>>>> compressing random bits, it usually shrink.
>>>>
>>>
>>> Because the compression is per section & some parts of DWARF are pretty
>>> compact, not everything is improved by compression - but it takes pretty
>>> small samples of debug info to not compress, it's true. I did implement
>>> behavior like this (skip compression when not beneficial) in LLVM's
>>> integrated assembler support for compressed debug info.
>>>
>>>
>>>>
>>>> ================
>>>> Comment at: ELF/OutputSections.cpp:1014-1025
>>>> @@ +1013,14 @@
>>>> +  const endianness E = ELFT::TargetEndianness;
>>>> +  if (Config->CompressDebugSections == "zlib-gnu") {
>>>> +    // 4 bytes magic + size of uncompressed data in big endian.
>>>> +    memcpy(Buf, "ZLIB", 4);
>>>> +    write64be(Buf + 4, UncompressedSize);
>>>> +  } else {
>>>> +    // "zlib-gabi" or "zlib".
>>>> +    write32<E>(Buf, ELFCOMPRESS_ZLIB);
>>>> +    if (ELFT::Is64Bits)
>>>> +      write64<E>(Buf + 2 * sizeof(Elf64_Word), UncompressedSize);
>>>> +    else
>>>> +      write32<E>(Buf + sizeof(Elf32_Word), UncompressedSize);
>>>> +  }
>>>> +
>>>> ----------------
>>>> Why we have two types of headers? If one is obsolete, we probably don't
>>>> want to implement it.
>>>>
>>>> ================
>>>> Comment at: ELF/Writer.cpp:1161-1163
>>>> @@ -1158,2 +1160,5 @@
>>>>
>>>>  template <class ELFT>
>>>> +static bool isCompressionCandidate(InputSectionBase<ELFT> *C,
>>>> +                                   StringRef OutSecName) {
>>>> +  if (Config->CompressDebugSections == "none")
>>>> ----------------
>>>> You can move this to OutputSection class, no?
>>>>
>>>> ================
>>>> Comment at: ELF/Writer.cpp:1172
>>>> @@ +1171,3 @@
>>>> +    return false;
>>>> +  return zlib::isAvailable();
>>>> +}
>>>> ----------------
>>>> This condition should be at the beginning of this function.
>>>>
>>>>
>>>> http://reviews.llvm.org/D20211
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160512/f90aeadd/attachment.html>