[PATCH] D20211: [ELF] - Support of compressed debug sections creation (draft).

David Blaikie via llvm-commits llvm-commits at lists.llvm.org
Thu May 12 14:26:05 PDT 2016


On Thu, May 12, 2016 at 2:22 PM, Rui Ueyama <ruiu at google.com> wrote:

> By the way, do you happen to know why there are two header types for
> compressed sections?
>

Can't say I've heard or know anything about that - do you have more details? I
think I just started by implementing the dumping support based on a
GCC-produced object, then implemented the LLVM support to match.

- Dave


>
> On Thu, May 12, 2016 at 2:10 PM, Rui Ueyama <ruiu at google.com> wrote:
>
>> On Thu, May 12, 2016 at 2:02 PM, David Blaikie <dblaikie at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Thu, May 12, 2016 at 1:02 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>
>>>> On Thu, May 12, 2016 at 11:27 AM, David Blaikie <dblaikie at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, May 12, 2016 at 10:07 AM, Rui Ueyama via llvm-commits <
>>>>> llvm-commits at lists.llvm.org> wrote:
>>>>>
>>>>>> ruiu added a comment.
>>>>>>
>>>>>> Please add tests to verify that you are producing compressed sections
>>>>>> that can be read by other tools.
>>>>>>
>>>>>>
>>>>>> ================
>>>>>> Comment at: ELF/Options.td:33
>>>>>> @@ +32,3 @@
>>>>>> +def compress_debug_sections : Joined<["--"], "compress-debug-sections=">,
>>>>>> +  HelpText<"Force undefined symbol during linking">;
>>>>>> +
>>>>>> ----------------
>>>>>> This help text is out of context.
>>>>>>
>>>>>> ================
>>>>>> Comment at: ELF/OutputSections.cpp:833-837
>>>>>> @@ +832,7 @@
>>>>>> +  // compression.
>>>>>> +  if (Config->CompressDebugSections == "zlib-gabi" ||
>>>>>> +    Config->CompressDebugSections == "zlib") {
>>>>>> +    Header.sh_flags |= SHF_COMPRESSED;
>>>>>> +  }
>>>>>> +  else {
>>>>>> +    // Compress all candidate sections using ZLIB compression, using the GNU
>>>>>> ----------------
>>>>>> Clang-format.
>>>>>>
>>>>>> ================
>>>>>> Comment at: ELF/OutputSections.cpp:842
>>>>>> @@ +841,3 @@
>>>>>> +    // renamed to start with .zdebug to identify the use of compression.
>>>>>> +    StringSaver Saver(ScriptConfig->Alloc);
>>>>>> +    this->Name = Saver.save(Twine(".z") + Name.drop_front(1));
>>>>>> ----------------
>>>>>> Please don't use an unrelated object's members.
>>>>>>
>>>>>> ================
>>>>>> Comment at: ELF/OutputSections.cpp:859-862
>>>>>> @@ +858,6 @@
>>>>>> template <class ELFT> void OutputSection<ELFT>::finalizeCompressed() {
>>>>>> +  std::vector<uint8_t> Uncompressed(getSize());
>>>>>> +  uint8_t *Data = Uncompressed.data();
>>>>>> +  for (InputSection<ELFT> *C : Sections)
>>>>>> +    C->writeTo(Data);
>>>>>> +
>>>>>> ----------------
>>>>>> This code creates an uncompressed debug section in memory and then
>>>>>> compresses it all at once. It consumes as much memory as the total size
>>>>>> of the debug section. That is probably unacceptable.
>>>>>>
>>>>>> zlib supports "streaming" compression, right? I think you can pass
>>>>>> pieces of data to a zlib object and let it construct the compressed data
>>>>>> incrementally. Please do it that way so that the maximum memory use is
>>>>>> bounded by the largest input section size instead of the largest output
>>>>>> section size.
>>>>>>
>>>>>
>>>>> Would the input be decompressed in a streaming manner as well? So even
>>>>> a large input section wouldn't be entirely in-memory at once?
>>>>>
>>>>> (presumably you need to compute the size of the section before writing
>>>>> it? So you can write other sections in parallel, etc - in which case you
>>>>> might do a null compression (compressing into /dev/null, of sorts) to
>>>>> compute the size of the compressed data, then repeat the work to
>>>>> actually write it out when needed - not sure whether that'll be faster
>>>>> overall... )
>>>>>
>>>>
>>>> Yes, if we want to write to the file while compressing sections, we
>>>> need to know each section size in compressed format beforehand.
>>>>
>>>
>>> Ah, I understand your comments earlier better then - you just want to
>>> stream in from the various inputs but still expect to end up with the whole
>>> compressed contents in memory (so the size can be determined before writing
>>> it out to the file).
>>>
>>
>> Correct. I just want to avoid creating the whole uncompressed contents and
>> then compressing them, if possible.
>>
>>
>>>
>>>
>>>> Compressing sections twice is probably too costly.
>>>>
>>>
>>> I wonder if it's possible to compute the compressed size more cheaply
>>> than the compressed contents - probably not much more cheaply, I would
>>> guess (without much basis for that guess).
>>>
>>>
>>>> We at least need to have an input section in memory.
>>>>
>>>
>>> What do you mean by 'in memory' - memory mapping is potentially
>>> different from actually allocating memory for it, right? (one might just
>>> look like memory, the other is going to be actual memory - at least that
>>> appears to be how our internal memory accounting works - the input files
>>> are put on tmpfs, memory mapped in but that doesn't "cost" any of the
>>> process's memory because it's allocated before/not by the process)
>>>
>>
>> By in memory, I mean the input section and output section contents need
>> to be represented as a byte array whether they are mmap'ed or not.
>>
>>
>>>
>>>
>>>> We are using mmap'ed IO, and in the first step we copy input sections
>>>> to output sections and then apply relocations to the copied data. That
>>>> process treats section contents as a byte array.
>>>>
>>>> FWIW, I think Google (using gold) mostly just compresses the /input/,
>>>>> but doesn't compress the output on the link. So that's a scenario we care
>>>>> about - I don't know if that particular combination affects the design in
>>>>> any particular way, though.
>>>>>
>>>>
>>>> Interesting. I think we don't want to support a feature if it has no
>>>> user. Maybe we should support decompressing debug sections, at least as the
>>>> first step?
>>>>
>>>
>>> Seems like a good first step to me, but I've no idea what users the
>>> author of this patch is trying to support, or what kind of users are out
>>> there, etc.
>>>
>>>
>>>>
>>>>
>>>>> ================
>>>>>> Comment at: ELF/OutputSections.cpp:869-874
>>>>>> @@ +868,8 @@
>>>>>> +
>>>>>> +  // If not successfully compressed or total size is greater than
>>>>>> +  // size before applying compression, then cancel compressed output.
>>>>>> +  size_t TotalSize = Compressed.size() + getCompressionHeaderSize<ELFT>();
>>>>>> +  Compress = Success == zlib::Status::StatusOK && TotalSize < getSize();
>>>>>> +  if (!Compress)
>>>>>> +    return;
>>>>>> +
>>>>>> ----------------
>>>>>> I think we don't have to care about this edge case. Unless you are
>>>>>> compressing random bits, it usually shrinks.
>>>>>>
>>>>>
>>>>> Because the compression is per section & some parts of DWARF are
>>>>> pretty compact, not everything is improved by compression - though it's
>>>>> true that only pretty small samples of debug info fail to shrink. I did
>>>>> implement behavior like this (skipping compression when it isn't
>>>>> beneficial) in LLVM's integrated assembler support for compressed debug
>>>>> info.
>>>>>
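[The "skip compression when it doesn't help" check being discussed can be expressed as a small predicate. This is a hypothetical sketch; the function name and parameters are illustrative, not the patch's actual API.]

```cpp
// Hypothetical sketch: keep the compressed form only if compression
// succeeded and the payload plus its header is strictly smaller than the
// original section contents.
#include <cstddef>

bool shouldEmitCompressed(bool CompressOK, size_t CompressedSize,
                          size_t HeaderSize, size_t UncompressedSize) {
  return CompressOK && CompressedSize + HeaderSize < UncompressedSize;
}
```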
>>>>>
>>>>>>
>>>>>> ================
>>>>>> Comment at: ELF/OutputSections.cpp:1014-1025
>>>>>> @@ +1013,14 @@
>>>>>> +  const endianness E = ELFT::TargetEndianness;
>>>>>> +  if (Config->CompressDebugSections == "zlib-gnu") {
>>>>>> +    // 4 bytes magic + size of uncompressed data in big endian.
>>>>>> +    memcpy(Buf, "ZLIB", 4);
>>>>>> +    write64be(Buf + 4, UncompressedSize);
>>>>>> +  } else {
>>>>>> +    // "zlib-gabi" or "zlib".
>>>>>> +    write32<E>(Buf, ELFCOMPRESS_ZLIB);
>>>>>> +    if (ELFT::Is64Bits)
>>>>>> +      write64<E>(Buf + 2 * sizeof(Elf64_Word), UncompressedSize);
>>>>>> +    else
>>>>>> +      write32<E>(Buf + sizeof(Elf32_Word), UncompressedSize);
>>>>>> +  }
>>>>>> +
>>>>>> ----------------
>>>>>> Why do we have two types of headers? If one is obsolete, we probably
>>>>>> don't want to implement it.
>>>>>>
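[For reference, the two on-disk header layouts being compared could be sketched as below for a 64-bit little-endian target. The helper names are hypothetical; the offsets follow the generic ABI's Elf64_Chdr (ch_type, ch_reserved, ch_size, ch_addralign, with ELFCOMPRESS_ZLIB == 1) and the older GNU ".zdebug" convention ("ZLIB" magic plus a big-endian 64-bit uncompressed size).]

```cpp
#include <cstdint>
#include <cstring>

// Legacy GNU style: 4-byte "ZLIB" magic, then the uncompressed size as a
// big-endian u64, in a section renamed from ".debug_*" to ".zdebug_*".
void writeGnuHeader(uint8_t *Buf, uint64_t UncompressedSize) {
  std::memcpy(Buf, "ZLIB", 4);
  for (int I = 0; I < 8; ++I) // big-endian byte order
    Buf[4 + I] = uint8_t(UncompressedSize >> (56 - 8 * I));
}

// Generic ABI style (SHF_COMPRESSED set): an Elf64_Chdr at the start of
// the section data. Assumes a little-endian host for the raw memcpy.
void writeGabiHeader64LE(uint8_t *Buf, uint64_t UncompressedSize,
                         uint64_t Alignment) {
  auto Put32 = [&](size_t Off, uint32_t V) { std::memcpy(Buf + Off, &V, 4); };
  auto Put64 = [&](size_t Off, uint64_t V) { std::memcpy(Buf + Off, &V, 8); };
  Put32(0, 1);                // ch_type = ELFCOMPRESS_ZLIB
  Put32(4, 0);                // ch_reserved
  Put64(8, UncompressedSize); // ch_size
  Put64(16, Alignment);       // ch_addralign
}
```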
>>>>>> ================
>>>>>> Comment at: ELF/Writer.cpp:1161-1163
>>>>>> @@ -1158,2 +1160,5 @@
>>>>>>
>>>>>>  template <class ELFT>
>>>>>> +static bool isCompressionCandidate(InputSectionBase<ELFT> *C,
>>>>>> +                                   StringRef OutSecName) {
>>>>>> +  if (Config->CompressDebugSections == "none")
>>>>>> ----------------
>>>>>> You can move this to the OutputSection class, no?
>>>>>>
>>>>>> ================
>>>>>> Comment at: ELF/Writer.cpp:1172
>>>>>> @@ +1171,3 @@
>>>>>> +    return false;
>>>>>> +  return zlib::isAvailable();
>>>>>> +}
>>>>>> ----------------
>>>>>> This condition should be at the beginning of this function.
>>>>>>
>>>>>>
>>>>>> http://reviews.llvm.org/D20211
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> llvm-commits mailing list
>>>>>> llvm-commits at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>