[PATCH] D20211: [ELF] - Support of compressed debug sections creation (draft).

David Blaikie via llvm-commits llvm-commits at lists.llvm.org
Thu May 12 14:40:41 PDT 2016


On Thu, May 12, 2016 at 2:29 PM, Rui Ueyama <ruiu at google.com> wrote:

> On Thu, May 12, 2016 at 2:22 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>>
>>
>> On Thu, May 12, 2016 at 2:10 PM, Rui Ueyama <ruiu at google.com> wrote:
>>
>>> On Thu, May 12, 2016 at 2:02 PM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Thu, May 12, 2016 at 1:02 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>
>>>>> On Thu, May 12, 2016 at 11:27 AM, David Blaikie <dblaikie at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, May 12, 2016 at 10:07 AM, Rui Ueyama via llvm-commits <
>>>>>> llvm-commits at lists.llvm.org> wrote:
>>>>>>
>>>>>>> ruiu added a comment.
>>>>>>>
>>>>>>> Please add tests to verify that you are producing compressed
>>>>>>> sections that can be read by other tools.
>>>>>>>
>>>>>>>
>>>>>>> ================
>>>>>>> Comment at: ELF/Options.td:33
>>>>>>> @@ +32,3 @@
>>>>>>> +def compress_debug_sections : Joined<["--"],
>>>>>>> "compress-debug-sections=">,
>>>>>>> +  HelpText<"Force undefined symbol during linking">;
>>>>>>> +
>>>>>>> ----------------
>>>>>>> This help text is out of context.
>>>>>>>
>>>>>>> ================
>>>>>>> Comment at: ELF/OutputSections.cpp:833-837
>>>>>>> @@ +832,7 @@
>>>>>>> +  // compression.
>>>>>>> +  if (Config->CompressDebugSections == "zlib-gabi" ||
>>>>>>> +    Config->CompressDebugSections == "zlib") {
>>>>>>> +    Header.sh_flags |= SHF_COMPRESSED;
>>>>>>> +  }
>>>>>>> +  else {
>>>>>>> +    // Compress all candidate sections using ZLIB compression,
>>>>>>> using the GNU
>>>>>>> ----------------
>>>>>>> Clang-format.
>>>>>>>
>>>>>>> ================
>>>>>>> Comment at: ELF/OutputSections.cpp:842
>>>>>>> @@ +841,3 @@
>>>>>>> +    // renamed to start with .zdebug to identify the use of
>>>>>>> compression.
>>>>>>> +    StringSaver Saver(ScriptConfig->Alloc);
>>>>>>> +    this->Name = Saver.save(Twine(".z") + Name.drop_front(1));
>>>>>>> ----------------
>>>>>>> Please don't use an unrelated object's members.
>>>>>>>
>>>>>>> ================
>>>>>>> Comment at: ELF/OutputSections.cpp:859-862
>>>>>>> @@ +858,6 @@
>>>>>>> +template <class ELFT> void
>>>>>>> OutputSection<ELFT>::finalizeCompressed() {
>>>>>>> +  std::vector<uint8_t> Uncompressed(getSize());
>>>>>>> +  uint8_t *Data = Uncompressed.data();
>>>>>>> +  for (InputSection<ELFT> *C : Sections)
>>>>>>> +    C->writeTo(Data);
>>>>>>> +
>>>>>>> ----------------
>>>>>>> This code creates an uncompressed debug section in memory and then
>>>>>>> compresses it all at once. That consumes as much memory as the total
>>>>>>> size of the debug section, which is probably unacceptable.
>>>>>>>
>>>>>>> zlib supports "streaming" compression, right? I think you can pass
>>>>>>> pieces of data to a zlib object and let it construct compressed data
>>>>>>> incrementally. Please do it that way, so that the maximum memory is
>>>>>>> bounded by the largest input section size instead of the largest
>>>>>>> output section size.
>>>>>>>
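(For reference, a minimal sketch of what that streaming interface looks
like - plain zlib C API, not LLD code, with illustrative names: each input
section is fed to deflate() as it becomes available, so only one
uncompressed section has to be materialized at a time.)

  #include <zlib.h>
  #include <cstdint>
  #include <vector>

  std::vector<uint8_t>
  compressChunks(const std::vector<std::vector<uint8_t>> &Chunks) {
    z_stream S = {};
    deflateInit(&S, Z_DEFAULT_COMPRESSION);
    std::vector<uint8_t> Out;
    uint8_t Buf[4096];
    for (size_t I = 0; I < Chunks.size(); ++I) {
      S.next_in = const_cast<uint8_t *>(Chunks[I].data());
      S.avail_in = static_cast<uInt>(Chunks[I].size());
      // Flush only after the last chunk; zlib carries state across chunks.
      int Flush = (I + 1 == Chunks.size()) ? Z_FINISH : Z_NO_FLUSH;
      do {
        S.next_out = Buf;
        S.avail_out = sizeof(Buf);
        deflate(&S, Flush);
        Out.insert(Out.end(), Buf, Buf + (sizeof(Buf) - S.avail_out));
      } while (S.avail_out == 0);
    }
    deflateEnd(&S);
    return Out;
  }
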
>>>>>>
>>>>>> Would the input be decompressed in a streaming manner as well? So
>>>>>> even a large input section wouldn't be entirely in-memory at once?
>>>>>>
>>>>>> (Presumably you need to compute the size of the section before
>>>>>> writing it, so you can write other sections in parallel, etc. - in
>>>>>> which case you might do a null compression (compressing into
>>>>>> /dev/null, of sorts) to compute the size of the compressed data, then
>>>>>> repeat the work to actually write it out when needed. Not sure whether
>>>>>> that'll be faster overall...)
>>>>>>
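(The "compress into /dev/null" pass mentioned above could look something
like this sketch - again plain zlib and an illustrative helper name: run
deflate() once over a small scratch buffer, keep only total_out, and throw
the bytes away.)

  // Hypothetical helper: size of Data after deflate, discarding the output.
  size_t compressedSize(const uint8_t *Data, size_t Size) {
    z_stream S = {};
    deflateInit(&S, Z_DEFAULT_COMPRESSION);
    S.next_in = const_cast<uint8_t *>(Data);
    S.avail_in = static_cast<uInt>(Size);
    uint8_t Scratch[4096];
    int Ret;
    do {
      S.next_out = Scratch;           // reuse one small buffer;
      S.avail_out = sizeof(Scratch);  // the output bytes are discarded
      Ret = deflate(&S, Z_FINISH);
    } while (Ret != Z_STREAM_END);
    size_t Total = S.total_out;
    deflateEnd(&S);
    return Total;
  }
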
>>>>>
>>>>> Yes, if we want to write to the file while compressing sections, we
>>>>> need to know each section size in compressed format beforehand.
>>>>>
>>>>
>>>> Ah, I understand your comments earlier better then - you just want to
>>>> stream in from the various inputs but still expect to end up with the whole
>>>> compressed contents in memory (so the size can be determined before writing
>>>> it out to the file).
>>>>
>>>
>>> Correct. I just want to avoid creating the whole uncompressed contents
>>> and then compressing them, if possible.
>>>
>>
>> *nod* agreed, if possible
>>
>>
>>>
>>>
>>>>
>>>>
>>>>> Compressing sections twice is probably too costly.
>>>>>
>>>>
>>>> I wonder if it's possible to compute the compressed size more cheaply
>>>> than the compressed contents - probably not much cheaper, I would guess
>>>> (without much basis for that guess).
>>>>
>>>>
>>>>> We at least need to have an input section in memory.
>>>>>
>>>>
>>>> What do you mean by 'in memory'? Memory mapping is potentially
>>>> different from actually allocating memory for it, right? (One might just
>>>> look like memory, the other is going to be actual memory - at least that
>>>> appears to be how our internal memory accounting works: the input files
>>>> are put on tmpfs and memory mapped in, but that doesn't "cost" any of the
>>>> process's memory because it's allocated before/not by the process.)
>>>>
>>>
>>> By in memory, I mean the input section and output section contents need
>>> to be represented as a byte array whether they are mmap'ed or not.
>>>
>>
>> The output section does? That seems like it'd add some memory overhead
>> compared to just writing out the parts from the memory mapped inputs into
>> the right offsets in the output. (at least based on my limited experience
>> implementing the DWP tool - which mostly just wants to concatenate sections
>> together)
>>
>
> If all we have to do is just copy data from files to an output file,
> then non-mmap'ed IO may be faster, because I guess it involves fewer
> context switches between user and kernel. But we need to apply relocations
> after copying data, and applying relocations is basically random access
> to the output section.
>

Sorry, I'm probably lacking the context to contribute usefully to this
conversation - I'm trying to piece it together.

It sounded like what you were saying is that you form the output section,
complete with applied relocations, in memory (in a new/malloc'd buffer)
first, then write that out. I found this surprising because of the memory
overhead it would incur.

Instead you could (perhaps you do - and I just misunderstood above) walk
the inputs, figure out how big the resulting section will be (and keep a
list of where its chunks came from and what offsets they should be written
to), then copy each input section directly into the right part of the
output section. For relocations, you could then walk into the memory mapped
output and rewrite the desired bytes (or you could copy the contiguous
fragments from the inputs that appear between the relocations, then the
relocations themselves) - something like the sketch below.

To bring this back around to the original issue - this would make linking
compressed debug info cheaper. If you have to concatenate all the debug
info into one big buffer before writing it out, that seems unfortunate
compared to working with the separate decompressed fragments (I suppose
you could decompress them all into the same buffer, so there might not be
any memory overhead, but you'd need a larger contiguous chunk of memory
for it). This becomes even clearer in the non-compressed scenario, where
the output is just the concatenation of pieces of the input with no
relocations in much of the data. Having to form the entire output content
in a memory buffer first seems like a lot of overhead to me. (Again, this
is based on my experience trying to improve the memory overhead of the DWP
tool, which may not be entirely applicable to general-purpose linking for
a bunch of reasons.)
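(A sketch of that scheme with hypothetical types - not LLD's actual data
structures: assign output offsets in one pass, copy fragments straight into
the mmap'ed output, then patch the relocated words in place.)

  #include <cstdint>
  #include <cstring>
  #include <vector>

  struct Frag { const uint8_t *Data; size_t Size; size_t OutOff; };
  struct Reloc { size_t OutOff; uint64_t Value; };  // pre-computed value

  void writeSection(uint8_t *OutBase, std::vector<Frag> &Frags,
                    const std::vector<Reloc> &Relocs) {
    size_t Off = 0;
    for (Frag &F : Frags) {        // pass 1: lay out fragments
      F.OutOff = Off;
      Off += F.Size;
    }
    for (const Frag &F : Frags)    // pass 2: copy into the mapped output
      std::memcpy(OutBase + F.OutOff, F.Data, F.Size);
    for (const Reloc &R : Relocs)  // pass 3: rewrite bytes in place
      std::memcpy(OutBase + R.OutOff, &R.Value, sizeof(R.Value));
  }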


>
>
>>
>>>
>>>
>>>>
>>>>
>>>>> We are using mmap'ed IO, and in the first step we copy input sections
>>>>> to output sections and then apply relocations to the copied data. That
>>>>> process treats section contents as a byte array.
>>>>>
>>>>> FWIW, I think Google (using gold) mostly just compresses the /input/,
>>>>>> but doesn't compress the output on the link. So that's a scenario we care
>>>>>> about - I don't know if that particular combination affects the design in
>>>>>> any particular way, though.
>>>>>>
>>>>>
>>>>> Interesting. I think we don't want to support a feature if it has no
>>>>> users. Maybe we should support decompressing debug sections, at least
>>>>> as a first step?
>>>>>
>>>>
>>>> Seems like a good first step to me, but I've no idea what users the
>>>> author of this patch is trying to support, or what kind of users are out
>>>> there, etc.
>>>>
>>>>
>>>>>
>>>>>
>>>>>> ================
>>>>>>> Comment at: ELF/OutputSections.cpp:869-874
>>>>>>> @@ +868,8 @@
>>>>>>> +
>>>>>>> +  // If not successfully compressed or total size is greater than
>>>>>>> +  // size before applying compression, then cancel compressed
>>>>>>> output.
>>>>>>> +  size_t TotalSize = Compressed.size() +
>>>>>>> getCompressionHeaderSize<ELFT>();
>>>>>>> +  Compress = Success == zlib::Status::StatusOK && TotalSize <
>>>>>>> getSize();
>>>>>>> +  if (!Compress)
>>>>>>> +    return;
>>>>>>> +
>>>>>>> ----------------
>>>>>>> I think we don't have to care about this edge case. Unless you are
>>>>>>> compressing random bits, it usually shrinks.
>>>>>>>
>>>>>>
>>>>>> Because the compression is per section, and some parts of DWARF are
>>>>>> pretty compact, not everything is improved by compression - though it
>>>>>> takes a pretty small sample of debug info for compression not to pay
>>>>>> off, it's true. I did implement behavior like this (skip compression
>>>>>> when it isn't beneficial) in LLVM's integrated assembler support for
>>>>>> compressed debug info.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> ================
>>>>>>> Comment at: ELF/OutputSections.cpp:1014-1025
>>>>>>> @@ +1013,14 @@
>>>>>>> +  const endianness E = ELFT::TargetEndianness;
>>>>>>> +  if (Config->CompressDebugSections == "zlib-gnu") {
>>>>>>> +    // 4 bytes magic + size of uncompressed data in big endian.
>>>>>>> +    memcpy(Buf, "ZLIB", 4);
>>>>>>> +    write64be(Buf + 4, UncompressedSize);
>>>>>>> +  } else {
>>>>>>> +    // "zlib-gabi" or "zlib".
>>>>>>> +    write32<E>(Buf, ELFCOMPRESS_ZLIB);
>>>>>>> +    if (ELFT::Is64Bits)
>>>>>>> +      write64<E>(Buf + 2 * sizeof(Elf64_Word), UncompressedSize);
>>>>>>> +    else
>>>>>>> +      write32<E>(Buf + sizeof(Elf32_Word), UncompressedSize);
>>>>>>> +  }
>>>>>>> +
>>>>>>> ----------------
>>>>>>> Why do we have two types of headers? If one is obsolete, we probably
>>>>>>> don't want to implement it.
>>>>>>>
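(For reference, the two on-disk formats the quoted code writes: zlib-gnu is
the older GNU convention that renames the section to .zdebug*, while
zlib-gabi is the layout specified by the ELF gABI, signaled by
SHF_COMPRESSED. Roughly:

  // zlib-gnu: section renamed .zdebug*, contents prefixed with
  //   char     Magic[4];          // "ZLIB"
  //   uint64_t UncompressedSize;  // big-endian

  // zlib-gabi: SHF_COMPRESSED set, contents prefixed with an Elf_Chdr,
  // e.g. for ELFCLASS64:
  typedef struct {
    Elf64_Word  ch_type;      // ELFCOMPRESS_ZLIB
    Elf64_Word  ch_reserved;
    Elf64_Xword ch_size;      // uncompressed size
    Elf64_Xword ch_addralign; // uncompressed alignment
  } Elf64_Chdr;

which matches the offsets the patch computes from sizeof(Elf64_Word) and
sizeof(Elf32_Word).)
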
>>>>>>> ================
>>>>>>> Comment at: ELF/Writer.cpp:1161-1163
>>>>>>> @@ -1158,2 +1160,5 @@
>>>>>>>
>>>>>>>  template <class ELFT>
>>>>>>> +static bool isCompressionCandidate(InputSectionBase<ELFT> *C,
>>>>>>> +                                   StringRef OutSecName) {
>>>>>>> +  if (Config->CompressDebugSections == "none")
>>>>>>> ----------------
>>>>>>> You can move this to the OutputSection class, no?
>>>>>>>
>>>>>>> ================
>>>>>>> Comment at: ELF/Writer.cpp:1172
>>>>>>> @@ +1171,3 @@
>>>>>>> +    return false;
>>>>>>> +  return zlib::isAvailable();
>>>>>>> +}
>>>>>>> ----------------
>>>>>>> This condition should be at the beginning of this function.
>>>>>>>
>>>>>>>
>>>>>>> http://reviews.llvm.org/D20211
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> llvm-commits mailing list
>>>>>>> llvm-commits at lists.llvm.org
>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>