[PATCH] D20211: [ELF] - Support of compressed debug sections creation (draft).

David Blaikie via llvm-commits llvm-commits at lists.llvm.org
Thu May 12 16:18:07 PDT 2016


On Thu, May 12, 2016 at 2:57 PM, Rui Ueyama <ruiu at google.com> wrote:

> On Thu, May 12, 2016 at 2:40 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>>
>>
>> On Thu, May 12, 2016 at 2:29 PM, Rui Ueyama <ruiu at google.com> wrote:
>>
>>> On Thu, May 12, 2016 at 2:22 PM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Thu, May 12, 2016 at 2:10 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>
>>>>> On Thu, May 12, 2016 at 2:02 PM, David Blaikie <dblaikie at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, May 12, 2016 at 1:02 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>>>
>>>>>>> On Thu, May 12, 2016 at 11:27 AM, David Blaikie <dblaikie at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, May 12, 2016 at 10:07 AM, Rui Ueyama via llvm-commits <
>>>>>>>> llvm-commits at lists.llvm.org> wrote:
>>>>>>>>
>>>>>>>>> ruiu added a comment.
>>>>>>>>>
>>>>>>>>> Please add tests to verify that you are producing compressed
>>>>>>>>> sections that can be read by other tools.
>>>>>>>>>
>>>>>>>>>
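For reference, a sketch of the kind of test being requested - the input
setup, flags, and check lines here are illustrative, not from the patch:

# RUN: llvm-mc -filetype=obj -triple=x86_64-pc-linux %s -o %t.o
# RUN: ld.lld --compress-debug-sections=zlib %t.o -o %t
# RUN: llvm-readobj -sections %t | FileCheck %s
# CHECK: Name: .debug_info
# CHECK: SHF_COMPRESSED
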
>>>>>>>>> ================
>>>>>>>>> Comment at: ELF/Options.td:33
>>>>>>>>> @@ +32,3 @@
>>>>>>>>> +def compress_debug_sections : Joined<["--"],
>>>>>>>>> "compress-debug-sections=">,
>>>>>>>>> +  HelpText<"Force undefined symbol during linking">;
>>>>>>>>> +
>>>>>>>>> ----------------
>>>>>>>>> This help text is out of context.
>>>>>>>>>
>>>>>>>>> ================
>>>>>>>>> Comment at: ELF/OutputSections.cpp:833-837
>>>>>>>>> @@ +832,7 @@
>>>>>>>>> +  // compression.
>>>>>>>>> +  if (Config->CompressDebugSections == "zlib-gabi" ||
>>>>>>>>> +    Config->CompressDebugSections == "zlib") {
>>>>>>>>> +    Header.sh_flags |= SHF_COMPRESSED;
>>>>>>>>> +  }
>>>>>>>>> +  else {
>>>>>>>>> +    // Compress all candidate sections using ZLIB compression,
>>>>>>>>> using the GNU
>>>>>>>>> ----------------
>>>>>>>>> Clang-format.
>>>>>>>>>
>>>>>>>>> ================
>>>>>>>>> Comment at: ELF/OutputSections.cpp:842
>>>>>>>>> @@ +841,3 @@
>>>>>>>>> +    // renamed to start with .zdebug to identify the use of
>>>>>>>>> compression.
>>>>>>>>> +    StringSaver Saver(ScriptConfig->Alloc);
>>>>>>>>> +    this->Name = Saver.save(Twine(".z") + Name.drop_front(1));
>>>>>>>>> ----------------
>>>>>>>>> Please don't use an unrelated object's members.
>>>>>>>>>
>>>>>>>>> ================
>>>>>>>>> Comment at: ELF/OutputSections.cpp:859-862
>>>>>>>>> @@ +858,6 @@
>>>>>>>>> +template <class ELFT> void
>>>>>>>>> OutputSection<ELFT>::finalizeCompressed() {
>>>>>>>>> +  std::vector<uint8_t> Uncompressed(getSize());
>>>>>>>>> +  uint8_t *Data = Uncompressed.data();
>>>>>>>>> +  for (InputSection<ELFT> *C : Sections)
>>>>>>>>> +    C->writeTo(Data);
>>>>>>>>> +
>>>>>>>>> ----------------
>>>>>>>>> This code creates an uncompressed debug section in memory and then
>>>>>>>>> compresses it all at once. It consumes as much memory as the total
>>>>>>>>> size of the debug sections, which is probably unacceptable.
>>>>>>>>>
>>>>>>>>> zlib supports "streaming" compression, right? I think you can pass
>>>>>>>>> pieces of data to a zlib object and let it construct the compressed
>>>>>>>>> data incrementally. Please do it that way so that the maximum memory
>>>>>>>>> usage is bounded by the largest input section size instead of the
>>>>>>>>> largest output section size.
>>>>>>>>>
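For reference, a minimal sketch of the streaming approach being suggested,
using zlib's deflate API directly - error handling is elided, and none of
this is LLD's actual code:

#include <zlib.h>
#include <cstdint>
#include <cstring>
#include <vector>

// Compress a sequence of chunks incrementally so that only one chunk's
// worth of uncompressed data needs to be live at a time.
static void compressChunks(const std::vector<std::vector<uint8_t>> &Chunks,
                           std::vector<uint8_t> &Out) {
  z_stream S;
  std::memset(&S, 0, sizeof(S)); // zalloc/zfree/opaque = Z_NULL
  deflateInit(&S, Z_DEFAULT_COMPRESSION);
  uint8_t Buf[4096];
  for (size_t I = 0, E = Chunks.size(); I != E; ++I) {
    S.next_in = const_cast<uint8_t *>(Chunks[I].data());
    S.avail_in = static_cast<uInt>(Chunks[I].size());
    // Only the last chunk finishes the stream.
    int Flush = (I + 1 == E) ? Z_FINISH : Z_NO_FLUSH;
    do {
      S.next_out = Buf;
      S.avail_out = sizeof(Buf);
      deflate(&S, Flush);
      Out.insert(Out.end(), Buf, Buf + (sizeof(Buf) - S.avail_out));
    } while (S.avail_out == 0); // loop while the output buffer fills up
  }
  deflateEnd(&S);
}
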
>>>>>>>>
>>>>>>>> Would the input be decompressed in a streaming manner as well? So
>>>>>>>> even a large input section wouldn't be entirely in-memory at once?
>>>>>>>>
>>>>>>>> (presumably you need to compute the size of the section before
>>>>>>>> writing it, so you can write other sections in parallel, etc. - in
>>>>>>>> which case you might do a null compression (compressing into
>>>>>>>> /dev/null, of sorts) to compute the size of the compressed data,
>>>>>>>> then repeat the work to actually write it out when needed - not sure
>>>>>>>> whether that'll be faster overall... )
>>>>>>>>
>>>>>>>
>>>>>>> Yes, if we want to write to the file while compressing sections, we
>>>>>>> need to know each section size in compressed format beforehand.
>>>>>>>
>>>>>>
>>>>>> Ah, I understand your comments earlier better then - you just want to
>>>>>> stream in from the various inputs but still expect to end up with the whole
>>>>>> compressed contents in memory (so the size can be determined before writing
>>>>>> it out to the file).
>>>>>>
>>>>>
>>>>> Correct. I just want to avoid creating the whole uncompressed contents
>>>>> and then compressing them, if possible.
>>>>>
>>>>
>>>> *nod* agreed, if possible
>>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> Compressing sections twice is probably too costly.
>>>>>>>
>>>>>>
>>>>>> I wonder if it's possible to compute the compressed size more cheaply
>>>>>> than the compressed contents - probably not much more cheaply, I would
>>>>>> guess (without much basis for that guess).
>>>>>>
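For what it's worth, zlib can cheaply compute an upper bound on the
compressed size, but not the exact size - getting the exact size still
requires actually doing the compression. A one-line illustration
(UncompressedSize is the quantity discussed above):

// compressBound() returns a worst-case output size for an input of the
// given length. Cheap, but only an upper bound, so it can't be used to
// lay out exact file offsets ahead of time.
uLong Bound = compressBound(UncompressedSize);
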
>>>>>>
>>>>>>> We at least need to have an input section in memory.
>>>>>>>
>>>>>>
>>>>>> What do you mean by 'in memory'? Memory mapping is potentially
>>>>>> different from actually allocating memory for it, right? (One might
>>>>>> just look like memory, the other is going to be actual memory - at
>>>>>> least that appears to be how our internal memory accounting works:
>>>>>> the input files are put on tmpfs and memory mapped in, but that
>>>>>> doesn't "cost" any of the process's memory because it's allocated
>>>>>> before/not by the process.)
>>>>>>
>>>>>
>>>>> By "in memory", I mean that the input section and output section
>>>>> contents need to be represented as byte arrays, whether they are
>>>>> mmap'ed or not.
>>>>>
>>>>
>>>> The output section does? That seems like it'd add some memory overhead
>>>> compared to just writing out the parts from the memory mapped inputs into
>>>> the right offsets in the output. (at least based on my limited experience
>>>> implementing the DWP tool - which mostly just wants to concatenate sections
>>>> together)
>>>>
>>>
>>> If all we had to do was copy data from input files to an output file,
>>> then non-mmap'ed IO might be faster, because I guess it involves fewer
>>> context switches between user space and the kernel. But we need to apply
>>> relocations after copying the data, and applying relocations is
>>> basically random access into an output section.
>>>
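To illustrate that access pattern, a sketch with assumed names - Reloc,
Relocs, OutBuf, and computeValue are hypothetical stand-ins, not LLD's
code; write32le is LLVM's little-endian write helper from
llvm/Support/Endian.h:

#include "llvm/Support/Endian.h"
using llvm::support::endian::write32le;

// Copying section contents is sequential, but fixing up relocations
// afterwards is scattered writes into the mmap'ed output buffer.
for (const Reloc &R : Relocs) {
  uint8_t *Loc = OutBuf + R.Offset; // effectively a random access
  write32le(Loc, computeValue(R));  // hypothetical value computation
}
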
>>
>> Sorry, I'm probably lacking enough context to contribute correctly to
>> this conversation - trying to piece it together.
>>
>> It sounded like what you were saying is that you form the output section,
>> complete with applied relocations, in memory (in a new or malloc'd buffer)
>> first, then write that out. I found this surprising because of the memory
>> overhead it would incur.
>>
>> Instead you could (perhaps you do - and I just misunderstood above) walk
>> the inputs, figure out how big the resulting section would be (& keep a
>> list of where the chunks of it came from/what offsets they should be
>> written to) then just read/write from input section to the right part of
>> the output section. For relocations, you could then walk into the memory
>> mapped output & rewrite the desired bytes (or you could copy the contiguous
>> fragments from the inputs that appear between the relocations - then the
>> relocations themselves)
>>
>
> That is exactly what we are currently doing. Sorry if my description was
> unclear or misleading. We copy section contents from mmap'ed input files
> to a mmap'ed output file, and after that we fix up the desired bytes in
> the mmap'ed output section.
>
> What I was suggesting in this review comment is this. In order to create a
> compressed section, this patch creates an in-memory, non-mmap'ed buffer
> whose size is the same as the entire output section. It then writes all
> the debug sections into it and compresses it. That seems a bit costly, as
> we don't actually have to create that large buffer. Each input section
> knows how to copy itself to a specified memory address (an input section
> is agnostic about whether the destination is an mmap'ed region or not) and
> how to apply relocations to the copied contents. The two are actually just
> one operation in LLD -- the InputSection::writeTo(uint8_t *Dest) function.
> Therefore, we can call writeTo() for each input section with a smaller
> buffer, append the buffer to a zlib stream, and then reuse the buffer for
> the next section.
>

Ah, OK, thanks for the details, that all makes sense.

(& at some point, if we got fancy, InputSection::writeTo(uint8_t *Dest)
could be changed to be optionally stream-like, so it could be used to
stream out to a compressor and avoid the need to manifest the entire input
section in a new buffer - but that'd be an optimization to try later, when
someone decides to get around to it)
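
Concretely, the buffer-reuse scheme described above might look something
like the following - writeTo() is the real LLD function mentioned in the
thread, but feedToZlibStream() and getSize() are illustrative stand-ins:

// Reuse one scratch buffer per input section instead of materializing
// the whole uncompressed output section at once.
std::vector<uint8_t> Buf;
for (InputSection<ELFT> *C : Sections) {
  Buf.resize(C->getSize());   // grows only to the largest input section
  C->writeTo(Buf.data());     // copy contents + apply relocations
  feedToZlibStream(Buf.data(), Buf.size()); // hypothetical: deflate() chunk
}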


>
>> To bring this back around to the original issue - this would make linking
>> compressed debug info better/good. If you have to concatenate all the
>> debug info together into one big buffer before writing it out, that seems
>> unfortunate compared to keeping the separate decompressed fragments (I
>> suppose you could decompress them all into the same buffer - so there
>> might not be any memory overhead, but you'd need a larger contiguous
>> chunk of memory for it). This becomes even clearer in a non-compressed
>> scenario - where the output is just the concatenation of bits of the
>> input, without relocations in much of the data. Having to form the entire
>> output content in a memory buffer first would seem like a lot of overhead
>> to me. (Again, this is based on my experience trying to improve the
>> memory overhead of the DWP tool, which may not be entirely applicable to
>> general-purpose linking for a bunch of reasons.)
>>
>>
>>>
>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> We are using mmap'ed IO, and in the first step we copy input
>>>>>>> sections to output sections and then apply relocations to the copied data.
>>>>>>> That process treats section contents as a byte array.
>>>>>>>
>>>>>>> FWIW, I think Google (using gold) mostly just compresses the
>>>>>>>> /input/, but doesn't compress the output on the link. So that's a scenario
>>>>>>>> we care about - I don't know if that particular combination affects the
>>>>>>>> design in any particular way, though.
>>>>>>>>
>>>>>>>
>>>>>>> Interesting. I think we don't want to support a feature if it has no
>>>>>>> users. Maybe we should support decompressing debug sections, at
>>>>>>> least as a first step?
>>>>>>>
>>>>>>
>>>>>> Seems like a good first step to me, but I've no idea what users the
>>>>>> author of this patch is trying to support, or what kind of users are out
>>>>>> there, etc.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> ================
>>>>>>>>> Comment at: ELF/OutputSections.cpp:869-874
>>>>>>>>> @@ +868,8 @@
>>>>>>>>> +
>>>>>>>>> +  // If not successfully compressed or total size is greater than
>>>>>>>>> +  // size before applying compression, then cancel compressed
>>>>>>>>> output.
>>>>>>>>> +  size_t TotalSize = Compressed.size() +
>>>>>>>>> getCompressionHeaderSize<ELFT>();
>>>>>>>>> +  Compress = Success == zlib::Status::StatusOK && TotalSize <
>>>>>>>>> getSize();
>>>>>>>>> +  if (!Compress)
>>>>>>>>> +    return;
>>>>>>>>> +
>>>>>>>>> ----------------
>>>>>>>>> I think we don't have to care about this edge case. Unless you are
>>>>>>>>> compressing random bits, it usually shrinks.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Because the compression is per section & some parts of DWARF are
>>>>>>>> pretty compact, not everything is improved by compression - though
>>>>>>>> it's true that it takes a pretty small sample of debug info for
>>>>>>>> compression not to pay off. I did implement behavior like this
>>>>>>>> (skip compression when it isn't beneficial) in LLVM's integrated
>>>>>>>> assembler support for compressed debug info.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> ================
>>>>>>>>> Comment at: ELF/OutputSections.cpp:1014-1025
>>>>>>>>> @@ +1013,14 @@
>>>>>>>>> +  const endianness E = ELFT::TargetEndianness;
>>>>>>>>> +  if (Config->CompressDebugSections == "zlib-gnu") {
>>>>>>>>> +    // 4 bytes magic + size of uncompressed data in big endian.
>>>>>>>>> +    memcpy(Buf, "ZLIB", 4);
>>>>>>>>> +    write64be(Buf + 4, UncompressedSize);
>>>>>>>>> +  } else {
>>>>>>>>> +    // "zlib-gabi" or "zlib".
>>>>>>>>> +    write32<E>(Buf, ELFCOMPRESS_ZLIB);
>>>>>>>>> +    if (ELFT::Is64Bits)
>>>>>>>>> +      write64<E>(Buf + 2 * sizeof(Elf64_Word), UncompressedSize);
>>>>>>>>> +    else
>>>>>>>>> +      write32<E>(Buf + sizeof(Elf32_Word), UncompressedSize);
>>>>>>>>> +  }
>>>>>>>>> +
>>>>>>>>> ----------------
>>>>>>>>> Why do we have two types of headers? If one is obsolete, we
>>>>>>>>> probably don't want to implement it.
>>>>>>>>>
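For context, the two on-disk layouts under discussion, sketched here from
the ELF gABI and the older GNU scheme (this is not the patch's code):
zlib-gnu renames the section to .zdebug* and prepends a "ZLIB" magic plus a
big-endian 64-bit uncompressed size, while zlib-gabi keeps the section
name, sets SHF_COMPRESSED, and uses the standard compression header:

// zlib-gnu (legacy GNU scheme), header in front of the deflated data:
//   char     Magic[4];   // "ZLIB"
//   uint64_t Size;       // uncompressed size, big endian
//
// zlib-gabi (standardized scheme), 64-bit compression header:
struct Chdr64Sketch {
  uint32_t ch_type;       // ELFCOMPRESS_ZLIB == 1
  uint32_t ch_reserved;
  uint64_t ch_size;       // uncompressed size
  uint64_t ch_addralign;  // alignment of the uncompressed section
};
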
>>>>>>>>> ================
>>>>>>>>> Comment at: ELF/Writer.cpp:1161-1163
>>>>>>>>> @@ -1158,2 +1160,5 @@
>>>>>>>>>
>>>>>>>>>  template <class ELFT>
>>>>>>>>> +static bool isCompressionCandidate(InputSectionBase<ELFT> *C,
>>>>>>>>> +                                   StringRef OutSecName) {
>>>>>>>>> +  if (Config->CompressDebugSections == "none")
>>>>>>>>> ----------------
>>>>>>>>> You can move this to the OutputSection class, no?
>>>>>>>>>
>>>>>>>>> ================
>>>>>>>>> Comment at: ELF/Writer.cpp:1172
>>>>>>>>> @@ +1171,3 @@
>>>>>>>>> +    return false;
>>>>>>>>> +  return zlib::isAvailable();
>>>>>>>>> +}
>>>>>>>>> ----------------
>>>>>>>>> This condition should be at the beginning of this function.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://reviews.llvm.org/D20211
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>