<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, May 12, 2016 at 2:10 PM, Rui Ueyama <span dir="ltr"><<a href="mailto:ruiu@google.com" target="_blank">ruiu@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><div class="h5">On Thu, May 12, 2016 at 2:02 PM, David Blaikie <span dir="ltr"><<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><div><div>On Thu, May 12, 2016 at 1:02 PM, Rui Ueyama <span dir="ltr"><<a href="mailto:ruiu@google.com" target="_blank">ruiu@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><div>On Thu, May 12, 2016 at 11:27 AM, David Blaikie <span dir="ltr"><<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><div><div>On Thu, May 12, 2016 at 10:07 AM, Rui Ueyama via llvm-commits <span dir="ltr"><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">ruiu added a comment.<br>

<br>

Please add tests to verify that you are producing compressed sections that can be read by other tools.<br>

<br>

<br>

================<br>

Comment at: ELF/Options.td:33<br>

@@ +32,3 @@<br>

+def compress_debug_sections : Joined<["--"], "compress-debug-sections=">,<br>

+  HelpText<"Force undefined symbol during linking">;<br>

+<br>

----------------<br>

This help text is out of context.<br>

<br>

================<br>

Comment at: ELF/OutputSections.cpp:833-837<br>

@@ +832,7 @@<br>

+  // compression.<br>

+  if (Config->CompressDebugSections == "zlib-gabi" ||<br>

+    Config->CompressDebugSections == "zlib") {<br>

+    Header.sh_flags |= SHF_COMPRESSED;<br>

+  }<br>

+  else {<br>

+    // Compress all candidate sections using ZLIB compression, using the GNU<br>

----------------<br>

Clang-format.<br>

<br>

================<br>

Comment at: ELF/OutputSections.cpp:842<br>

@@ +841,3 @@<br>

+    // renamed to start with.zdebug to identify the use of compression.<br>

+    StringSaver Saver(ScriptConfig->Alloc);<br>

+    this->Name = Saver.save(Twine(".z") + Name.drop_front(1));<br>

----------------<br>

Please don't use an unrelated object's members.<br>

<br>

================<br>

Comment at: ELF/OutputSections.cpp:859-862<br>

@@ +858,6 @@<br>

+template <class ELFT> void OutputSection<ELFT>::finalizeCompressed() {<br>

+  std::vector<uint8_t> Uncompressed(getSize());<br>

+  uint8_t *Data = Uncompressed.data();<br>

+  for (InputSection<ELFT> *C : Sections)<br>

+    C->writeTo(Data);<br>

+<br>

----------------<br>

This code creates an uncompressed a debug section in memory and then compress it at once. It consumes as much memory as the total size of the debug section. It is probably unacceptable.<br>

<br>

zlib supports "streaming" compression, right? I think you can pass pieces of data to zlib object and let it construct compressed data incrementally. Please do that way so that the maximum memory is bounded to the largest input section size instead of the largest output section size.<br></blockquote><div><br></div></div></div><div>Would the input be decompressed in a streaming manner as well? So even a large input section wouldn't be entirely in-memory at once?<br><br>(presumably you need to compute the size of the section before writing it? So you can write other sections in parallel, etc - in which case you might do a null compression (compressing into /dev/null, of sorts) to compute the size of the compressed data, then repeating the work to actually write it out when needed - not sure whether that'll be faster overall... )<br></div></div></div></div></blockquote><div><br></div></div></div><div>Yes, if we want to write to the file while compressing sections, we need to know each section size in compressed format beforehand. </div></div></div></div></blockquote><div><br></div></div></div><div><div>Ah, I understand your comments earlier better then - you just want to stream in from the various inputs but still expect to end up with the whole compressed contents in memory (so the size can be determined before writing it out to the file).</div></div></div></div></div></blockquote><div><br></div></div></div><div>Correct. I just want to avoid creating a whole uncompressed contents and compress it, if possible.</div></div></div></div></blockquote><div><br></div><div>*nod* agreed, if possible</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><div> </div></div><span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Compressing sections twice is probably too costly.</div></div></div></div></blockquote><div><br></div></span><div>I wonder if it's possible to more cheaply compute the compressed size than the compressed contents - probably not much I would guess (without much basis for that guess).<br></div><span><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>We at least need to have an input section in memory.</div></div></div></div></blockquote><div><br></div></span><div>What do you mean by 'in memory' - memory mapping is potentially different from actually allocating memory for it, right? (one might just look like memory, the other is going to be actual memory - at least that appears to be how our internal memory accounting works - the input files are put on tmpfs, memory mapped in but that doesn't "cost" any of the process's memory because it's allocated before/not by the process)</div><span><div></div></span></div></div></div></blockquote><div><br></div></span><div>By in memory, I mean the input section and output section contents need to be represented as a byte array whether they are mmap'ed or not.</div></div></div></div></blockquote><div><br></div><div>The output section does? That seems like it'd add some memory overhead compared to just writing out the parts from the memory mapped inputs into the right offsets in the output. (at least based on my limited experience implementing the DWP tool - which mostly just wants to concatenate sections together) </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><div class="h5"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> We are using mmap'ed IO, and in the first step we copy input sections to output sections and then apply relocations to the copied data. That process treats section contents as a byte array.<br></div><span><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>FWIW, I think Google (using gold) mostly just compresses the /input/, but doesn't compress the output on the link. So that's a scenario we care about - I don't know if that particular combination affects the design in any particular way, though.</div></div></div></div></blockquote><div><br></div></span><div>Interesting. I think we don't want to support a feature if it has no user. Maybe we should support decompressing debug sections, at least as the first step?</div></div></div></div></blockquote><div><br></div></span><div>Seems like a good first step to me, but I've no idea what users the author of this patch is trying to support, or what kind of users are out there, etc.</div><div><div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

================<br>

Comment at: ELF/OutputSections.cpp:869-874<br>

@@ +868,8 @@<br>

+<br>

+  // If not successfully compressed or total size is greater than<br>

+  // size before applying compression, then cancel compressed output.<br>

+  size_t TotalSize = Compressed.size() + getCompressionHeaderSize<ELFT>();<br>

+  Compress = Success == zlib::Status::StatusOK && TotalSize < getSize();<br>

+  if (!Compress)<br>

+    return;<br>

+<br>

----------------<br>

I think we don't have to care about this edge case. Unless you are compressing random bits, it usually shrink.<br></blockquote><div><br></div></span><div>Because the compression is per section & some parts of DWARF are pretty compact, not everything is improved by compression - but it takes pretty small samples of debug info to not compress, it's true. I did implement behavior like this (skip compression when not beneficial) in LLVM's integrated assembler support for compressed debug info.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div><div>

<br>

================<br>

Comment at: ELF/OutputSections.cpp:1014-1025<br>

@@ +1013,14 @@<br>

+  const endianness E = ELFT::TargetEndianness;<br>

+  if (Config->CompressDebugSections == "zlib-gnu") {<br>

+    // 4 bytes magic + size of uncompressed data in big endian.<br>

+    memcpy(Buf, "ZLIB", 4);<br>

+    write64be(Buf + 4, UncompressedSize);<br>

+  } else {<br>

+    // "zlib-gabi" or "zlib".<br>

+    write32<E>(Buf, ELFCOMPRESS_ZLIB);<br>

+    if (ELFT::Is64Bits)<br>

+      write64<E>(Buf + 2 * sizeof(Elf64_Word), UncompressedSize);<br>

+    else<br>

+      write32<E>(Buf + sizeof(Elf32_Word), UncompressedSize);<br>

+  }<br>

+<br>

----------------<br>

Why we have two types of headers? If one is obsolete, we probably don't want to implement it.<br>

<br>

================<br>

Comment at: ELF/Writer.cpp:1161-1163<br>

@@ -1158,2 +1160,5 @@<br>

<br>

 template <class ELFT><br>

+static bool isCompressionCandidate(InputSectionBase<ELFT> *C,<br>

+                                   StringRef OutSecName) {<br>

+  if (Config->CompressDebugSections == "none")<br>

----------------<br>

You can move this to OutputSection class, no?<br>

<br>

================<br>

Comment at: ELF/Writer.cpp:1172<br>

@@ +1171,3 @@<br>

+    return false;<br>

+  return zlib::isAvailable();<br>

+}<br>

----------------<br>

This condition should be at the beginning of this function.<br>

<br>

<br>

<a href="http://reviews.llvm.org/D20211" rel="noreferrer" target="_blank">http://reviews.llvm.org/D20211</a><br>

<br>

<br>

<br></div></div>

_______________________________________________<br>

llvm-commits mailing list<br>

<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a><br>

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a><br>

</blockquote></div><br></div></div>

</blockquote></div></div></div><br></div></div>

</blockquote></div></div></div><br></div></div>

</blockquote></div></div></div><br></div></div>

</blockquote></div><br></div></div>