[PATCH] D49678: [llvm-objcopy] Adding support for compressed DWARF debug sections.

Wed Aug 22 12:33:37 PDT 2018

jakehehrlich added a comment.

Ok so a few higher level comments

1. I think I agree with Alex that we shouldn't worry about compressing only if an improvement is shown. It's just far too difficult to do that at the moment. We can come back to that later. This seems to be a new case of "replace section" that I had been aware of the need for before but hand't properly implemented.
2. We should start simple and build up to parity and only build up to parity based on demand. In this case we shouldn't aim for full parity. Just start simple and if that isn't enough we can improve more later. If something isn't ideal that's ok for now.
3. Along those lines, I think we should handle compression and decompression in different diffs. We really need to rain in the complexity here.
4. The whole compressor decompressor thing in libObject feels misguided to me personally. Namely it isn't handling the subtle issue of size and endianess correctly (in fact, I think llvm-objcopy's whole handling of size is currently incorrect but this isn't helping). Luckily for you I have little to no say in the matter because this is outside of llvm-objcopy. Now that I realize this is in libObject (whooooops, my bad guys) I'd like that part to be in a separate change and reviewed by other people. You can cc me or even add me as a reviewer. If I were a reviewer for it I would like to see evidence for its need outside of llvm-objcopy since I don't think it's needed here.
5. This change is simply far more complex that it needs to be to accomplish the task. It's getting better and better though. We've been trying to make micro-pushes in the right direction, and we're getting there, but this really needs to be restructured somewhat instead. For instance compressSections and decompressSections are starting to look like something that should go in the final code, all the argument handling seems good to me as well. The means by which compressed sections are being handled is not in that shape. In order to express the massive change I think I'd like to see I've written the following code:

  class CompressedSection {
  private:
    uint64_t UncompressedSize;
    uint64_t UncompressedAlign;
    uint32_t Type;
    SmallVector<char, 0> CompressedData;
    bool GNU;
  public:
    CompressedSection(Twine NewName, const SectionBase& ToCompress, bool GNU) : GNU(GNU) {
      Name = NewName;
      zlib::compress(toStringRef(ToCompress.OriginalData),  CompressedData);
      Size = CompressedData.size() + <some constant to account for size of header> + (GNU ? 4 : 0); // Needed mostly because I didn't handle Size correctly in llvm-objcopy. Sorry. I plan on getting back to llvm-objcopy to fix many things this included in a few weeks.
      Flags = ToCompress.Flags | SHF_COMPRESS;
      Align = 8; // Also I think alignment also changes depending on size here and an alignment of 4 would be better for the 32-bit case.
      // Assign all the other things from ToCompress except size and align
      UncompressedSize = ToCompress.Size;
      UncompressedAlign = ToCompress.Align;
    }
  }

  ...

  template <class ELFT>
  void ELFSectionWriter<ELFT>::visit(const CompressedSection &Sec) {
    uint8_t *Buf = Out.getBufferStart();
    Buf += Sec.Offset;
    auto Chdr = reinterpret_cast<typename ELFT::Chdr*>(Buf);
    Chdr->ch_type = ELFCOMPRESS_ZLIB;
    Chdr->ch_size = Sec.UncompressedSize;
    Chdr->ch_align = Sec.UncompressedAlign;
    Buf += sizeof(*Chdr)
    if (Sec.GNU) {
      StringRef Magic = "ZLIB";
      Buf = std::copy(Magic.begin(), Magic.end(), Buf);
    }
    std::copy(Sec.CompressedData.begin(), Sec.CompressedData.end(), Buf);
  }

This means you'll not need to mess with ELFT at all beyond that one function, it follows what we've been doing with these sorts of issues, and it eliminates the need for Compressor. Decompression is a different issue mind you. There is currently an issue with how reading works. I want to improve that. In a nutshell though it's going to need to be done in the ELFBuilder.

The major pain point here is just that size has to be overestimated due to poor design on my part. Align also seems to suffer from this issue which isn't something I was previously aware of. This should avoid the need for 90% of the code written.

================
Comment at: lib/Object/Compressor.cpp:17
+namespace llvm {
+namespace object {
+
----------------
jhenderson wrote:
> jakehehrlich wrote:
> > This probably shouldn't be defined in object since that's a library namespace, maybe objcopy instead?
> I nearly made the same mistake myself, but this is actually IN the Object library, so is in the correct namespace!
Ohhhh. Yeah I think I want to put the kibosh on doing this in this change. For the purposes of llvm-objcopy and this change I don't think we want or need this. If this is truly needed at large then someone else should review it.

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:355
+  Expected<Decompressor> D = object::Decompressor::create(
+      Section.Name, Contents, (ELFT::TargetEndianness == endianness::little),
+      ELFT::Is64Bits);
----------------
jhenderson wrote:
> jakehehrlich wrote:
> > Again, you don't need to pass around this sort of data. I will not accept a change that unnecessarily keeps track of endianness and bit width using data when an easy ELFT solution exists. I hate ElfType because it forces you to branch on code you otherwise wouldn't have to and errors can occur where those difference branches have different code. 
> My preference would be to refactor or extend the already-existing libObject Decompressor interface to take an ELFT as a template argument. That would allow the complexity to be pushed deeper into the library, and remove some of these issues.
My preference would be to not use the whole Compressor thing. If the CompressedSection type handles these internally then ELFWriter has zero issues handeling this and the normal compression code that already exists in llvm will handle it.

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:369
+
+static bool isCompressed(const SectionBase &Section) {
+  return Section.Name.startswith(".zdebug") ||
----------------
jhenderson wrote:
> jakehehrlich wrote:
> > This should probably not use the name as indication of compression. Just use the flag.
> I don't think we have a choice - it appears that GNU-style compression uses the name only as an indicator of compression. I could be persuaded that we should add the SHF_COMPRESSED flag in either case, since relying on section names is wrong and archaic, when we have other mechanisms to communicate the same thing, but we will at least have to be able to handle the case where a compressed section, compressed via GNU's tools, is fed to us.
We have the option to just not support this to be honest. Compressed sections don't really even have a use case as far as I'm concerned. That said I'll not worry about this. That seems like an unfortunate but necessary conclusion of supporting this.

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:378
+  std::vector<SectionBase *> Sections;
+  std::for_each(
+      Obj.sections().begin(), Obj.sections().end(),
----------------
jhenderson wrote:
> jakehehrlich wrote:
> > Based on James' comment I might be misunderstanding something here. I think you can just loop over Obj.sections() in a range based for loop. You really don't want to do unnecessary loops over sections.
> Yes, you are missing that we modify Obj.sections() within the loop below, and we can't iterate over a vector and add to it in the same loop.
Ah I see now. In that case I now concur that we should do what you said and add the sections to be added to a vector and add them at the end. Thanks!

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:385
+    if (isCompressed(Section) ||
+        !object::isCompressableSectionName(Section.Name))
+      continue;
----------------
jakehehrlich wrote:
> I think we already have an "is debug section" predicate floating around somewhere that uses the section name and might do another sanity check like making sure it isn't an allocated section. If there is not objectionable reason why this wouldn't work, I'd prefer to use that.
I don't seem to be able to mark my own comment as done. But consider this one done.

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:416
+    // Replace this Section with a compressed version.
+    RemovePred = [RemovePred, &Section](const SectionBase &Sec) {
+      return &Sec == &Section || RemovePred(Sec);
----------------
jhenderson wrote:
> jakehehrlich wrote:
> > RemovePred should not be modified in a loop like this. It's going to make RemovePred even more inefficent that it already is.
> I suggested this originally, because I didn't see an obvious alternative, unless we completely drop the idea of "compress only if compression makes it smaller".
I see. I have now gone though a similar set of realizations. We should probably just compress unconditionally.

Repository:
  rL LLVM

https://reviews.llvm.org/D49678