[PATCH] D49678: [llvm-objcopy] Adding support for compressed DWARF debug sections.

Tue Aug 21 17:20:48 PDT 2018

jakehehrlich added a comment.

This is looking better!

================
Comment at: lib/Object/Compressor.cpp:17
+namespace llvm {
+namespace object {
+
----------------
This probably shouldn't be defined in object since that's a library namespace, maybe objcopy instead?

================
Comment at: lib/Object/Compressor.cpp:27
+bool isCompressableSectionName(StringRef Name) {
+  return (Name.startswith(".debug") || Name.startswith(".zdebug"));
+}
----------------
What's the reasoning behind why .zdebug should be compressible? I would assume that in most cases it's already compressed and that some issue has occurred.

================
Comment at: lib/Object/Compressor.cpp:32-36
+  if (!isCompressableSectionName(Name)) {
+    return createStringError(llvm::errc::invalid_argument,
+                             "Invalid debug section name: %s.",
+                             Name.str().c_str());
+  }
----------------
I think this function should convert strings that start with ".debug" to start with ".zdebug" and everything else should be left alone. An assertion that the name be a compressible section name (or DebugSectionName if that works) can then just be added. No need for Expected.

================
Comment at: lib/Object/Compressor.cpp:48
+template <typename ELFT>
+Error produceZLibHeader(support::endian::Writer &W, uint64_t DecompressedSize,
+                        unsigned Align, DebugCompressionType CompressionType) {
----------------
This whole function isn't needed if you do the header writing in ELFSection<ELFT>::visit and it eliminates a huge complexity from this change, you won't need to embed the header in the data of the section. That's causing *all* of the issues with ELFT in this change.

The GNU ZLIB magic part can also be added there.

================
Comment at: lib/Object/Compressor.cpp:64
+
+  using Chdr = Elf_Chdr_Impl<ELFT>;
+  W.write(static_cast<decltype(Chdr::ch_type)>(ELF::ELFCOMPRESS_ZLIB));
----------------
please use typename ELFT::Chdr

================
Comment at: lib/Object/Compressor.cpp:73
+template <typename ELFT>
+Expected<std::vector<char>> compress(StringRef Contents, uint64_t Align,
+                                     DebugCompressionType CompressionType) {
----------------
This should probably take ArrayRef<uint8_t> to avoid all those conversion being made. The conversion can just be made here instead.

================
Comment at: lib/Object/Compressor.cpp:75
+                                     DebugCompressionType CompressionType) {
+  SmallVector<char, 128> CompressedContents;
+  raw_svector_ostream OS(CompressedContents);
----------------
Ah so I think I may have told you to use std::vector in the past. I didn't realize the zlib library forced you to use SmallVector. In that case SmallVector<char, 0> and just return an Expected<SmallVector<char, 0>> to avoid the copy.

This function doesn't need to worry about the headers or the GNU stuff and doesn't need to worry about the alignment, compression type, or endianness. To be honest I think you can just inline a call to zlib::compress inside of CompressedSection instead.

================
Comment at: tools/llvm-objcopy/Object.cpp:131
   uint8_t *Buf = Out.getBufferStart() + Sec.Offset;
-  std::copy(std::begin(Sec.Contents), std::end(Sec.Contents), Buf);
+  std::copy(std::begin(Sec.OriginalData), std::end(Sec.OriginalData), Buf);
 }
----------------
This should use Data not Original Data. That is to say this line should not change at all.

================
Comment at: tools/llvm-objcopy/Object.cpp:146
+template <class ELFT>
+void ELFSectionWriter<ELFT>::visit(const CompressedSection &Sec) {
+  uint8_t *Buf = Out.getBufferStart();
----------------
This is where you should be handling both the header and the GNU ZLIB magic stuff. This will solve all issues with ELFT

================
Comment at: tools/llvm-objcopy/Object.h:336
+protected:
+  std::string OwnedName;
   std::vector<uint8_t> Data;
----------------
plotfi wrote:
> jakehehrlich wrote:
> > So it turns out that we want to make names std::string instead of StringRefs anyway so this is no longer needed. (two other changes also needed this). That will likely be landing soon.
> ETA?
I think that's happening in this change: https://reviews.llvm.org/D50463

I think Paul is finished with GSoC so it's up to Paul. You can go ahead and use code from that change in this change so that the rebase will be minimal.

================
Comment at: tools/llvm-objcopy/Object.h:320
 public:
-  explicit Section(ArrayRef<uint8_t> Data) : Contents(Data) {}
+  explicit Section(ArrayRef<uint8_t> Data) { OriginalData = Data; }

----------------
This is not how this should work. OriginalData should be set with the other original fields. Anyway, I believe if you rebase the change has already landed. You certinally should not be changing Section in anyway in this change.

================
Comment at: tools/llvm-objcopy/Object.h:328

 class OwnedDataSection : public SectionBase {
   MAKE_SEC_WRITER_FRIEND
----------------
I think you should just not change OwnedDataSection. It has a very small amount of functionality that isn't worth trying to inherit.

================
Comment at: tools/llvm-objcopy/Object.h:338
       : Data(std::begin(Data), std::end(Data)) {
-    Name = SecName;
+    OriginalData = ArrayRef<uint8_t>(Data.data(), Data.size());
+    OwnedName = SecName.str();
----------------
This is not how OriginalData should work. It's implemented correctly upstream and you won't need to do anything in this function after std::string is used for Name.

================
Comment at: tools/llvm-objcopy/Object.h:349

+class CompressedSection : public OwnedDataSection {
+  MAKE_SEC_WRITER_FRIEND
----------------
I think you'll find it easier to write this if you don't inherit from OwnedDataSection and instead reimplement the functionality yourself using an internal SmallVector<char, 0>. You won't need changes to OwnedDataSection and you can avoid every single compressed data copy.

================
Comment at: tools/llvm-objcopy/Object.h:358
+public:
+  CompressedSection(StringRef NewName, ArrayRef<uint8_t> Data,
+                    const SectionBase &Sec)
----------------
I think this would work best if it called zlib::compress directly and just OriginalData can be used. That removes all copies, potential sources of errors, and allows all of the compression code to virtually vanish since the other isues can be taken care of at write time.

You'll also need to store if it's GNU style or not but you won't need the 'Data' argument

================
Comment at: tools/llvm-objcopy/Object.h:361-364
+    Align = Sec.Align;
+    Flags |= Sec.Flags;
+    if (!isGnuStyle())
+      Flags |= ELF::SHF_COMPRESSED;
----------------
I think more than this needs to be copied over like entry size and all those other fields. Please comments and cite the standard about how compressed sections should work here.

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:303

+template <typename ELFT>
 static std::unique_ptr<Writer> createWriter(const CopyConfig &Config,
----------------
Lets not make changes to how createWriter or anything like that works in this change. Also please rebase the latest changes onto this patch. This was all changed for necessary reasons recently and I'm not sure this current patch jives with what happened.

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:351
+decompress(const SectionBase &Section, const CopyConfig &Config) {
+  StringRef Contents(
+      reinterpret_cast<const char *>(Section.OriginalData.data()),
----------------
you can use toStringRef for this I believe (just found out about it myself).

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:355
+  Expected<Decompressor> D = object::Decompressor::create(
+      Section.Name, Contents, (ELFT::TargetEndianness == endianness::little),
+      ELFT::Is64Bits);
----------------
Again, you don't need to pass around this sort of data. I will not accept a change that unnecessarily keeps track of endianness and bit width using data when an easy ELFT solution exists. I hate ElfType because it forces you to branch on code you otherwise wouldn't have to and errors can occur where those difference branches have different code. 

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:366
+
+  return std::move(DecompressedContents);
+}
----------------
1) you don't need this move here. C++ will automatically insert the move for you because DecompressedContents is local.
2) You don't need to use a unique pointer. SmallVectors have move semantics.
3) 128 is quite a large size and you're allocating it on the heap by using a unique_ptr anyway.

I think the solution which is going to work out best after we go though a few more rounds of review will be to return an Expected<SmallVector<char, 0>>. The part that makes me a bit iffy is that presumably you want to use an OwnedDataSection for this but that uses an std::vector internally. It might make sense to make that use a SmallVector<char, 0>. It's unclear.

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:369
+
+static bool isCompressed(const SectionBase &Section) {
+  return Section.Name.startswith(".zdebug") ||
----------------
This should probably not use the name as indication of compression. Just use the flag.

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:378
+  std::vector<SectionBase *> Sections;
+  std::for_each(
+      Obj.sections().begin(), Obj.sections().end(),
----------------
Based on James' comment I might be misunderstanding something here. I think you can just loop over Obj.sections() in a range based for loop. You really don't want to do unnecessary loops over sections.

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:385
+    if (isCompressed(Section) ||
+        !object::isCompressableSectionName(Section.Name))
+      continue;
----------------
I think we already have an "is debug section" predicate floating around somewhere that uses the section name and might do another sanity check like making sure it isn't an allocated section. If there is not objectionable reason why this wouldn't work, I'd prefer to use that.

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:388
+
+    StringRef Contents(
+        reinterpret_cast<const char *>(Section.OriginalData.data()),
----------------
ditto on toStringRef

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:416
+    // Replace this Section with a compressed version.
+    RemovePred = [RemovePred, &Section](const SectionBase &Sec) {
+      return &Sec == &Section || RemovePred(Sec);
----------------
RemovePred should not be modified in a loop like this. It's going to make RemovePred even more inefficent that it already is.

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:431
+
+  for (auto S : Sections) {
+    const SectionBase &Section = *S;
----------------
Same thing here, I think you should just loop over Obj.sections()

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:456
+    // Replace this Section with a decompressed version.
+    RemovePred = [RemovePred, &Section](const SectionBase &Sec) {
+      return &Sec == &Section || RemovePred(Sec);
----------------
ditto

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:469
 // system. The only priority is that keeps/copies overrule removes.
+template <typename ELFT>
 static void handleArgs(const CopyConfig &Config, Object &Obj,
----------------
handleArgs should never be ELFT specific TBH. Also we removed splitDWOToFile in a recent change.

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:719
+template <typename ELFT>
+static bool handleArgsAndWrite(const CopyConfig &Config, Object &Obj,
+                               const Reader &Reader, Binary &Binary,
----------------
I kind of like the rough idea of a function that encapsulates the ELFT part of reading and uses it to create the correct writer. This isn't precisely how I think it should work however. This sort of code might let us get rid of ElfType however.

For now please don't change this. It is not needed for this change.

Repository:
  rL LLVM

https://reviews.llvm.org/D49678