[PATCH] D49678: [llvm-objcopy] Adding support for compressed DWARF debug sections.

Fri Aug 31 02:02:21 PDT 2018

jhenderson added a comment.

Unfortunately, I'm going to be on annual leave as of next week, so I won't be in a position to review anything further until a week on Monday. As long as my outstanding points are addressed, I don't mind somebody else giving approval. I'll take a look at the final version when I get back, and can always request post-commit changes. Just make sure to include the "Differential Revision" part in the commit message so that this diff gets automatically updated with the committed version.

================
Comment at: test/tools/llvm-objcopy/compress-debug-sections-zlib-gnu.test:32
+# NOTE: Zlib compression makes things worse in this case.
+# SIZE-ORIGINAL: 637
+# SIZE-COMPRESSED: 656
----------------
jhenderson wrote:
> jhenderson wrote:
> > I think it probably makes more sense to get the section size and compare that, rather than the overall file size. You can drop the "wc" calls that way.
> Don't do it this way. You can use the section size field directly (available in the sections dump from earlier), rather than extracting the sections.
> I think it probably makes more sense to get the section size and compare that, rather than the overall file size. You can drop the "wc" calls that way.

> Don't do it this way. You can use the section size field directly (available in the sections dump from earlier), rather than extracting the sections.
Why have you marked these as done? They aren't done.

================
Comment at: test/tools/llvm-objcopy/compress-debug-sections-zlib-gnu.test:7-8
+# RUN: llvm-objdump -s %t.o -section=.debug_foo | FileCheck %s
+# RUN: llvm-objdump -s %t-compressed.o | FileCheck %s --check-prefix=CHECK-COMPRESSED
+# RUN: llvm-readobj -s %t-compressed.o | FileCheck %s --check-prefix=CHECK-FLAGS
+
----------------
plotfi wrote:
> jhenderson wrote:
> > jhenderson wrote:
> > > Is there a particular reason you aren't doing the two checks in the same run of FileCheck?
> > Make that four... I think you should be able to simplify the number of runs down by using multiple switches in the same command-line (e.g. -relocations and -s in llvm-readobj).
> Different tools?
I missed that originally, but you still don't need two calls to each tool. One each would suffice, assuming that the information we want isn't available in any single dump available via one of those. For example, you can do:
```
# RUN: llvm-readobj -relocations -s %t-compressed.o
```
to get the relocations and section headers.

================
Comment at: tools/llvm-objcopy/Object.h:17
 #include "llvm/BinaryFormat/ELF.h"
+#include "llvm/MC/MCTargetOptions.h"
 #include "llvm/MC/StringTableBuilder.h"
----------------
alexshap wrote:
> jhenderson wrote:
> > jhenderson wrote:
> > > plotfi wrote:
> > > > jhenderson wrote:
> > > > > You can get rid of this from the header by forward-declaring the DebugCompressionType enum.
> > > > You can not. Usage of DebugCompressionType::GNU etc makes this not possible unless you want to copy the enum class (which I would like not to).
> > > You aren't using them in the header file, as far as I can see. Move the include into the .cpp if it is needed there. Same goes for Compression.h.
> > You should still forward declare the enum in the header and include MCTargetOptions.h in the cpp file.
> @jhenderson - khm, if we forward declare the enum the size of the enum is unknown (if the definition is not available), consequently, every single .cpp which includes Object.h will have to include MCTargetOptions.h. Just curious - what are the benefits of this / are there any pointers to the llvm style guide or any other docs suggesting this ? that would look kinda broken (one header (Object.h) implicitly depends on another one and can not be included without causing errors into an empty .cpp). 
> if we forward declare the enum the size of the enum is unknown (if the definition is not available), consequently, every single .cpp which includes Object.h will have to include MCTargetOptions.h.
Not true. This is an `enum class` which can be forward declared (defaults to `int` size unless explicitly stated). See https://en.cppreference.com/w/cpp/language/enum. Only clients of the code that actually want to access an enumeration value need to include the header.

It's bad practice to include more headers than needed in general, due to the increased compile time cost, and also creation of implicit dependencies that later could result in broken builds.

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:426
+
+    for (auto &Sec : Obj.sections()) {
+      RelocationSection *R = dyn_cast<RelocationSection>(&Sec);
----------------
plotfi wrote:
> jhenderson wrote:
> > This is a bad way to do this. It means you'll iterate over the whole set of sections once for each compressed section again, just looking for a specific relocation section that may not even exist.
> > 
> > I think the better way to do it would be to make your vector a map, mapping relocation sections to their debug section, and then it can be done in the same loop as above, where you are already iterating over all sections. Something like this partial pseudo-code:
> > 
> > ```
> > Map<SectionBase *, SmallVector<SectionBase*, 1>> ToCompress;
> > for (auto &Sec : Obj.sections()) {
> >   if (shouldCompress(Sec))
> >     ToCompress.insert(Sec);
> >   else if (auto RelocSec = dyn_cast<RelocationSection>(Sec)) {
> >     auto TargetSection = RelocSec->getSection();
> >     if (shouldCompress(TargetSection))
> >       ToCompress[TargetSection].push_back(RelocSection);
> >   }
> > }
> > ```
> > Two things to watch out for: 1) it is technically legal to have multiple relocation sections targeting the same section, so it needs to be a vector not a single section. 2) It is possible for the relocation section to be before or after the target section in the section list.
> For now I've added a more simple solution. In the top-level loop I collect the relocation sections that point to compresable debug sections and then in the next loop's inner loop I check for the relocation.
I suppose this works, and it shouldn't cause a performance issue overall, but it seems clunky to do it this way, since you already have this information calculated in the first loop. You're just recalculating it again effectively.

Repository:
  rL LLVM

https://reviews.llvm.org/D49678