[PATCH] D49678: [llvm-objcopy] Adding support for compressed DWARF debug sections.

Mon Aug 20 02:22:57 PDT 2018

jhenderson added inline comments.

================
Comment at: test/tools/llvm-objcopy/Inputs/compress-debug-sections.yaml:1
+--- !ELF
+FileHeader:
----------------
plotfi wrote:
> jhenderson wrote:
> > plotfi wrote:
> > > plotfi wrote:
> > > > jhenderson wrote:
> > > > > Are you planning any more updates to this test? Because I still don't see why you need such a large section.
> > > > > 
> > > > > You also definitely don't need the GNU stack section, or the symbols.
> > > > > 
> > > > > I think that one thing that is important is having relocations patching the debug section, because these need to point at the decompressed offset, not something else. You should add one that patches somewhere in the middle of your (small) section, and another that uses a section-relative reference and addend, referencing the compressed section.
> > > > > 
> > > > > Finally, you should have testing that shows that non-debug sections are not compressed.
> > > > Ah, alright. I'll add back the comment section. I made the content somewhat large so that the compression would actually make it smaller (and not bail out). I can also add the relocation sections back too. I had trouble removing the stack section. I'll try and remove it through llvm-objcopy.
> > > James are you saying I need to also manually update sections like .rela.debug_line ??
> > > James are you saying I need to also manually update sections like .rela.debug_line ??
> > I get the impression that you may be thinking you need to have a valid set of debug sections etc for this test. But that isn't important. Your test needs to do the bare minimum in order to show that the new functionality works. This means the input requires:
> >   - A section called ".debug*". This doesn't need to have any special flags set or anything, just enough contents to trigger compression properly.
> >   - A relocation section patching that section. This only needs to have a single relocation, and it can patch any part of the section, that refers to somewhere outside the section. It's quite easy for yaml2obj to create such a relocation section. Take a look at some examples elsewhere in the testsuite.
> >   - A third section that "refers" to the other section. In practice, this only needs to have about 4 bytes of content.
> >   - A relocation section patching this third section, with a single relocation that refers to the debug section.
> > 
> > Then your test uses compress-debug-sections to show that the size is less, and that the relocations still have the correct offsets and addends, and that decompress-debug-sections does the reverse (and again the relocations don't change).
> > 
> > > I had trouble removing the stack section.
> > What do you mean by this? There shouldn't be any issue in deleting the 4 lines that define .note.GNU-stack in your yaml below.
> > 
> > > I'll add back the comment section.
> > Don't think of the sections you add as "the comment section" or "the debug string section". Think of them more like "a dummy debug section for compression" and "some other section", e.g. ".debug_foo" and ".bar". Take a look at some other llvm-objcopy tests, e.g. to do with --strip-debug. You'll see what I'm talking about hopefully.
> > 
> > > I made the content somewhat large so that the compression would actually make it smaller (and not bail out).
> > Okay, I see. I suspect that this is larger than necessary then. Try just using a smaller set of contents, just all 0s. I suspect you won't need many to get compression to work. Even if you need a fair number, just use all 0s anyway. It makes it obvious that the actual contents aren't important.
> So, the complication I had was due to a bug on my part which I fixed. Now the test only consists of two sections (one .debug_foo and one .notdebug_foo) which is small and tests compression of the .debug section and non-compression of the other section. The section contents is literally just some words I copied from someplace. Is there any more that I need to do to reduce this? 
It's marginally better, but the section contents still look more complicated than necessary. Why do you need anything other than all zeroes? Why do you have the section flags set?

You also haven't addressed my point about relocations.

================
Comment at: test/tools/llvm-objcopy/compress-debug-sections-zlib-gnu.test:22
+# CHECK-FLAGS: Name: .zdebug_str (1)
+# CHECK-FLAGS-NOT: Flags [ (0x800)
+# CHECK-FLAGS-NOT: SHF_COMPRESSED (0x800)
----------------
plotfi wrote:
> jhenderson wrote:
> > Change this to:
> > 
> > ```
> > # CHECK-FLAGS: Flags [
> > # CHECK-FLAGS-NOT: SHF_COMPRESSED
> > # CHECK-FLAGS: ]
> > ```
> > FileCheck doesn't match full lines unless you request it. The numbers aren't important.
> Is what I did good enough?
No, please do as I suggested. It's clearer what is intended, and less likely to cause incorrect results.

================
Comment at: test/tools/llvm-objcopy/compress-debug-sections-zlib-gnu.test:27
+# CHECK-FLAGS: Name: .notdebug_foo
+# CHECK-FLAGS: Flags [ (0x30)
+
----------------
I'd prefer it if this explicitly listed the section flags, without the number:
```
# CHECK-FLAGS: Name: .notdebug_foo
# CHECK-FLAGS: Flags [
# CHECK-FLAGS-NEXT: SHF_MERGE // or whatever
# CHECK-FLAGS-NEXT: SHF_STRINGS // or whatever
# CHECK-FLAGS-NEXT: ]
```
What could happen here is that if .notdebug_foo's flags changed, this could match the flags of a later section incorrectly.

================
Comment at: test/tools/llvm-objcopy/compress-debug-sections.test:14
+# RUN: llvm-objdump -s -section=.debug_str - < %t2.o > %t2.txt
+# RUN: llvm-objdump -s -section=.debug_str - < %t3.o > %t3.txt
+
----------------
I think you meant to dump .debug_foo here...

================
Comment at: test/tools/llvm-objcopy/compress-debug-sections.test:11
+
+# RUN: llvm-objdump -s -section=.debug_str - < %t.o  > %t.txt
+# RUN: llvm-objdump -s -section=.debug_str - < %t2.o > %t2.txt
----------------
plotfi wrote:
> jhenderson wrote:
> > plotfi wrote:
> > > jhenderson wrote:
> > > > You don't need to use "-" and "<". In this case, it probably makes more sense to simply pass in the object file directly i.e:
> > > > 
> > > > ```
> > > > # RUN: llvm-objdump -s -section=.debug_str %t.o > %t.txt
> > > > ```
> > > > 
> > > > But I think you could also do something like:
> > > > ```
> > > > # RUN: llvm-objcopy --decompress-debug-sections %tz.o - | \
> > > > # RUN:    llvm-objdump -s -section=.debug_str - > %t2.txt
> > > > ```
> > > I do this using redirects so that the filename doesn't get printed. Otherwise I have to grep -v "file format ELF64-x86-64" before diffing.
> > > 
> > Hmmm... should %t.o be different to %t2.o or %t3.o? I'd expect them to be identical. Otherwise, something might have gone wrong. If they're identical, then you don't need to dump the section directly.
> > 
> > You can avoid the filenames being different by doing the objdump step after each decompress step, and name the output files the same in the two decompress steps.
> I don't understand this preference over -o - <. Is there some issue using this form of redirect on some systems? 
I'm not specifically aware of any, but it'll make the test faster to do the second suggestion in my original comment. It's also less confusing.

================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:377
+                             SectionPred &RemovePred) {
+  std::vector<SectionBase *> Sections;
+  std::for_each(
----------------
Okay, I see what you're doing, but I'd prefer it if you just added the new sections to a vector and then add them all to the object at the end. The number of sections in an object file can be in the hundreds of thousands, potentially more, for objects compiled with -ffunction-sections/-fdata-sections, whereas the number of sections that are to be compressed will be close to or in single figures.

Same applies to `decompressSections`.

Repository:
  rL LLVM

https://reviews.llvm.org/D49678