[PATCH] D49206: [llvm-objcopy] Add support for large indexes

Thu Jul 12 15:00:14 PDT 2018

jakehehrlich marked 16 inline comments as done.
jakehehrlich added inline comments.

================
Comment at: llvm/test/tools/llvm-objcopy/auto-remove-shndx.test:1
+RUN: gunzip -c %p/Inputs/many-sections.o.gz > %t
+RUN: llvm-objcopy -R .text -R s0 -R s1 -R s2 -R s3 -R s4 -R s5 -R s6 %t %t2
----------------
jhenderson wrote:
> jakehehrlich wrote:
> > alexshap wrote:
> > > just in case - I don't see any other tests using gunzip - not sure if it's available by default (in particular, on the machines where build bots are running)
> > There is no published list. I'll just have to try it and revert it if it doesn't work. This is the easy case that *might* just work. If this dosn't work then I'll add some python to each of these tests to unzip the file instead.
> I just tested, and my Windows machine doesn't have it, so it's not part of the GnuWin32 tools, which are the requirement.
siiiiigh. I'll rewrite the test to work using python then.

================
Comment at: llvm/test/tools/llvm-objcopy/strict-no-add.test:1-2
+# This test makes sure that sections added at the end that don't have symbols
+# defined in them don't trigger the creation of a large index table.
+
----------------
jhenderson wrote:
> Just musing about this a bit. I wonder if this is the behaviour we should follow. It's certainly correct to do so, but it's unclear from the gABI whether it's okay to generate it under other circumstances. If we think it's okay to, maybe it would be simpler to just always generate the table if there are >= SHN_LORESERVE number of sections?
I think the cases where anyone would care about it too much are few and far between which is why I'm not breaking my back to solve the case where it overly eagerly emits a shndx table by 1 index. I started to write the code for doing this but it just winds up being that we delete some code that I already wrote so why not keep it I guess. If you think the system as a whole is strictly improved by removing that code I'll remove it.

================
Comment at: llvm/tools/llvm-objcopy/Object.cpp:705

-  for (const auto &Sym : unwrapOrError(ElfFile.symbols(&Shdr))) {
+  auto Symbols = unwrapOrError(ElfFile.symbols(&Shdr));
+  for (const auto &Sym : Symbols) {
----------------
jhenderson wrote:
> What is the type of Symbols here? I think the LLVM coding style is not to use auto in these situations.
Our options here are

`ElfFile<ELFT>::Elf_Sym_Range`
`typename ELFT::SymRange`
`ArrayRef<typename ELFT::Sym>`

I've presented these in order of unfolding. They're all shorter than I expected. The last is the most clear but also it means that if ElfFile or ELFT changes their SymRange type an issue could arise. I find most of these either verbose or opaque however.

================
Comment at: llvm/tools/llvm-objcopy/Object.cpp:706
+  auto Symbols = unwrapOrError(ElfFile.symbols(&Shdr));
+  for (const auto &Sym : Symbols) {
     SectionBase *DefSection = nullptr;
----------------
jhenderson wrote:
> Ditto.
I'm going to push back here because a typename and collon is needed. e.g. there is extra difficulty.

================
Comment at: llvm/tools/llvm-objcopy/Object.cpp:726
+          Index,
+          "Symbol '" + Name + "' is defined in invalid section with index " +
+              Twine(Index));
----------------
jhenderson wrote:
> Hmm... not sure about this error message, since the section isn't invalid, rather the section //index// is. So, maybe something like "Symbol 'bar' has an invalid section index 12345".
> 
> I assume that testing this would be unrealistic due to the size of the required test input?
This is actually testable quite easily by uploading a specially constructed binary (which I'm getting pretty good at these days). Basically construct a binary that uses SHN_XINDEX and then puts an overly large index in the shndx table despite there being only 1 symbol. This would require just a bit of yaml2obj and some hex editing. The problem is that constructing such a binary is liable to take me 2-3 hours and I don't want to spend that time just to test one error message. 

================
Comment at: llvm/tools/llvm-objcopy/Object.h:419
+    Name = ".symtab_shndx";
+    OriginalOffset = std::numeric_limits<uint64_t>::max();
+    Align = 4;
----------------
jhenderson wrote:
> I wonder if this should be the default for all SectionBase classes (i.e. initialise the field in the header)?
Sure, I'll go ahead and set it that way, why not.

Repository:
  rL LLVM

https://reviews.llvm.org/D49206