[PATCH] D49206: [llvm-objcopy] Add support for large indexes

Thu Jul 12 04:57:51 PDT 2018

jhenderson added a comment.

I might be going overboard with the comments around "auto", and I didn't highlight all of the ones I spotted. But here's the link to the corresponding style point: https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable. Feel free to ignore what I said if you think they're valid uses.

================
Comment at: llvm/test/tools/llvm-objcopy/auto-remove-shndx.test:1
+RUN: gunzip -c %p/Inputs/many-sections.o.gz > %t
+RUN: llvm-objcopy -R .text -R s0 -R s1 -R s2 -R s3 -R s4 -R s5 -R s6 %t %t2
----------------
jakehehrlich wrote:
> alexshap wrote:
> > just in case - I don't see any other tests using gunzip - not sure if it's available by default (in particular, on the machines where build bots are running)
> There is no published list. I'll just have to try it and revert it if it doesn't work. This is the easy case that *might* just work. If this dosn't work then I'll add some python to each of these tests to unzip the file instead.
I just tested, and my Windows machine doesn't have it, so it's not part of the GnuWin32 tools, which are the requirement.

================
Comment at: llvm/test/tools/llvm-objcopy/remove-shndx.test:2
+RUN: gunzip -c %p/Inputs/many-sections.o.gz > %t
+RUN: llvm-objcopy -R .symtab_shndxr %t %t2
+RUN: llvm-readobj -sections %t2 | FileCheck %s
----------------
Would you mind first putting in a separate patch to LLVM to rename .symtab_shndxr to .symtab_shndx, which is the standard name? I'd rather not propagate the incorrect name further.

================
Comment at: llvm/test/tools/llvm-objcopy/strict-no-add.test:1-2
+# This test makes sure that sections added at the end that don't have symbols
+# defined in them don't trigger the creation of a large index table.
+
----------------
Just musing about this a bit. I wonder if this is the behaviour we should follow. It's certainly correct to do so, but it's unclear from the gABI whether it's okay to generate it under other circumstances. If we think it's okay to, maybe it would be simpler to just always generate the table if there are >= SHN_LORESERVE number of sections?

================
Comment at: llvm/tools/llvm-objcopy/Object.cpp:164
+  uint8_t *Buf = Out.getBufferStart() + Sec.Offset;
+  auto *Indexes = reinterpret_cast<typename ELFT::Word *>(Buf);
+  std::copy(std::begin(Sec.Indexes), std::end(Sec.Indexes), Indexes);
----------------
I'd rename this to something like "IndexesBuffer" or similar, just because I got confused between Indexes and Sec.Indexes on the line below.

================
Comment at: llvm/tools/llvm-objcopy/Object.cpp:203
+// Large indexes force us to clarify exactly what this function should do. This
+// function should return the proper value of st_shndx.
 uint16_t Symbol::getShndx() const {
----------------
I'm not sure "proper value of st_shndx" is quite clear enough. Maybe "the value that will appear in st_shndx" might be better?

================
Comment at: llvm/tools/llvm-objcopy/Object.cpp:705

-  for (const auto &Sym : unwrapOrError(ElfFile.symbols(&Shdr))) {
+  auto Symbols = unwrapOrError(ElfFile.symbols(&Shdr));
+  for (const auto &Sym : Symbols) {
----------------
What is the type of Symbols here? I think the LLVM coding style is not to use auto in these situations.

================
Comment at: llvm/tools/llvm-objcopy/Object.cpp:706
+  auto Symbols = unwrapOrError(ElfFile.symbols(&Shdr));
+  for (const auto &Sym : Symbols) {
     SectionBase *DefSection = nullptr;
----------------
Ditto.

================
Comment at: llvm/tools/llvm-objcopy/Object.cpp:723
+      }
+      auto Index = ShndxData[&Sym - Symbols.begin()];
+      DefSection = Obj.sections().getSection(
----------------
Ditto.

================
Comment at: llvm/tools/llvm-objcopy/Object.cpp:726
+          Index,
+          "Symbol '" + Name + "' is defined in invalid section with index " +
+              Twine(Index));
----------------
Hmm... not sure about this error message, since the section isn't invalid, rather the section //index// is. So, maybe something like "Symbol 'bar' has an invalid section index 12345".

I assume that testing this would be unrealistic due to the size of the required test input?

================
Comment at: llvm/tools/llvm-objcopy/Object.cpp:857

+  // If a section index table exists we'll need to initilize it before we
+  // initlize the symbol table because the symbol table might need to
----------------
initilize -> initialize (x 2)

================
Comment at: llvm/tools/llvm-objcopy/Object.cpp:859
+  // initlize the symbol table because the symbol table might need to
+  // refernece it.
+  if (Obj.SectionIndexTable)
----------------
refernece -> reference

================
Comment at: llvm/tools/llvm-objcopy/Object.cpp:1021
+  // See writeEhdr for why we do this.
+  auto Shnum = size(Obj.sections()) + 1;
+  if (Shnum >= SHN_LORESERVE)
----------------
See comments about auto above.

================
Comment at: llvm/tools/llvm-objcopy/Object.cpp:1233-1238
+    for (auto &Sec : LargeIndexSections) {
+      Sec.Index = Index++;
+      if (Sec.HasSymbol) {
+        NeedsLargeIndexes = true;
+        break;
+      }
----------------
Could you use std::any_of here?
Also, I might have missed something, but why the update to Sec.Index for large-index sections only here? You assign the indexes further down anyway.

================
Comment at: llvm/tools/llvm-objcopy/Object.cpp:1245
+  if (NeedsLargeIndexes) {
+    // This means we definitely need to have a section index table but if
+    // already have one then we should use it instead of making a new one.
----------------
but if -> but if we

================
Comment at: llvm/tools/llvm-objcopy/Object.cpp:1271
+
+  // Before we can prepareForLayout the indexes need to be finalized.
+  uint64_t Index = 0;
----------------
I wouldn't use the function name in the comment here. I'd just say "prepare for layout"

================
Comment at: llvm/tools/llvm-objcopy/Object.cpp:1277
+  // The symbol table does not update all other sections on update. For
+  // instance symbol names are not added as new symbols are added. This means
+  // that some sections, like .strtab, don't yet have their final size.
----------------
"For instance" -> "For instance,"

================
Comment at: llvm/tools/llvm-objcopy/Object.h:419
+    Name = ".symtab_shndx";
+    OriginalOffset = std::numeric_limits<uint64_t>::max();
+    Align = 4;
----------------
I wonder if this should be the default for all SectionBase classes (i.e. initialise the field in the header)?

Repository:
  rL LLVM

https://reviews.llvm.org/D49206