[PATCH] D42516: [llvm-objcopy] Add support for large indexes

James Henderson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 25 02:46:29 PST 2018


jhenderson added a comment.

Ah, the joys of Prop74/Large ELFs.

In https://reviews.llvm.org/D42516#987361, @jakehehrlich wrote:

> As is there are a few problems but I didn't want to miss another day of review just because I didn't finish this today
>
> 1. There are no tests. This needs several tests. Recommendations on how to test this would be appreciated. I've been locally testing using an ELF I created that has 64k symbols and 128k sections but llvm-objcopy takes 5 seconds on that. I can work on optimizing that binary (smaller file, just the right amount of sections, etc...) but that still seems extreme. Additionally I think testing some combination of removal and such would be nice.


I don't think 5 seconds is all that unreasonable, due to the size of the ELF, but it may be worth optimizing anyway. Maybe you could have a little python script that generates a YAML file for yaml2obj (assuming yaml2obj can handle very large ELFs).

> 2. What if an --add-section pushes the section count over to SHN_LORESERVE? should we synthesize a SectionIndexSection?
> 3. What if stripping sections pushes the section counter below SHN_LORESERVE? Should we remove the SectionIndexSection because it isn't needed?

I think we might want to rebuild this section every time, prior to finalizing the ELF. Effectively, the read-in version uses this section only to interpret section indexes, and then a new one is created from scratch, if needed, afterwards. We could refuse to create it, but I wouldn't recommend that, personally. The gABI definitely suggests that this process should only be followed for larger ELFs, and not for those that don't need it.

> 4. I wasn't sure if section indexes of symbols below SHN_LORESERVE should have SHN_XINDEX as their index despite it being possible to directly encode them. Should every symbol have SHN_XINDEX for st_shndx?

I think SHN_XINDEX should only be used where the value can't be represented (i.e. >= SHN_LORESERVE). The ELF gABI says:

> "If this member contains SHN_XINDEX, then the actual section header index is too large to fit in this field."

Just to check, which document are you using for the quotes? It would be good to include the URLs to the parts of the documents.

One general point: theoretically, each symbol table could have its own SYMTAB_SHNDX section. The obvious case for this would be the static symbol table and the dynamic symbol table. The corresponding section is indicated by the sh_link field of the SYMTAB_SHNDX section, so we should be using that to establish a link. I think that implies that the SYMTAB_SHNDX section pointer should be a member of the symbol table class, not of the Object.



================
Comment at: tools/llvm-objcopy/Object.cpp:73
+void BinarySectionWriter::visit(const SectionIndexSection &Sec) {
+  error("Cannot write section index table '" + Sec.Name + "' ");
+}
----------------
I'd call this the "symbol section index table" for a bit more clarity.


================
Comment at: tools/llvm-objcopy/Object.cpp:159
+// Large indexes force us to clarify exactly what this function should do. This
+// function should return the proper value of st_shndx
 uint16_t Symbol::getShndx() const {
----------------
Full stop.


================
Comment at: tools/llvm-objcopy/Object.cpp:267-269
+        SectionIndexes->addIndex(Sym->DefinedIn->Index);
+      else
+        SectionIndexes->addIndex(SHN_UNDEF);
----------------
I think this should have a value SHN_UNDEF unless the index is >= SHN_LORESERVE.


================
Comment at: tools/llvm-objcopy/Object.cpp:546-553
+      if (Obj.SectionIndexes == nullptr)
+        error("Symbol '" + Name +
+              "' has index SHN_XINDEX but no SHT_SYMTAB_SHNDX section exists.");
+
+      const Elf_Shdr &ShndxSec =
+          *unwrapOrError(ElfFile.getSection(Obj.SectionIndexes->Index));
+      ArrayRef<Elf_Word> ShndxData = unwrapOrError(
----------------
It seems to me like this block should only happen the first time we encounter a SHN_XINDEX symbol. After that, we should already have the data, and should be able to load from it directly.


================
Comment at: tools/llvm-objcopy/Object.cpp:554
+          ElfFile.template getSectionContentsAsArray<Elf_Word>(&ShndxSec));
+      auto Index = ShndxData[&Sym - Symbols.begin()];
+      DefSection = Obj.sections().getSection(
----------------
Somewhere, we should check that the symtab shndx section has the same number of entries as the corresponding symbol table.


================
Comment at: tools/llvm-objcopy/Object.cpp:656
+  case SHT_SYMTAB_SHNDX: {
+    auto &Shndx = Obj.addSection<SectionIndexSection>();
+    Obj.SectionIndexes = &Shndx;
----------------
I'd avoid calling references to the symtab shndx section "Shndx" as that's a common abbreviation for section header index, which could lead to confusion or ambiguity in places. I'd recommend calling it ShndxSection or similar.


================
Comment at: tools/llvm-objcopy/Object.cpp:811
+    // """
+    // If the number of sections is greater than or equal to
+    // SHN_LORESERVE (0xff00), this member has the value zero and the actual
----------------
Comment looks like it's wrapped early?


================
Comment at: tools/llvm-objcopy/Object.cpp:823
+    // If the section name string table section index is greater than or equal
+    // to
+    // SHN_LORESERVE (0xff00), this member has the value SHN_XINDEX (0xffff) and
----------------
Fix comment wrapping here too.


================
Comment at: tools/llvm-objcopy/Object.cpp:854-859
+  // """
+  // If the number of sections is greater than or equal to
+  // SHN_LORESERVE (0xff00), this member has the value zero and the actual
+  // number of section header table entries is contained in the sh_size field
+  // of the section header at index 0.
+  // """
----------------
I feel like having these quotes duplicated is not great. I wonder whether it's a sign that the null shdr should be constructed in the writeEhdr function, so that the decisions only need making the once? Thoughts?


================
Comment at: tools/llvm-objcopy/Object.h:373
   StringTableSection *SymbolNames = nullptr;
+  SectionIndexSection *SectionIndexes = nullptr;
 
----------------
I'd rename this member to SectionIndexTable (or ShndxTable)


================
Comment at: tools/llvm-objcopy/Object.h:378
 public:
+  void setShndx(SectionIndexSection *Shndx) { SectionIndexes = Shndx; }
   void setStrTab(StringTableSection *StrTab) { SymbolNames = StrTab; }
----------------
I'd rename this function "setShndxTable" since setShndx implies you're setting the section header index of the section.


Repository:
  rL LLVM

https://reviews.llvm.org/D42516





More information about the llvm-commits mailing list