[PATCH] D69093: [llvm-objcopy] --add-symbol: fix crash if SHT_SYMTAB does not exist

James Henderson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 28 08:18:27 PDT 2019


jhenderson added inline comments.


================
Comment at: llvm/test/tools/llvm-objcopy/ELF/add-symbol-no-symtab.test:28
+## Don't create .symtab if --add-symbol is not specified.
+# RUN: llvm-objcopy --remove-section=.symtab --remove-section=.strtab %t %t1
+# RUN: llvm-objcopy %t1 %t2
----------------
You don't necessarily need to update your test here, but in future, could you avoid reusing %t, %t1 etc for different cases if they get modified (e.g. %t is fine to use in this line, but the output should be %t3 or whatever to avoid clashing with the previous case).  It makes debugging somewhat easier if there are problems with the test.


================
Comment at: llvm/test/tools/llvm-objcopy/ELF/add-symbol-no-symtab.test:32
+
+## Check that we create both .strtab and .symtab.
+## We reuse the existing .shstrtab (section names) for symbol names.
----------------
This comment seems stale? We are reusing .shstrtab, and nothing shows we create .strtab here. Should the test also show that .strtab has NOT been created?


================
Comment at: llvm/tools/llvm-objcopy/ELF/Object.cpp:1544
+  } else if (EnsureSymtab) {
+    // Reuse the existing SHT_STRTAB section if exists.
+    StringTableSection *StrTab = nullptr;
----------------
the existing -> an existing (since there can be multiple)
if exists -> if it exists


================
Comment at: llvm/tools/llvm-objcopy/ELF/Object.cpp:1550-1552
+        // Prefer .strtab to .shstrtab.
+        if (Obj.SectionNames != &Sec)
+          break;
----------------
MaskRay wrote:
> rupprecht wrote:
> > rupprecht wrote:
> > > MaskRay wrote:
> > > > grimar wrote:
> > > > > grimar wrote:
> > > > > > MaskRay wrote:
> > > > > > > abrachet wrote:
> > > > > > > > rupprecht wrote:
> > > > > > > > > MaskRay wrote:
> > > > > > > > > > rupprecht wrote:
> > > > > > > > > > > Should `.dynstr` also be avoided? Is it better to just check that the name is literally `.strtab`?
> > > > > > > > > > .dynstr is avoided by `!(Sec.Flags & SHF_ALLOC)`. `.strtab` works as well but `Obj.SectionNames != &Sec` is more general. There are multiple ways to accomplish the samething. I just wanted to make the code less magical.
> > > > > > > > > From: http://www.sco.com/developers/gabi/latest/ch4.sheader.html#special_sections
> > > > > > > > > > .strtab
> > > > > > > > > > This section holds strings, most commonly the strings that represent the names associated with symbol table entries. If the file has a loadable segment that includes the symbol string table, the section's attributes will include the SHF_ALLOC bit; otherwise, that bit will be off.
> > > > > > > > > 
> > > > > > > > > So, `.strtab` can also be `SHF_ALLOC`.
> > > > > > > > For reference, we used to have a file in test/tools/llvm-objcopy/ELF/Inputs/ called alloc-symtab which had an SHF_ALLOC .strtab. It was removed in D65278. 
> > > > > > > To confirm, I think `... && !(Sec.Flags & SHF_ALLOC)` is correct here.
> > > > > > > For reference, we used to have a file in test/tools/llvm-objcopy/ELF/Inputs/ called alloc-symtab which had an SHF_ALLOC .strtab. It was removed in D65278.
> > > > > > 
> > > > > > Thanks for the reference. I completely forgot that saw an allocatable `.symtab` that time.
> > > > > > 
> > > > > > To clarify: in D65278 I've replaced a precompiled binary with a YAML description (with a SHF_ALLOC `.symtab`).
> > > > > > The logic of tests shouldn't have been affected, because tests wanted to see the SHF_ALLOC symbol table (but not SHF_ALLOC string table)
> > > > > > I believe.
> > > > > > 
> > > > > > If we need a test with a SHF_ALLOC `.strtab`, it should be possible to craft one with a yaml2obj too I think.
> > > > > > (I haven't check, but I think all that needed is to define a string table section and a `Flag` property which should override the default).
> > > > > > 
> > > > > > So, .strtab can also be SHF_ALLOC.
> > > > > 
> > > > > > To confirm, I think ... && !(Sec.Flags & SHF_ALLOC) is correct here.
> > > > > 
> > > > > To summarize and clarify this a bit:
> > > > > This part of code is called when there is no symbol table (`.symtab`).
> > > > > Jordan mentioned that `.strtab` can be allocatable in according to specification.
> > > > > 
> > > > > So if I understand correctly, the question is: is it possible that
> > > > > we have no symbol table, but have an allocatable `.strtab` in the object?
> > > > > 
> > > > > If it is, this code might either use the `.shstrtab` instead of existent `.strtab`,
> > > > > or perform an attemp to create a new string table (if there is no `.shstrtab` to reuse) it seems.
> > > > > Then it will create a `.symtab` using the string table either found or created instead of the existent allocatable '.strtab'.
> > > > > Does not sound correct perhaps?
> > > > > 
> > > > Both SHF_ALLOC .strtab and non-SHF_ALLOC .strtab are possible, though I don't think ld or objcopy could create SHF_ALLOC .strtab.
> > > > 
> > > > Generally objcopy should not alter SHF_ALLOC sections. The code skips both .dynstr (SHF_ALLOC) and SHF_ALLOC .strtab.  SHF_ALLOC .strtab is not tested, though.
> > > Yes, exactly. Repro:
> > > 
> > > ```
> > > $ cat strtab.yaml
> > > --- !ELF
> > > FileHeader:
> > >   Class:   ELFCLASS64
> > >   Data:    ELFDATA2LSB
> > >   Type:    ET_REL
> > >   Machine: EM_X86_64
> > > Sections:
> > >   - Name: .strtab
> > >     Type: SHT_STRTAB
> > >     Flags: [ SHF_ALLOC ]
> > > $ yaml2obj strtab.yaml > strtab.o
> > > $ llvm-objcopy --add-symbol=abs1=1 strtab.o strtab2.o
> > > $ llvm-readobj --sections strtab2.o
> > > ...
> > >   Section {
> > >     Index: 1
> > >     Name: .strtab (11)
> > >     Type: SHT_STRTAB (0x3)
> > >     Flags [ (0x2)
> > >       SHF_ALLOC (0x2)
> > >     ]
> > >   }
> > >   Section {
> > >     Index: 2
> > >     Name: .shstrtab (1)
> > >     Type: SHT_STRTAB (0x3)
> > >     Flags [ (0x0)
> > >     ]
> > >   }
> > >   Section {
> > >     Index: 3
> > >     Name: .symtab (19)
> > >     Type: SHT_SYMTAB (0x2)
> > >     Flags [ (0x0)
> > >     ]
> > >     Link: 2  # <-- .shstrtab, not .strtab
> > >   }
> > > ...
> > > ```
> > > 
> > > Running the same example through a recent GNU objcopy, it looks like it creates a second `.strtab` without the `SHF_ALLOC` bit and links to that:
> > > 
> > > ```
> > > $ llvm-objcopy --add-symbol=abs1=1 strtab.o strtab2.o
> > > $ llvm-readobj --sections strtab2.o
> > >   Section {
> > >     Index: 1
> > >     Name: .strtab (9)
> > >     Type: SHT_STRTAB (0x3)
> > >     Flags [ (0x2)
> > >       SHF_ALLOC (0x2)
> > >     ]
> > >   }
> > >   Section {
> > >     Index: 2
> > >     Name: .symtab (1)
> > >     Type: SHT_SYMTAB (0x2)
> > >     Flags [ (0x0)
> > >     ]
> > >     Link: 3 # <-- non-SHF_ALLOC .strtab below, not the SHF_ALLOC .strtab above
> > >   }
> > >   Section {
> > >     Index: 3
> > >     Name: .strtab (9)
> > >     Type: SHT_STRTAB (0x3)
> > >     Flags [ (0x0)
> > >     ]
> > >   }
> > >   Section {
> > >     Index: 4
> > >     Name: .shstrtab (17)
> > >     Type: SHT_STRTAB (0x3)
> > >     Flags [ (0x0)
> > >     ]
> > >   }
> > > ```
> > > 
> > > This behavior seems reasonable to me vs re-using the .shstrtab for strings that aren't section headers.
> > > 
> > > > To confirm, I think ... && !(Sec.Flags & SHF_ALLOC) is correct here.
> > > > 
> > > BTW, I don't find it particularly constructive to simply state a position without any rationale.
> > > Running the same example through a recent GNU objcopy, it looks like it creates a second .strtab without the SHF_ALLOC bit and links to that:
> > > 
> > > $ llvm-objcopy --add-symbol=abs1=1 strtab.o strtab2.o
> > > $ llvm-readobj --sections strtab2.o
> > > 
> > This second example is actually what happens w/ GNU objcopy, not llvm-objcopy.
> > 
> Reusing `.shstrtab` is what `SEC3` tests. It may look a bit strange. `e_shstrndx` may refer to `.strtab` (clang reuses the section for both section names and symbol names). If we want to behave like GNU objcopy, we will have to special case the string `.strtab`. In any case, I don't think the (1) .symtab does not exist (2) --add-symbol= causes .symtab to be created   edge case needs more treatment for GNU compatibility.
In an ideal world, the names of ELF sections should be largely irrelevant. I think rather than explicitly prefer ".strtab" over ".shstrtab", it's more correct to say we prefer a string table that is not the section header string table, if such a table exists.

I'm quite alright with a new string table to be created if needed for symbol names, if the existing one is SHF_ALLOC. I don't have a preference for the name (.strtab seems fine, even if it already exists). Is there anything stopping the section header string table having the SHF_ALLOC bit set however? If not, we need a separate test for this case, I believe.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69093/new/

https://reviews.llvm.org/D69093





More information about the llvm-commits mailing list