[PATCH] D55881: [llvm-objcopy] [COFF] Add support for removing symbols

Martin Storsjö via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 3 14:16:18 PST 2019


mstorsjo marked 14 inline comments as done.
mstorsjo added a comment.

In D55881#1344707 <https://reviews.llvm.org/D55881#1344707>, @jhenderson wrote:

> > There is, however, at least one case in a test (that depends on D56140 <https://reviews.llvm.org/D56140>) which is impossible to make using YAML as input at the moment; the COFF YAML format doesn't distinguish between multiple symbols with the same name for relocations. To fix that, the COFF YAML format would need to be changed to use symbol indices instead of names for relocations (mirroring what actually is stored in the files).
>
> Hmm... that's an interesting case. I think it accepting arbitrary indexes would be nice. IIRC, the ELF YAML format allows for some fields specifying either a name or index, and if an index, it doesn't do any checking, which allows for crafting invalid inputs to test error handling. Having a symmetry here would make a lot of sense.


Ok, I'll try to have a look at that and whether it's possible to extend the COFF YAML format similarly.



================
Comment at: test/tools/llvm-objcopy/COFF/X86/remove-symbol1.s:11-12
+# RELOCS-NEXT:   Section (1) .text {
+# RELOCS-PRE-NEXT:  0x4 IMAGE_REL_AMD64_REL32 ptr (16)
+# RELOCS-POST-NEXT: 0x4 IMAGE_REL_AMD64_REL32 ptr (15)
+# RELOCS-NEXT:     0xE IMAGE_REL_AMD64_REL32 foo (12)
----------------
jhenderson wrote:
> Unless there's a strong need, I'd just drop the (16)/(15) bit at the end, as it allows folding these two lines together, and doesn't really give us anything, since there's only one "ptr" symbol.
The point I tried to make in keeping it explicit, is that llvm-objcopy updated the symbol index in this relocation. Granted, if it wouldn't have, the symbol name would have changed as well (or it would have ended up as a broken relocation), but I tried to show it clearly that this reloc actually does refer to a symbol that changed index.


================
Comment at: test/tools/llvm-objcopy/COFF/X86/remove-symbol1.s:47
+
+# SYMBOLS-POST: SYMBOL TABLE:
+# SYMBOLS-POST-NEXT: .text
----------------
jhenderson wrote:
> Could you largely fold this and the PRE block together, to reduce risk of copy/paste issues, and make clear the difference?
> 
> You could do this by sharing the majority of the checks, but having a single difference for the removed symbol:
> ```
> ...
> # SYMBOLS-NEXT: foo
> # SYMBOLS-PRE-NEXT: otherfunc
> # SYMBOLS-NEXT: otherfunc2
> ...
> ```
Oh, right, yes - will do.


================
Comment at: tools/llvm-objcopy/COFF/COFFObjcopy.cpp:32
+  // If we need to do per-symbol removals, initialize the Referenced field.
+  if (Config.StripUnneeded || Config.DiscardAll ||
+      !Config.SymbolsToRemove.empty())
----------------
jhenderson wrote:
> You might want to simplify this `if` in this first version to only check the SymbolsToRemove.
Right, yes, will fix.


================
Comment at: tools/llvm-objcopy/COFF/COFFObjcopy.cpp:40-42
+      // Explicitly removing a referenced symbol is an error, while
+      // implicitly removing them as part of a blanket removal above is
+      // accepted.
----------------
jhenderson wrote:
> This comment doesn't really make sense in the patch's current form.
Oops, will fix.


================
Comment at: tools/llvm-objcopy/COFF/Object.cpp:38-39
+Error Object::markSymbols() {
+  for (Symbol &Sym : Symbols)
+    Sym.Referenced = false;
+  for (const Section &Sec : Sections) {
----------------
jhenderson wrote:
> mstorsjo wrote:
> > jhenderson wrote:
> > > Referenced is initialised to false. Do you need to reset it here as well? I don't envisage a situation where we should need to do this more than once on a single object.
> > I'd rather have the method be self-contained enough to clear it here, in case the caller would call this method multiple times. If it doesn't, the method leaks the limitation that it can only be called once, which in itself also is rather encapsulation.
> > 
> > To make things safer perhaps the field should be initialized to true, to indicate that all symbols are referenced unless we've called this and actually know for sure?
> Okay, I see your point, so it's fine what you have done. I have issues both with it being set to true and to false, because each would result in technically incorrect state for a period of time until the function is called. Perhaps it would be best to simply not initialise Referenced at all, until `markSymbols` is called, since the value can't be relied on to be safe without that call.
Ok, I can leave it uninitialized, and then sanitizers and checkers can point out if it's ever used uninitialized when it shouldn't.


================
Comment at: tools/llvm-objcopy/COFF/Object.h:27-28
+struct Relocation {
+  Relocation() {}
+  Relocation(const object::coff_relocation &R) : Reloc(R) {}
+
----------------
jhenderson wrote:
> Are these constructors really necessary? Won't they be generated by default?
Sorry, I forgot to reply on this one yesterday. They're indeed unnecessary if using the right syntax for initializing, I didn't think of the new C++11 syntax for doing that without explicit constructors.


================
Comment at: tools/llvm-objcopy/COFF/Object.h:46-47
   ArrayRef<uint8_t> AuxData;
+  size_t Key;
+  size_t RawIdx;
+  bool Referenced = false;
----------------
jhenderson wrote:
> mstorsjo wrote:
> > jhenderson wrote:
> > > Don't abbreviate unnecessarily (Idx -> Index). Do we really need both separately? If so, perhaps better names would be `OriginalIndex` and `Index` or similar, and then use the input symbol index as the key.
> > Ok, I'll expand the unnecessarily terse name.
> > 
> > As for `Key` vs `OriginalIndex` - yes, input symbol index would work just as well for key (and that's pretty much what I use right now anyway). But as @jakehehrlich wants to have the Object API usable even for constructing object files from scratch, naming it `OriginalIndex` feels a bit limiting.
> Okay, no problem. I'd marginally prefer `UniqueID` to `Key`, I think, but I'm okay if you prefer it this way around.
I can change it to `UniqueId`, that's what I originally called it, but later changed it into `Key` because I thought it was better - I'll change it back.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55881/new/

https://reviews.llvm.org/D55881





More information about the llvm-commits mailing list