[PATCH] D55881: [llvm-objcopy] [COFF] Add support for removing symbols and sections

Martin Storsjö via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Dec 20 02:01:49 PST 2018


mstorsjo marked 12 inline comments as done.
mstorsjo added a comment.

Thanks for the thorough review!

In D55881#1336855 <https://reviews.llvm.org/D55881#1336855>, @jakehehrlich wrote:

> One broad comment is that we should probably implement these in different patches. Just pick one for now and we can build up to the others.


I found them rather entangled, as removing sections also requires removing the corresponding symbols for those sections. But I can try to make this only removal of symbols, as an absolutely minimal first step.

Also, most of the comments seem to be related to the design of the data structure (or lack thereof). I'll try to restructure it according to the general direction I got from these comments, towards the goal of maintaining a correct state along the way as far as possible, moving more of the reassembly of correct state into the writer.



================
Comment at: tools/llvm-objcopy/COFF/COFFObjcopy.cpp:56
+
+  if (Config.StripAllGNU || Config.StripDebug || Config.StripAll ||
+      Config.DiscardAll || Config.StripUnneeded) {
----------------
jakehehrlich wrote:
> Do these really all do the same thing?
As far as I've seen, yes, with respect to sections. With respect to symbols, they do slightly different things though. (In the original objcopy, with things abstracted via libbfd, they probably have slightly different logic, but when mapped down to COFF, this is what I've discovered so far at least.)


================
Comment at: tools/llvm-objcopy/COFF/COFFObjcopy.cpp:136
+  // Actually do removals of symbols.
+  Obj.removeSymbols([&](const Symbol &Sym) {
+    if (!Config.SymbolsToKeep.empty() &&
----------------
jakehehrlich wrote:
> The two styles by which removeSymbols and removeSections were written was a historical accident in ELF. We should pick one and stick to it here. I was going to say I don't really care which but I'll retract that statement say that using the style that was used in `removeSymbols` is to be preferred. I think it will both perform better and be less confusing to incoming devs.
Ok, I'll try to rewrite that part (but if I get the patch split, that's in a later patch then).


================
Comment at: tools/llvm-objcopy/COFF/COFFObjcopy.cpp:144
+
+    if (!Config.SymbolsToRemove.empty() &&
+        is_contained(Config.SymbolsToRemove, Sym.Name)) {
----------------
jakehehrlich wrote:
> I think supporters of checking `empty()` prior to `is_contianed` changed their minds. This should be removed in all checks.
Ok, will do gladly.


================
Comment at: tools/llvm-objcopy/COFF/Object.cpp:35
+  Sections.erase(
+      std::remove_if(std::begin(Sections), std::end(Sections),
+                     [ToRemove](const Section &Sec) { return ToRemove(Sec); }),
----------------
jakehehrlich wrote:
> Can sections reference each other? Can a symbol reference a section? A significant portion of the complexity in removeSection in ELF comes from the fact that we handle those sorts of cases. If you remove a section then depending on circumstance we either remove the referencing object as well or throw an error because you weren't *also* removing the reference.
A symbol references a section, and a relocation can reference a symbol. There are cases where sections can reference each other (associative comdat symbols) that I haven't implemented yet - I don't think GNU objcopy does either (it doesn't know about the associative comdat concept at all, afaik). Getting those right would of course be ideal, but they're a bit out of scope of what I'm aiming at initially (basic stripping functionality).

Most of the functionality in this patch also revolves around this - removing symbols that belong to a removed section, etc, but it's rather spread out, and the Object class API isn't easy to use for constructing things from scratch, and updates requires coordinating a few steps.


================
Comment at: tools/llvm-objcopy/COFF/Object.cpp:40
+
+void Object::initRawSymbolTable() {
+  for (Symbol &S : Symbols) {
----------------
jakehehrlich wrote:
> This seems like it is perhaps something the Writer should do? I'm not clear on the intention here.
It initializes helper tables that currently only are used during the transformation stage. But I think the general remedy to most of the suggested changes is to make more of these permanent members of the data structures, updated on the go as far as possible, and moving more of the logic into the reader/writer.


================
Comment at: tools/llvm-objcopy/COFF/Object.cpp:48
+
+void Object::setReferencedSymbols(StringRef Filename) {
+  // This function is meant to be used before pruning any symbols.
----------------
jakehehrlich wrote:
> Again don't follow the error handling that I used in ELF code. You don't need to pass the otherwise useless Filename if you return an error because you can add that information when you report the error in whatever handles this. Also sometimes it won't even be a file (see -I binary for instance)
Ok, that makes sense. I didn't copy this design from the ELF code actually, I independently made the same bad design...


================
Comment at: tools/llvm-objcopy/COFF/Object.cpp:100
+    assert(Sec.Relocs.size() == Sec.RelocTargets.size());
+    for (size_t I = 0, E = Sec.Relocs.size(); I < E; I++) {
+      auto It = SymMap.find(Sec.RelocTargets[I]);
----------------
jakehehrlich wrote:
> Why can't this be a loop over Sec.Relocs? The only blocker seems to be the use of `Sec.RelocTargets[I]` but it seems that's just a parallel array which is force you to write it like this.
Yes, moving the target into the Relocs vector would get rid of that.


================
Comment at: tools/llvm-objcopy/COFF/Object.h:31
+
+  // Fields used by some transformations in objcopy, but not necessary
+  // for actually writing back to a file.
----------------
jakehehrlich wrote:
> Depending on what you mean here, this makes me think this should not be necessary. Can you elaborate?
I mean that they're not essential for reading/writing, but necessary as intermediate helpers when remapping symbol indices. So strictly, this could also be kept in separate datastructure within COFFObjcopy, as it doesn't have to be persisted to the writing stage. But it's more convenient to have them go along with the Section struct here.


================
Comment at: tools/llvm-objcopy/COFF/Object.h:33
+  // for actually writing back to a file.
+  std::vector<StringRef> RelocTargets;
+  size_t Idx;
----------------
jakehehrlich wrote:
> Aside from weather these should actually exist, (Idx probably should from my experience but I'm skeptical about RelocTargets) aren't you maintaining a parallel array array here? Why not just have a proper Reloc struct and put both the coff_relocation and the Target in it? That would improve some code I'm going to complain about below.
Sure, that should be doable.


================
Comment at: tools/llvm-objcopy/COFF/Object.h:44
+  // for actually writing back to a file.
+  bool Referenced = false;
 };
----------------
jakehehrlich wrote:
> Yeah this was the sort of thing we eventually compromised on but we agreed that it was a result of poor structure in the ELF code. Ideally this sort of thing should not be in Symbol; we just didn't find a way around it that didn't require significant upheaval in the code. Since we're in an earlier stage with COFF code we can find a better way that storing this with the Symbol itself perhaps?
I don't instinctively think of a better way of storing it - can you elaborate on what you would have in mind?


================
Comment at: tools/llvm-objcopy/COFF/Object.h:65
+  // for actually writing back to a file.
+  std::vector<Symbol *> RawSymbolTable;
+
----------------
jakehehrlich wrote:
> Seems like a field that only the writer should ever know about.
No, the write (as it currently stands) doesn't need to know about this. Currently, before invoking the writer, all relocations have correct symbol table indices set up.

A COFF symbol can have a number of auxillary symbol table entries following, with metadata about that preceding symbol. The Symbols array here only consists of logical symbols (so that when removing one symbol, the corresponding extra entries are gone along with it), and this allows looking up what Symbol a raw symbol table index (from a relocation) refers to. The writer currently just writes out the symbol table index in the relocation as is.


================
Comment at: tools/llvm-objcopy/COFF/Object.h:71
+  // StringRef taken as parameter for error reporting.
+  void initRawSymbolTable();
+  void setReferencedSymbols(StringRef Filename);
----------------
jakehehrlich wrote:
> Can you elaborate on what the intended semantics of this is? I'd kind of prefer the interface of Object be that you can construct a blank object, build it up step by step having a valid object along the way as you go, and then write it when possible. The exception to this is when parts of an Object reference each other you need to allow for temporary invalidity of some form or another while the cycle is being closed.
That sounds like a great goal for the Object class API, my design so far is too much of a plain container with the logic scattered outside of it.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55881/new/

https://reviews.llvm.org/D55881





More information about the llvm-commits mailing list