[PATCH] D55881: [llvm-objcopy] [COFF] Add support for removing symbols

Fri Dec 21 01:48:55 PST 2018

mstorsjo marked 4 inline comments as done.
mstorsjo added a comment.

In D55881#1338420 <https://reviews.llvm.org/D55881#1338420>, @jakehehrlich wrote:

> Head's up I'm about to go home for the evening and I'm pretty well off for until the new year after that. I think others are in a similar boat. I'm not ready to LGTM this yet and I don't have time to give more review so this might have to wait until January 2nd.

Ok, thanks for the reviews so far, let's pick it up again after new year's. I realized a few shortcomings in my approach so far (that had gone unnoticed while I've kept my patchset in use for a couple weeks but that just now cropped up) - the fact that I can't use symbol names as unique keys, and that this tool really does run on objects with associative comdats in the build setups I'm testing, so I'll try to revise it further until then.

================
Comment at: test/tools/llvm-objcopy/COFF/X86/remove-local-symbols.s:1
+# RUN: llvm-mc -triple=x86_64-windows-gnu %s -filetype=obj -o %t.o
+
----------------
alexshap wrote:
> so I'm  confused - what decision has been made regarding tests (sorry, many people have commented, I'd like to understand the logic): 
> all the new tests will use assembly  while yaml is deprecated, we will use both yaml and assembly, we will use yaml and assembly only in exceptional cases, or what ?
> As far as I'm concerned - i want consistency + functionality, at least in the future. 
I'm not aware of any other decision other than preferring yaml over binary object files.

I'd strongly prefer to be able to use both yaml and assembly for these tests; yaml is ideal for dumped output from a full toolchain, including debug info and everything that a real object file would contain. Assembly is better for the cases when I want to construct a specific setup to test certain cases, where this keeps the test much smaller and easier to read and understand. And for executables instead of object files, yaml is the only option as we can't rely on lld.

================
Comment at: tools/llvm-objcopy/COFF/Object.cpp:19
+
+void Object::updateSymbols() {
+  SymbolMap.clear();
----------------
jakehehrlich wrote:
> I was thinking of updateSymbols as more of an operation the user could take on each symbol. See updateSymbols in the ELF code for instance. This seems like something that can be done at finalization time.
Some of this can be done at finalization, but the map from symbol name to object pointer needs to be up to date when we do `setReferencedSymbols` or `markSymbols` if I rename it that way. With the current setup, of only supporting removing symbols and nothing else, it's enough if the Reader sets this up as well, but if we do multiple consecutive operations on the symbol vector (once we add removing of sections) we need to do it after each one of them.

However I just realized that there's fatal flaw with this in the current form; symbol names aren't necessarily unique within a COFF object file - there's a local symbol for each section, and there can be multiple sections with the same name. In the COFF files on disk, the symbol table index is the unique key that differentiates them, but when we rearrange the symbol table (e.g. on removals) I'll probably have to add some other unique id to be able to hook them up correctly back again after removals.

================
Comment at: tools/llvm-objcopy/COFF/Object.h:44
+  // for actually writing back to a file.
+  bool Referenced = false;
 };
----------------
jakehehrlich wrote:
> mstorsjo wrote:
> > jakehehrlich wrote:
> > > Yeah this was the sort of thing we eventually compromised on but we agreed that it was a result of poor structure in the ELF code. Ideally this sort of thing should not be in Symbol; we just didn't find a way around it that didn't require significant upheaval in the code. Since we're in an earlier stage with COFF code we can find a better way that storing this with the Symbol itself perhaps?
> > I don't instinctively think of a better way of storing it - can you elaborate on what you would have in mind?
> I don't have anything specifically in mind, I don't remember the review too well. It's kind of formats specific. For instance a symbol can be "referenced" by more than one section. It might be nice to have sections list all the symbols that they reference (except the symbol table which is the owner not the referencer) but this could cause O(n^2) running times. I was hoping that some quirk or insight you might have into the nature of COFF might lead us toward a means of avoiding storing weather this was referenced or not while maintaining something about as efficient. Had I done a complete redesign of the ELF code with this in mind we might wound up with something like a graph that could be walked to take out all the things that we actually wind up using and stripping would just reduce the set of initial nodes for instance; That's a crazy different direction but maybe something less extreme is possible here.
Well the alternative, as I see it, would be to store a backreference in each symbol, indicating what relocations refer to it, and that pretty much doubles the storage needed for all relocations. It probably creates a nice walkable graph in all directions, but also feels rather overengineered.

================
Comment at: tools/llvm-objcopy/COFF/Object.h:72
+  void initRawSymbolTable();
+  void setReferencedSymbols(StringRef Filename);
+
----------------
jakehehrlich wrote:
> jakehehrlich wrote:
> > If possible its most ideal to keep this true as you go rather than setting it all at once. We found that wasn't easily possible to do correctly in the ELF code 1) That might not be true here but even if it is 2) A more generic interface to this would be to use updateSymbols which is the sort of generic operation on an Object that I'd like to see used in favor of something specific like this.
> Oh other thing, I think we did this on the section level (which probably doesn't make sense here) but it was called "markSymbols" to mimic "mark and sweep" garbage collection.  I'd prefer that name but it hardly matters.
Ok, I can rename it to markSymbols.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55881/new/

https://reviews.llvm.org/D55881