[PATCH] D46896: [llvm-objcopy] Add --strip-unneeded option

Jake Ehrlich via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed May 16 13:33:08 PDT 2018


jakehehrlich added a comment.

In https://reviews.llvm.org/D46896#1101378, @paulsemel wrote:

> In https://reviews.llvm.org/D46896#1101273, @jhenderson wrote:
>
> > I'm now getting a bit lost. Please could you outline the GNU objcopy behaviour for the following axes:
>
>
> Sure, I'll try to be as much consistent as I can.
>
> > Axis 1: elf type (ET_REL versus ET_DYN/ET_EXEC)
> >  Axis 2: symbol binding (Global/Weak/Local)
>
> There is a real thing here. The fact is that, as you already know, for ET_REL, global symbols might be link with other relocatable files. For this reason, and only for ET_REL, `objcopy` is keeping those symbols.
>  I will try to give you an example. Consider the test file test/tools/llvm-objcopy/localize.test. If you `yaml2obj` it, and launch `objcopy --strip-unneeded %t %t1`, you will notice that those the weak and global symbols are kept in the result binary, despite the fact there are not referenced in a relocation.
>  Now, if you take this same test file but change from ET_REL to ET_DYN (and do the same procedure again), you will notice that the symbols are not present anymore (symtab should be removed actually).
>  This is what I'm trying to reproduce with this behavior. The fact is that, if you take a look at `objcopy` code, you won't see anything about this ET_REL handling.
>  But the fact is that they are also handling this option in the section removal part of `objcopy`, which makes them removing symbol table when this option is set.
>
> I didn't do it this way because I truly think that the symtab removal might be done in the writing part of `llvm-objcopy`. By this I mean that we might test whether the symtab is empty, and if so, just remove it. The second thing is that I wanted to avoid the problem mentioned by @alexshap  in an other review with the `keep-symbol` option. This way of doing, we are avoiding the problem.
>
> To summarize: For ELF other than ET_REL, `objcopy` is basically stripping everything (except if in a reloc). For ET_REL, it tries not to leave the binary in a broken state. Indeed, if we also remove Global/Weak symbols, we are just breaking the linking part (that's not what we want).
>
> > Axis 3: undefined/defined symbols
>
> For other than ET_REL, `objcopy` doesn't seem to care about undefined/defined symbols. BUT, as @jakehehrlich mentioned, undefined Global/Weak symbols (again, that are not present in reloc) are stripped from the binary. This is, as far as I can tell, the only time `objcopy` strips G/W symbols in ET_REL Elf files.
>
> > This should help us evaluate the behaviour and determine if the approach taken is correct. It should be straightforward to generate these different cases using yaml2obj, like you do in the tests.
>
> Hope this clarifies, please tell me if you want me to elaborate on some other points :)


Eh I think this is kind of obscure behavior for GNU objcopy. I don't think it matters what we choose to do with symbols for linked binaries. For ET_REL we definitely need to preserve any symbol referenced by a relocation and we definitely need to preserve defined global/weak symbols. All other symbols should be removed. I'm fine being more aggressive for none ET_REL but I also don't care if we aren't. It's fine if you match the behavior for ET_EXEC and ET_DYN and its fine if you don't. My preference is for simpler code with fewer edge cases.



================
Comment at: tools/llvm-objcopy/Object.cpp:653-659
     Relocation ToAdd;
     ToAdd.Offset = Rel.r_offset;
     getAddend(ToAdd.Addend, Rel);
     ToAdd.Type = Rel.getType(false);
     ToAdd.RelocSymbol = SymbolTable->getSymbolByIndex(Rel.getSymbol(false));
+    SymbolTable->setSymbolInRelocByIndex(Rel.getSymbol(false));
     Relocs->addRelocation(ToAdd);
----------------
paulsemel wrote:
> jakehehrlich wrote:
> > I don't nessarilly think this should be done in this change but I think I don't like how I wrote addRelocation. I think it should take the parameters to build a relocation not request that the user build the relocation. That would  have the added benefit of allowing addRelocation to update the aforementioned count.
> Well, maybe it can be a bit confusing to do this "fix" in this change.. I can add a TODO if you prefere ?
That's fine.


================
Comment at: tools/llvm-objcopy/Object.h:347
   uint8_t Visibility;
+  bool InReloc;
 
----------------
paulsemel wrote:
> jakehehrlich wrote:
> > I'm not sure this is sufficient, or if it is it has to be set at the right time. For example say a symbol is referenced by a relocation in .rel.data and then .rel.data is removed. It could be that that symbol is either still referenced by another relocation in another section or it could not be. No local operation would tell you the answer either.
> > 
> > I think there are two possibilities.
> > 1) Maintain a count that can be incremented/decremented as relocations are added/removed
> > 2) Do a pass at the end to determine which symbols should be kept/not kept. You can either set this bool in that pass or add symbols to a set like container. see: http://llvm.org/docs/ProgrammersManual.html#set-like-containers-std-set-smallset-setvector-etc
> Arf.. I totally forgot the "what if the reloc section is removed" part Thanks about it !
> I think I will go for the counter, as I first wanted to be as light as possible for this option (and I don't see anything that might cause problems with the counter).
I was thinking about this and I was wondering if this isn't something that should be more general. Like it there are other ways symbols might be needed other than relocations. This should be a generic reference count. But we already have shared_ptr and weak_ptr to tell us how this should work! 

Can we think of a way to reuse that infrastructure? For instance those types already have use_count methods so that can be used to determine weather a symbol is needed or not. Alternatively it would make sense to create a similar sort of abstraction that managed the reference counting for us.


================
Comment at: tools/llvm-objcopy/llvm-objcopy.cpp:392
 
+      if (Config.StripUnneeded && !Sym.InReloc && Sym.Index != 0 &&
+          ((Obj.Type == ET_REL && Sym.Binding == STB_LOCAL &&
----------------
paulsemel wrote:
> jakehehrlich wrote:
> > The whole Sym.Index != 0 smells to me and is a product of me not thinking about the dummy symbol. At the very least that should never be past to a function in removeSymbols or updateSymbols. It might be better to handle the dummy symbol the way we handle the dummy section, namely not store it and account for the increased index. Probably best for another change though. Could you add a TODO here for that?
> I totally agree with you ! I Actually wanted to talk with you about this dummy symbol, because there is some problems, especially with sections where the only remaining symbol is the dummy one (we might want to remove the section).
> I will add a TODO, no problem ! :)
We probably don't want to remove empty symbol tables IMO. I don't think it has any real tendency to come up for any practical purpose where the desire wouldn't just be to remove the section directly.


Repository:
  rL LLVM

https://reviews.llvm.org/D46896





More information about the llvm-commits mailing list