[PATCH] D50589: [llvm-objcopy] Implement -G/--keep-global-symbol(s).

Jordan Rupprecht via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 14 15:04:17 PDT 2018


rupprecht added a comment.

In https://reviews.llvm.org/D50589#1198576, @jhenderson wrote:

> In https://reviews.llvm.org/D50589#1197922, @rupprecht wrote:
>
> > In https://reviews.llvm.org/D50589#1196944, @jhenderson wrote:
> >
> > > What happens if you specify a mixture of --globalize-symbol and --keep-global-symbol?
> >
> >
> > The symbol will be a global in the output file.
> >  That's how it accidentally was in the way I wrote it, but I added a test case to make sure it stays that way and left some more comments.
>
>
> I think I wasn't explicit enough with my wording to explain what I was thinking: what happens if you specify `--globalize-symbol a --keep-global-symbol b`? It looks like the code will turn a into STB_LOCAL, and b will remain as it currently is (similar points about --localize|weaken-symbol also apply). Does this match GNU's behaviour? I'd expect a to be STB_GLOBAL and b to remain unchanged.


Ah, I see. I took a closer look at GNU and was very surprised at the results.
I basically ran --globalize-symbol a --keep-global-symbol b for a/b being local, weak, and global. (I added this as a separate test case).
For GNU, `b` ended up global in the output file. But `a` was kept local unless it was originally local; when `a` was local in the input, it ended up being global in the output.

For reference, the GNU business logic is here: https://github.com/redox-os/binutils-gdb/blob/a9d73bb6dacfbbdc826fea178e3a47bf7a4154b1/binutils/objcopy.c#L1552
It seems that:

- --globalize intentionally only applies to local symbols, so we're already broken w.r.t gnu compatability. (I don't know what the history of that is in llvm-objcopy -- I can fix that if you want). So running GNU objcopy --globalize Weak1 --keep-global-symbol Weak2 will result in Weak1 being local.
- --globalize omits already global symbols (i.e. `(flags & BSF_LOCAL)`), but it only checks on the *original* binding type (i.e. it does not look at `sym->flags`). So --globalize G1 --keep-global-symbol G2 means that G1 will go from global->local (because it isn't included in --keep-global-symbol), but won't go back from local->global because the original binding was already global.

I'm not sure if the weak symbol binding is an issue; I can fix that in its own commit if you want. For the global symbol binding issue, IMHO, that's a bug, and we should do the functionality you suggested: --globalize foo should keep foo global, whether or not --keep-global-symbol covers it. OTOH, it's also an edge case that probably never comes up.



================
Comment at: tools/llvm-objcopy/ObjcopyOpts.td:91
+          "Keep only symbols in <filename> as global. <filename> contains one "
+          "symbol per line and may contain comments beginning with '#'. May be "
+          "repeated to keep symbols from many files as global.">;
----------------
jhenderson wrote:
> rupprecht wrote:
> > jhenderson wrote:
> > > What happens in GNU objcopy if you have a file containing a line like "Global1 Global2"? Does it a) keep both, b) keep neither, c) keep Global1, or d) keep a symbol called "Global1 Global2"?
> > It prints this amusing message and then keeps Global1:
> > objcopy: <path>/keep-global-symbols.test.tmp-globals2.txt:3: Ignoring rubbish found on this line.
> > 
> > See this part of the gnu file parser: https://github.com/redox-os/binutils-gdb/blob/a9d73bb6dacfbbdc826fea178e3a47bf7a4154b1/binutils/objcopy.c#L1076
> > 
> > The way I have it written, this will break compatibility -- it will trim off comments and whitespace before/after the symbol name, but not between, so it will keep a symbol called "Global1 Global2". But, I think this is a case that will never come up; anyone relying on the parsing to work like this is probably doing it wrong anyway. So, I'm inclined to leave it as written just because it's simpler. WDYT?
> I think it's reasonable for two reasons:
> 1) It is possible using asm at least to create symbol names with whitespace e.g. the following will create a global symbol called "a b":
> ```
> .global "a b"
> "a b":
> 	nop
> ```
> In fact, you can even get " a b " as a valid symbol name like that. Maybe we should just not trim whitespace, although I'm not confident about that.
> 
> We should probably test that this is kept in fact by our new behaviour, since by my understanding of what you are saying, it is impossible to specify such a name in the keep-global-symbols file.
> 
> 2. Users who have been seeing and ignoring the warning deserve what's coming to them...
I think trimming whitespace from the end makes sense, e.g. to allow things like:

SomeSymbol # We want this global because X

Anyway, I added a test case that asserts " a b " in the file will refer to symbol "a b" in the object.


Repository:
  rL LLVM

https://reviews.llvm.org/D50589





More information about the llvm-commits mailing list