[LLVMdev] [lld] Undefined symbols postprocessing

Mon Feb 23 00:26:23 PST 2015

Rui, see inline.

On 02/20/2015 10:20 PM, Rui Ueyama wrote:
On Wed, Feb 18, 2015 at 1:38 AM, Denis Protivensky <dprotivensky at accesssoftek.com> wrote:
Hi everyone,

In lld, I need to conditionally add symbols (like _GLOBAL_OFFSET_TABLE_)
during static linking because they may be used by relocations
(R_ARM_TLS_IE32) or by other features such as STT_GNU_IFUNC symbols.
The problem is that symbols are currently added declaratively, by listing
them in an ExecutableWriter::addDefaultAtoms() override.
At that stage, there's no way to determine whether additional symbols are
required.
But libraries that provide optimizations such as STT_GNU_IFUNC (glibc, for
example) expect the GOT symbol to be defined, so linking fails in
Resolver::resolve() if the symbol is not found.

I propose adding the ability to ignore undefined symbols during the initial
resolution, and then to process only those remaining undefines in a second
step after the pass manager has run.

Technically, this shouldn't be a problem:
- there will be a new option in the linking context that signals that
undefined symbols should be postprocessed.
- if the postprocessing option is set, newly added symbols will be collected
in the MergedFile returned by the Resolver, and then only those new symbols
will take part in a resolution process very similar to what
Resolver::resolve() does.
- existing implementations will not break and will keep working without
using the postprocessing feature.
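
As a rough illustration of the flow described in the list above, here is a minimal C++ sketch. Everything in it is a toy stand-in: ToyAtom, ToyContext, ToyMergedFile, ToyPass and toyLink are hypothetical names, not lld classes, and postprocessUndefines is only an assumed name for the proposed option. As a simplification, the sketch ignores new undefined references that a pass might introduce.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; none of these are real lld types.
struct ToyAtom {
  std::string name;
  bool defined; // true: a definition, false: an undefined reference
};

struct ToyContext {
  bool postprocessUndefines = false; // hypothetical linking-context option
};

struct ToyMergedFile {
  std::vector<ToyAtom> atoms;
};

using ToyPass = std::function<void(ToyMergedFile &)>;

// Names that are referenced in `atoms` but have no definition there.
std::set<std::string> undefinedNames(const std::vector<ToyAtom> &atoms) {
  std::set<std::string> defined, missing;
  for (const ToyAtom &a : atoms)
    if (a.defined)
      defined.insert(a.name);
  for (const ToyAtom &a : atoms)
    if (!a.defined && defined.count(a.name) == 0)
      missing.insert(a.name);
  return missing;
}

// Resolve, run the passes, then match leftover undefines against the atoms
// that the passes added.
bool toyLink(const ToyContext &ctx, ToyMergedFile &file,
             const std::vector<ToyPass> &passes) {
  std::set<std::string> pending = undefinedNames(file.atoms);
  if (!pending.empty() && !ctx.postprocessUndefines)
    return false; // strict mode: fail during the initial resolution

  const std::size_t sizeBefore = file.atoms.size();
  for (const ToyPass &pass : passes)
    pass(file);
  assert(file.atoms.size() >= sizeBefore && "toy passes only append atoms");

  // Second step: only atoms appended by the passes take part.
  for (std::size_t i = sizeBefore; i < file.atoms.size(); ++i)
    if (file.atoms[i].defined)
      pending.erase(file.atoms[i].name);

  return pending.empty(); // any undefine still pending is a link error
}
```

In this toy model, a pass that appends a definition of the GOT symbol lets the link succeed only when the option is set; without the option, the link fails before any pass runs.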

I'm fine with the basic idea of allowing undefined symbols in the first resolver pass. A few questions about the implementation.

- How do you know which atom is newly added and which is not? Once an atom is added to a MutableFile, there's no easy way to recognize that, I guess.
The Resolver returns a Resolver::MergedFile as the result of the call to resolve(), and we can override its addAtom method to put newly added atoms into a separate collection, which can then be examined for undefines.
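
A minimal sketch of that idea, using toy classes rather than the real lld ones: ToyMutableFile, ToyMergedFile, startTracking() and newUndefines() are all hypothetical names introduced here, and the explicit "start tracking" switch is an assumption about how one might separate atoms added during resolution from atoms added later by passes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are not lld's classes.
struct ToyAtom {
  std::string name;
  bool defined; // true: a definition, false: an undefined reference
};

class ToyMutableFile {
public:
  virtual ~ToyMutableFile() = default;
  virtual void addAtom(const ToyAtom &atom) { _atoms.push_back(atom); }
  const std::vector<ToyAtom> &atoms() const { return _atoms; }

private:
  std::vector<ToyAtom> _atoms;
};

class ToyMergedFile : public ToyMutableFile {
public:
  // Store the atom as usual, and also remember it if tracking is on.
  void addAtom(const ToyAtom &atom) override {
    ToyMutableFile::addAtom(atom);
    if (_trackNew)
      _newAtoms.push_back(atom);
  }

  // Assumed to be called once the initial resolution has finished.
  void startTracking() { _trackNew = true; }

  // Newly added undefined references that no atom in the file defines.
  std::vector<std::string> newUndefines() const {
    std::vector<std::string> result;
    for (const ToyAtom &candidate : _newAtoms) {
      if (candidate.defined)
        continue;
      bool found = false;
      for (const ToyAtom &a : atoms())
        if (a.defined && a.name == candidate.name)
          found = true;
      if (!found)
        result.push_back(candidate.name);
    }
    return result;
  }

private:
  bool _trackNew = false;
  std::vector<ToyAtom> _newAtoms;
};
```

Keeping the new atoms in their own collection means the later check only has to look at what was added after resolution, instead of rescanning every atom in the file for undefines.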

- Does the second resolver pass need to be run after all other passes? Why don't you run the resolver once, and then call some externally-given function (from the resolver) to get a list of atoms that need to be added to the result, and then resolve again, all inside the resolver?
Since we have a chance to identify newly added atoms after resolution, I don't see a reason to complicate the process with external functions and additional call dependencies. It can all be done by adding a second resolve()-like function call in Driver::link() after the PassManager run.

So my proposal is to move from the declarative style toward a more
flexible, imperative approach. Of course, there's a downside: the code
loses some of its regularity and becomes more volatile, but in the end
we have tests to cover such things and ensure everything works as expected.

Any ideas?

- Denis Protivensky.

