[PATCH] D13501: [ELF2] Add -wrap switch

Wed Oct 7 13:12:59 PDT 2015

On Wed, Oct 7, 2015 at 10:28 AM, Igor Kudrin <ikudrin.dev at gmail.com> wrote:

> They have quite a good explanation of the option's behavior and both ld
> and gold produce the same result. The option should only change the
> resolution target of the specified symbols and shouldn't do anything else;
> otherwise, the generated result will be unpredictable. In particular, we
> should neither change the name of a defined symbol nor remove it.
>
> OK, I can suggest a bit different approach. What if we add a field to the
> Symbol class to redirect undefined symbol bodies to another Symbol? It'll
> require a small update in resolve() and maybe in a few other places, but
> the additional hash map will be gone. What do you think?
>

We may be overthinking about that. I don't think that the difference
between LLD and GNU ld is going to be a real issue in real-world usage, and
even it does, we can do whatever to fix that after we recognize the need.
So I think something like http://reviews.llvm.org/D13528 suffices.

>
> On 07.10.2015 22:38, Rui Ueyama wrote:
>
> We are not necessarily aim 100% precise compatibility with GNU ld in every
> detail. GNU ld may have implemented this feature in a way that's natural to
> them. I'd really want to implement this feature in a way as the LLD
> architecture is designed for. This --wrap option is by nature a bit hacky
> option, and users need to understand what they are doing. GNU ld manual
> does not mention about the details about how this option is supposed to
> work.
>
> On Wed, Oct 7, 2015 at 9:12 AM, Igor Kudrin <ikudrin.dev at gmail.com> wrote:
>
>>
>>
>> On 07.10.2015 20:22, Rui Ueyama wrote:
>>
>> On Wed, Oct 7, 2015 at 6:59 AM, Igor Kudrin < <ikudrin.dev at gmail.com>
>> ikudrin.dev at gmail.com> wrote:
>>
>>> ikudrin added a comment.
>>>
>>> In http://reviews.llvm.org/D13501#261600, @ruiu wrote:
>>>
>>> > This patch needs redesigning because we don't want to look up hash
>>> tables more than once for each symbol. In this patch, names for undefined
>>> symbols are looked up twice -- once in the InputFile.cpp and the other in
>>> SymbolTable.cpp.
>>>
>>>
>>> We have to look up for different names for defined and undefined
>>> symbols, so we can't use just one hash map.
>>>
>>> In most cases, when the -wrap switch is not used and
>>> UndefSymNameReplacement is empty, addition lookup will be very cheap,
>>> without calculating a hash value at all. On the other hand, we can use just
>>> std::map which expected to work really quick in our case.
>>>
>>>
>> No, we don't have to look up hash table more than once. I thought a bit
>> more about this and noticed that the LLD architecture (the separation of
>> Symbol and SymbolBody) would really play well here. Symbols are just
>> pointers to SymbolBodies, and symbol renaming is as easy as single pointer
>> mutation. Here's the idea.
>>
>> First we resolve all symbols without considering --wrap option (so no
>> overhead for --wrap during file parsing and symbol resolution). When symbol
>> resolution is done, we basically have three Symbols for --wrap'ed symbol S
>> in the symbol table: S, __wrap_S, and __real_S. Let X, Y and Z be their
>> SymbolBodies, respectively.
>>
>> Originally the relationships are
>>
>>   S -> X
>>   __wrap_S -> Y
>>   __real_S -> Z
>>
>> We swap the pointers so that their relationships are
>>
>>   S -> Y
>>   __wrap_S -> Y
>>   __real_S -> X
>>
>> Now you can see that all symbols that would have been resolved to S
>> without --wrap are now resolved as if they were __wrap_S. __real_S is also
>> properly resolved to the symbol that was once the real symbol pointed to.
>> This is the same result as what --wrap expects.
>>
>>
>> Thank you for explaining your idea. It can work in some cases, but I
>> dread it's unable to cover all situations.
>>
>> 1) Suppose we have all three symbols defined. The command like "ld.lld2
>> a.o -wrap=S" in your case will eliminate Z and change content for symbols S
>> and __real_S (see SymbolTableSection<ELFT>::writeGlobalSymbols). That's not
>> right, GNU's ld preserves symbols in this case.
>>
>> 2) Suppose that S and __wrap_S are defined in different object files,
>> which lay in archives. With switch "-wrap=S" (and if we have a reference to
>> S) we should not get the content of the first object file at all, but how
>> can we avoid parsing it if there is the record 'S' in the SymbolTable?
>>
>> 3) If we generate DSO and have an unresolved undefined symbol S we have
>> to rename it and store as __wrap_S. It'll require more complicated update
>> of symbols in the symbol table to support this case.
>>
>> Finally, my approach, I hope, preserves the original idea of
>> relationships in the symbol table. With my case, we will have links like
>> these:
>> (Body for undefined S, name is "__wrap_S") -> (Symbol with name
>> "__wrap_S") -> (Body for defined symbol "__wrap_S")
>> (Body for defined __wrap_S) -> (Symbol with name "__wrap_S") -> (Body for
>> defined symbol "__wrap_S")
>> (Body for defined S) -> (Symbol with name "S") -> (Body for defined
>> symbol "S")
>>
>> With your proposal we'll end up with links like:
>> (Body for undefined S, name is "S") -> (Symbol with name "S") -> (Body
>> for defined symbol "__wrap_S")
>> (Body for defined __wrap_S) -> (Symbol with name "__wrap_S") -> (Body for
>> defined symbol "__wrap_S")
>> (Body for defined S) -> (Symbol with name "S") -> (Body for defined
>> symbol "__wrap_S")
>>
>> The last line is a bit unexpected and can be a source for unwitting bugs.
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151007/c1547e4b/attachment.html>