[PATCH] D13501: [ELF2] Add -wrap switch

Thu Oct 8 15:08:26 PDT 2015

On Wed, Oct 7, 2015 at 10:55 PM, Igor Kudrin <ikudrin.dev at gmail.com> wrote:

> I've added some examples to show the differences between your patch and
> GNU ld's behavior, see http://reviews.llvm.org/D13528. For me, it's not
> just the different way to implement the switch, but mostly change its
> meaning.
>

It is indeed different from GNU, but I think our behavior makes sense and I
don't think that's a bad difference. LLD's symbol handling is already
pretty different in some area, particularly how we handle archive files and
--start-group/--end-group. Just like that, this is something we don't want
to aim the exact compatibility.

>
> On 08.10.2015 02:12, Rui Ueyama wrote:
>
> On Wed, Oct 7, 2015 at 10:28 AM, Igor Kudrin <ikudrin.dev at gmail.com>
> wrote:
>
>> They have quite a good explanation of the option's behavior and both ld
>> and gold produce the same result. The option should only change the
>> resolution target of the specified symbols and shouldn't do anything else;
>> otherwise, the generated result will be unpredictable. In particular, we
>> should neither change the name of a defined symbol nor remove it.
>>
>> OK, I can suggest a bit different approach. What if we add a field to the
>> Symbol class to redirect undefined symbol bodies to another Symbol? It'll
>> require a small update in resolve() and maybe in a few other places, but
>> the additional hash map will be gone. What do you think?
>>
>
> We may be overthinking about that. I don't think that the difference
> between LLD and GNU ld is going to be a real issue in real-world usage, and
> even it does, we can do whatever to fix that after we recognize the need.
> So I think something like http://reviews.llvm.org/D13528 suffices.
>
>
>>
>> On 07.10.2015 22:38, Rui Ueyama wrote:
>>
>> We are not necessarily aim 100% precise compatibility with GNU ld in
>> every detail. GNU ld may have implemented this feature in a way that's
>> natural to them. I'd really want to implement this feature in a way as the
>> LLD architecture is designed for. This --wrap option is by nature a bit
>> hacky option, and users need to understand what they are doing. GNU ld
>> manual does not mention about the details about how this option is supposed
>> to work.
>>
>> On Wed, Oct 7, 2015 at 9:12 AM, Igor Kudrin < <ikudrin.dev at gmail.com>
>> ikudrin.dev at gmail.com> wrote:
>>
>>>
>>>
>>> On 07.10.2015 20:22, Rui Ueyama wrote:
>>>
>>> On Wed, Oct 7, 2015 at 6:59 AM, Igor Kudrin < <ikudrin.dev at gmail.com>
>>> ikudrin.dev at gmail.com> wrote:
>>>
>>>> ikudrin added a comment.
>>>>
>>>> In <http://reviews.llvm.org/D13501#261600>
>>>> http://reviews.llvm.org/D13501#261600, @ruiu wrote:
>>>>
>>>> > This patch needs redesigning because we don't want to look up hash
>>>> tables more than once for each symbol. In this patch, names for undefined
>>>> symbols are looked up twice -- once in the InputFile.cpp and the other in
>>>> SymbolTable.cpp.
>>>>
>>>>
>>>> We have to look up for different names for defined and undefined
>>>> symbols, so we can't use just one hash map.
>>>>
>>>> In most cases, when the -wrap switch is not used and
>>>> UndefSymNameReplacement is empty, addition lookup will be very cheap,
>>>> without calculating a hash value at all. On the other hand, we can use just
>>>> std::map which expected to work really quick in our case.
>>>>
>>>>
>>> No, we don't have to look up hash table more than once. I thought a bit
>>> more about this and noticed that the LLD architecture (the separation of
>>> Symbol and SymbolBody) would really play well here. Symbols are just
>>> pointers to SymbolBodies, and symbol renaming is as easy as single pointer
>>> mutation. Here's the idea.
>>>
>>> First we resolve all symbols without considering --wrap option (so no
>>> overhead for --wrap during file parsing and symbol resolution). When symbol
>>> resolution is done, we basically have three Symbols for --wrap'ed symbol S
>>> in the symbol table: S, __wrap_S, and __real_S. Let X, Y and Z be their
>>> SymbolBodies, respectively.
>>>
>>> Originally the relationships are
>>>
>>>   S -> X
>>>   __wrap_S -> Y
>>>   __real_S -> Z
>>>
>>> We swap the pointers so that their relationships are
>>>
>>>   S -> Y
>>>   __wrap_S -> Y
>>>   __real_S -> X
>>>
>>> Now you can see that all symbols that would have been resolved to S
>>> without --wrap are now resolved as if they were __wrap_S. __real_S is also
>>> properly resolved to the symbol that was once the real symbol pointed to.
>>> This is the same result as what --wrap expects.
>>>
>>>
>>> Thank you for explaining your idea. It can work in some cases, but I
>>> dread it's unable to cover all situations.
>>>
>>> 1) Suppose we have all three symbols defined. The command like "ld.lld2
>>> a.o -wrap=S" in your case will eliminate Z and change content for symbols S
>>> and __real_S (see SymbolTableSection<ELFT>::writeGlobalSymbols). That's not
>>> right, GNU's ld preserves symbols in this case.
>>>
>>> 2) Suppose that S and __wrap_S are defined in different object files,
>>> which lay in archives. With switch "-wrap=S" (and if we have a reference to
>>> S) we should not get the content of the first object file at all, but how
>>> can we avoid parsing it if there is the record 'S' in the SymbolTable?
>>>
>>> 3) If we generate DSO and have an unresolved undefined symbol S we have
>>> to rename it and store as __wrap_S. It'll require more complicated update
>>> of symbols in the symbol table to support this case.
>>>
>>> Finally, my approach, I hope, preserves the original idea of
>>> relationships in the symbol table. With my case, we will have links like
>>> these:
>>> (Body for undefined S, name is "__wrap_S") -> (Symbol with name
>>> "__wrap_S") -> (Body for defined symbol "__wrap_S")
>>> (Body for defined __wrap_S) -> (Symbol with name "__wrap_S") -> (Body
>>> for defined symbol "__wrap_S")
>>> (Body for defined S) -> (Symbol with name "S") -> (Body for defined
>>> symbol "S")
>>>
>>> With your proposal we'll end up with links like:
>>> (Body for undefined S, name is "S") -> (Symbol with name "S") -> (Body
>>> for defined symbol "__wrap_S")
>>> (Body for defined __wrap_S) -> (Symbol with name "__wrap_S") -> (Body
>>> for defined symbol "__wrap_S")
>>> (Body for defined S) -> (Symbol with name "S") -> (Body for defined
>>> symbol "__wrap_S")
>>>
>>> The last line is a bit unexpected and can be a source for unwitting bugs.
>>>
>>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151008/59122058/attachment.html>