[PATCH] D13501: [ELF2] Add -wrap switch

Thu Oct 8 21:33:59 PDT 2015

On 09.10.2015 4:08, Rui Ueyama wrote:
> On Wed, Oct 7, 2015 at 10:55 PM, Igor Kudrin <ikudrin.dev at gmail.com 
> <mailto:ikudrin.dev at gmail.com>> wrote:
>
>     I've added some examples to show the differences between your
>     patch and GNU ld's behavior, see http://reviews.llvm.org/D13528.
>     For me, it's not just the different way to implement the switch,
>     but mostly change its meaning.
>
>
> It is indeed different from GNU, but I think our behavior makes sense 
> and I don't think that's a bad difference. LLD's symbol handling is 
> already pretty different in some area, particularly how we handle 
> archive files and --start-group/--end-group. Just like that, this is 
> something we don't want to aim the exact compatibility.

Any unnecessary difference may lead to unexpected problems for end 
users. According to this, I don't feel like accepting your approach, 
which, compared to ld's behavior, is different in almost all aspects I 
can imagine, whereas my original patch follows ld's and gold's conduct 
pretty well. Maybe it's better not implement this switch at all.

Concerning the symbol table, the new mechanic of symbol handling is 
indeed different, but it implies the same rules of symbol resolution and 
seems like it can imitate GNU's behavior someday, even with 
--(start|end)-group.

>
>
>     On 08.10.2015 02:12, Rui Ueyama wrote:
>>     On Wed, Oct 7, 2015 at 10:28 AM, Igor Kudrin
>>     <ikudrin.dev at gmail.com <mailto:ikudrin.dev at gmail.com>> wrote:
>>
>>         They have quite a good explanation of the option's behavior
>>         and both ld and gold produce the same result. The option
>>         should only change the resolution target of the specified
>>         symbols and shouldn't do anything else; otherwise, the
>>         generated result will be unpredictable. In particular, we
>>         should neither change the name of a defined symbol nor remove it.
>>
>>         OK, I can suggest a bit different approach. What if we add a
>>         field to the Symbol class to redirect undefined symbol bodies
>>         to another Symbol? It'll require a small update in resolve()
>>         and maybe in a few other places, but the additional hash map
>>         will be gone. What do you think?
>>
>>
>>     We may be overthinking about that. I don't think that the
>>     difference between LLD and GNU ld is going to be a real issue in
>>     real-world usage, and even it does, we can do whatever to fix
>>     that after we recognize the need. So I think something like
>>     http://reviews.llvm.org/D13528 suffices.
>>
>>
>>         On 07.10.2015 22:38, Rui Ueyama wrote:
>>>         We are not necessarily aim 100% precise compatibility with
>>>         GNU ld in every detail. GNU ld may have implemented this
>>>         feature in a way that's natural to them. I'd really want to
>>>         implement this feature in a way as the LLD architecture is
>>>         designed for. This --wrap option is by nature a bit hacky
>>>         option, and users need to understand what they are doing.
>>>         GNU ld manual does not mention about the details about how
>>>         this option is supposed to work.
>>>
>>>         On Wed, Oct 7, 2015 at 9:12 AM, Igor Kudrin
>>>         <ikudrin.dev at gmail.com <mailto:ikudrin.dev at gmail.com>> wrote:
>>>
>>>
>>>
>>>             On 07.10.2015 20:22, Rui Ueyama wrote:
>>>>             On Wed, Oct 7, 2015 at 6:59 AM, Igor Kudrin
>>>>             <ikudrin.dev at gmail.com <mailto:ikudrin.dev at gmail.com>>
>>>>             wrote:
>>>>
>>>>                 ikudrin added a comment.
>>>>
>>>>                 In http://reviews.llvm.org/D13501#261600, @ruiu wrote:
>>>>
>>>>                 > This patch needs redesigning because we don't
>>>>                 want to look up hash tables more than once for each
>>>>                 symbol. In this patch, names for undefined symbols
>>>>                 are looked up twice -- once in the InputFile.cpp
>>>>                 and the other in SymbolTable.cpp.
>>>>
>>>>
>>>>                 We have to look up for different names for defined
>>>>                 and undefined symbols, so we can't use just one
>>>>                 hash map.
>>>>
>>>>                 In most cases, when the -wrap switch is not used
>>>>                 and UndefSymNameReplacement is empty, addition
>>>>                 lookup will be very cheap, without calculating a
>>>>                 hash value at all. On the other hand, we can use
>>>>                 just std::map which expected to work really quick
>>>>                 in our case.
>>>>
>>>>
>>>>             No, we don't have to look up hash table more than once.
>>>>             I thought a bit more about this and noticed that the
>>>>             LLD architecture (the separation of Symbol and
>>>>             SymbolBody) would really play well here. Symbols are
>>>>             just pointers to SymbolBodies, and symbol renaming is
>>>>             as easy as single pointer mutation. Here's the idea.
>>>>
>>>>             First we resolve all symbols without considering --wrap
>>>>             option (so no overhead for --wrap during file parsing
>>>>             and symbol resolution). When symbol resolution is done,
>>>>             we basically have three Symbols for --wrap'ed symbol S
>>>>             in the symbol table: S, __wrap_S, and __real_S. Let X,
>>>>             Y and Z be their SymbolBodies, respectively.
>>>>
>>>>             Originally the relationships are
>>>>
>>>>               S -> X
>>>>             __wrap_S -> Y
>>>>             __real_S -> Z
>>>>
>>>>             We swap the pointers so that their relationships are
>>>>
>>>>               S -> Y
>>>>             __wrap_S -> Y
>>>>             __real_S -> X
>>>>
>>>>             Now you can see that all symbols that would have been
>>>>             resolved to S without --wrap are now resolved as if
>>>>             they were __wrap_S. __real_S is also properly resolved
>>>>             to the symbol that was once the real symbol pointed to.
>>>>             This is the same result as what --wrap expects.
>>>
>>>             Thank you for explaining your idea. It can work in some
>>>             cases, but I dread it's unable to cover all situations.
>>>
>>>             1) Suppose we have all three symbols defined. The
>>>             command like "ld.lld2 a.o -wrap=S" in your case will
>>>             eliminate Z and change content for symbols S and
>>>             __real_S (see
>>>             SymbolTableSection<ELFT>::writeGlobalSymbols). That's
>>>             not right, GNU's ld preserves symbols in this case.
>>>
>>>             2) Suppose that S and __wrap_S are defined in different
>>>             object files, which lay in archives. With switch
>>>             "-wrap=S" (and if we have a reference to S) we should
>>>             not get the content of the first object file at all, but
>>>             how can we avoid parsing it if there is the record 'S'
>>>             in the SymbolTable?
>>>
>>>             3) If we generate DSO and have an unresolved undefined
>>>             symbol S we have to rename it and store as __wrap_S.
>>>             It'll require more complicated update of symbols in the
>>>             symbol table to support this case.
>>>
>>>             Finally, my approach, I hope, preserves the original
>>>             idea of relationships in the symbol table. With my case,
>>>             we will have links like these:
>>>             (Body for undefined S, name is "__wrap_S") -> (Symbol
>>>             with name "__wrap_S") -> (Body for defined symbol
>>>             "__wrap_S")
>>>             (Body for defined __wrap_S) -> (Symbol with name
>>>             "__wrap_S") -> (Body for defined symbol "__wrap_S")
>>>             (Body for defined S) -> (Symbol with name "S") -> (Body
>>>             for defined symbol "S")
>>>
>>>             With your proposal we'll end up with links like:
>>>             (Body for undefined S, name is "S") -> (Symbol with name
>>>             "S") -> (Body for defined symbol "__wrap_S")
>>>             (Body for defined __wrap_S) -> (Symbol with name
>>>             "__wrap_S") -> (Body for defined symbol "__wrap_S")
>>>             (Body for defined S) -> (Symbol with name "S") -> (Body
>>>             for defined symbol "__wrap_S")
>>>
>>>             The last line is a bit unexpected and can be a source
>>>             for unwitting bugs.
>>>
>>>
>>
>>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151009/94499729/attachment.html>