[llvm-dev] Weak symbol/alias semantics

Wed Jan 18 11:54:12 PST 2017

On Wed, Jan 18, 2017 at 6:54 AM, Teresa Johnson <tejohnson at google.com>
wrote:

>
>
> On Tue, Jan 17, 2017 at 11:05 AM, Peter Collingbourne <peter at pcc.me.uk>
> wrote:
>
>>
>>
>> On Fri, Jan 13, 2017 at 4:53 PM, Teresa Johnson <tejohnson at google.com>
>> wrote:
>>
>>> Thanks, David and Peter. Some responses to Peter's email below. Teresa
>>>
>>> On Fri, Jan 13, 2017 at 3:21 PM, Peter Collingbourne <peter at pcc.me.uk>
>>> wrote:
>>>
>>>> Hi Teresa,
>>>>
>>>> I think that to answer your question correctly it is helpful to
>>>> consider what is going on at the object file level. For your test1.c we
>>>> conceptually have a .text section containing the body of f, and then three
>>>> symbols:
>>>>
>>>> .weak f
>>>> f = .text
>>>> .globl strongalias
>>>> strongalias = .text
>>>> .weak weakalias
>>>> weakalias = .text
>>>>
>>>> Note that f, strongalias and weakalias are not related at all, except
>>>> that they happen to point to the same place. If f is overridden by a symbol
>>>> in another object file, it does not affect the symbols strongalias and
>>>> weakalias, so we still need to make them point to .text. I don't think it
>>>> would be right to make strongalias and weakalias into copies of f, as that
>>>> would be observable through function pointer equality. Most likely all you
>>>> need to do is to internalize f and keep strongalias and weakalias as
>>>> aliases of f.
>>>>
>>>
>>> Good point on wanting function pointer equality.  However, we can't
>>> simply internalize f(). We'll also need to rename the internalized copy.
>>> The reason is that we want the original f() references to resolve to the
>>> prevailing copy in the other module. Summarizing what we just talked about
>>> on IRC, when we have a non-prevailing weak/linkonce symbol f() that has an
>>> alias pointing to it, we need to:
>>> 1) Rename and internalize f()
>>> 2) Create a new external decl f()
>>> 3) RAUW existing references (other than from the aliases) with the new
>>> external decl created in 2)
>>>
>>> I think if it is however a weak_odr/linkonce_odr we can simplify the
>>> process since all copies will be the same. We can make f()
>>> available_externally (to enable inlining), and simply convert references to
>>> aliases of f() into direct references to f() and drop the aliases - does
>>> that sound right?
>>>
>>
>> I think you are right about the _odr -- we should assume that even if the
>> symbol as we see it is an alias it will be replaced with something with the
>> same semantics. I think we can also take that into account in the logic for
>> replacing the alias with its aliasee, as follows:
>>
>> - if the alias is internal, strong external or odr (i.e. not
>> isInterposableLinkage): may replace the global with an available_externally
>> copy of its aliasee
>>
>
> That doesn't match what we decided for aliases to a weak symbol (e.g. the
> strongalias case from above). I thought we decided that these should end up
> aliased to an internalized copy of its aliasee (assuming we're still
> talking about the aliasee being weak and non-prevailing). If alias becomes
> instead an available_externally copy, what happens when that copy is
> eliminated - there may not be an external def to resolve the references to
> at link time?
>

Sorry, I was talking about the case where the alias and aliasee are
non-prevailing, or where the symbol is being imported. In the module itself
we will do as you mentioned.
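
A rough source-level picture of that in-module handling, for the weak1.c
example when its f does not prevail. This is only a sketch, not what any LTO
backend literally emits: f_local is a made-up name for the renamed,
internalized copy, the strong f defined here merely stands in for the
prevailing copy from the other module so that everything fits in one file,
and the alias attribute assumes GCC or Clang on an ELF target.

```c
#include <assert.h>
#include <string.h>

/* Step 1: the non-prevailing body, renamed and internalized.
 * f_local is a hypothetical name chosen for this sketch. */
static const char *f_local(void) { return "weak1.c:f"; }

/* Step 2: f becomes an external declaration in this module. */
extern const char *f(void);

/* Stand-in for the prevailing definition that would really live in the
 * other module (an assumption made to keep the sketch in one file). */
const char *f(void) { return "weak2.c:f"; }

/* The aliases keep pointing at the original body, so they still compare
 * equal to each other as function pointers. */
const char *strongalias(void) __attribute__((alias("f_local")));
const char *weakalias(void) __attribute__((weak, alias("f_local")));

/* Direct references to f now reach the prevailing copy. */
const char *call_f(void) { return f(); }

/* Checks the resolution described above. */
void self_check(void) {
    assert(strcmp(call_f(), "weak2.c:f") == 0);
    assert(strcmp(strongalias(), "weak1.c:f") == 0);
    assert(strcmp(weakalias(), "weak1.c:f") == 0);
    assert(strongalias == weakalias);
}
```

Calling self_check() exercises both properties we care about: calls through
f reach the prevailing body, while both aliases still reach the local body
and remain pointer-equal.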

>
>
>> - if both the alias and aliasee are not isInterposableLinkage: may
>> replace the global reference with a reference to the aliasee
>>
>
> What if alias and aliasee are both non-prevailing and the prevailing defs
> for each are different? E.g.
>
> @x = weak global ...
> @y = weak alias @x
>
> and the prevailing def for @x is in moduleX with a different value than the
> prevailing def for @y which comes from moduleY. Just because they are
> aliased in this module doesn't mean they must be aliased elsewhere, right?
>

Right, in that case both x and y are isInterposableLinkage so this
transformation would not be valid.
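
To make the rule concrete, here is a minimal model of the check in plain C.
It is a sketch under stated assumptions, not LLVM code: the enum and both
function names are invented for illustration, only a handful of linkage
kinds are modeled, and the ODR kinds are treated as non-interposable as in
the rule above.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified linkage kinds, loosely modeled on LLVM IR linkages.
 * These names are hypothetical and exist only for this sketch. */
enum linkage {
    LINK_INTERNAL,
    LINK_EXTERNAL,      /* strong external */
    LINK_WEAK,
    LINK_LINKONCE,
    LINK_WEAK_ODR,
    LINK_LINKONCE_ODR
};

/* Non-ODR weak/linkonce definitions may be replaced at link time by a
 * definition with different semantics. */
bool is_interposable(enum linkage l) {
    return l == LINK_WEAK || l == LINK_LINKONCE;
}

/* A reference to the alias may be rewritten into a reference to the
 * aliasee only when neither symbol is interposable. */
bool may_forward_alias(enum linkage alias, enum linkage aliasee) {
    return !is_interposable(alias) && !is_interposable(aliasee);
}

/* A few cases from this thread. */
void self_check(void) {
    /* @y = weak alias @x, with @x weak: not valid. */
    assert(!may_forward_alias(LINK_WEAK, LINK_WEAK));
    /* strongalias -> weak f from weak1.c: not valid either. */
    assert(!may_forward_alias(LINK_EXTERNAL, LINK_WEAK));
    /* Strong alias to an internal aliasee: allowed by the rule. */
    assert(may_forward_alias(LINK_EXTERNAL, LINK_INTERNAL));
}
```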

>
> For this case (weak non-prevailing alias to a weak non-prevailing def) I
> think it should eventually become:
>
> @x = external global
> @y = external global
>
> which is what we would get as I proposed:
> - first by following the above transformation to make @y alias with a
> renamed and internalized @x, converting @x to an external decl
> - second by following case c) further down (since @y now aliases with a
> strong symbol), converting @y to an external decl
>

Agreed. I was thinking more about this case during the regular (not
necessarily ThinLTO) opt pipeline:

@x = weak_odr global ...
@y = weak_odr alias @x

In that case I was thinking that we know from _odr that y is an alias of x
in any translation unit defining y, so we can replace y with x. This may
not be useful after we start using the canonical form discussed elsewhere
though. (And now that I think about it some more it may not be a valid
transformation if the definition of x prevailed from an object defining
just x and the definition of y prevailed from an object defining both x and
y.)

>
>>
>>> Another tricky thing is if the weak symbol was a variable that is
>>> initialized via a __cxx_global_var_init function in the global_ctors list.
>>> If we have an alias to that symbol, presumably we'll want the new
>>> internalized/renamed version to get initialized instead?
>>>
>>
>> I believe that the purpose of the global reference in global_ctors is to
>> control which comdat the .init_array entry appears in. So yes, we will need
>> to rewrite the reference in global_ctors to point to the renamed global.
>> However, I think we also need to think about whether to remove the
>> init_array entry entirely. There are a couple of cases:
>>
>> 1) All symbols in the comdat are weak and have been overridden by strong
>> symbols in another object file, which may not necessarily be in a comdat.
>> In that case we need to keep the init_array entry so that we call the
>> initializer function.
>> 2) The linker has selected another comdat entirely. That means that this
>> object file's init_array entry has not been chosen and we need to drop it.
>>
>
> Related: D28737 ([ThinLTO] Don't create a comdat group for a dropped def
> with initializer)
>
>>
>> The interesting thing about these two cases is that they are
>> indistinguishable at the LTO API level because the linker will report all
>> of the comdat symbols as non-prevailing.
>>
>> I don't think it is possible to arrive at case 1 with regular C++, so
>> maybe we'd be fine assuming case 2 if all comdat symbols are
>> non-prevailing. But then again it would seem to make the implementation
>> simpler if we communicate which comdat has prevailed at the LTO API level.
>>
>
> Yes, another thing I realized is that we will drop the comdat for a
> non-prevailing weak that we convert to available_externally (or to a decl
> with D28806: [ThinLTO] Drop non-prevailing non-ODR weak to declarations),
> but we won't drop any other members of the same comdat group from the
> comdat. E.g. the comdat could contain GVs with internal or strong external
> linkage. That also needs to be fixed, so we don't end up with incomplete
> comdat groups.
>

I think we can emit partial comdat groups if we know that the linker will
pick our comdat group (trivially true if there is only one in the native
objects we emit, but need to be careful in the distributed case) and the
symbols in the comdat group will be defined by other objects in the link.
This goes back to the discussion we had about everything in the final
native objects needing to be prevailing and relying on linker-specific
semantics to ensure the same resolution in the distributed case.

> We can deduce that the comdat is not selected by the linker when it
> contains a weak symbol, since we know whether that is prevailing or not,
> but not when a comdat doesn't contain any weak. In that latter case (comdat
> doesn't have any weak symbols), I haven't thought through where this could
> get us into trouble, since we don't typically drop any non-weak symbols in
> ThinLTO compilation (so we wouldn't end up with an incomplete comdat
> group), but I wonder if there are potential issues especially in the
> distributed case where we do 2 separate links.
>

I don't get your point. Surely if a symbol is strong and non-prevailing the
only possibility is that the comdat was not chosen (i.e. if the comdat was
chosen, either the symbol would win or there would be a duplicate symbol
error).
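
Spelled out as a small C model (a sketch only: the struct, enum and function
names are invented here and do not correspond to anything in the LTO API,
and treating "some member prevails" as "the comdat was chosen" is my own
assumption):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What the linker tells us about one symbol in a comdat group
 * (hypothetical type for this sketch). */
struct comdat_member {
    bool is_weak;
    bool is_prevailing;
};

enum comdat_status { COMDAT_SELECTED, COMDAT_NOT_SELECTED, COMDAT_UNKNOWN };

/* A strong, non-prevailing member can only mean the comdat was not chosen.
 * If every member is weak and non-prevailing we cannot tell whether the
 * symbols were merely overridden or a different comdat was selected. */
enum comdat_status deduce_comdat_status(const struct comdat_member *m,
                                        size_t n) {
    bool any_prevailing = false;
    for (size_t i = 0; i < n; i++) {
        if (m[i].is_prevailing)
            any_prevailing = true;
        else if (!m[i].is_weak)
            return COMDAT_NOT_SELECTED;
    }
    return any_prevailing ? COMDAT_SELECTED : COMDAT_UNKNOWN;
}

/* The two situations discussed above. */
void self_check(void) {
    struct comdat_member strong_lost[] = { { false, false } };
    struct comdat_member all_weak_lost[] = { { true, false }, { true, false } };
    assert(deduce_comdat_status(strong_lost, 1) == COMDAT_NOT_SELECTED);
    assert(deduce_comdat_status(all_weak_lost, 2) == COMDAT_UNKNOWN);
}
```

The COMDAT_UNKNOWN result is the case where reporting which comdat prevailed
at the LTO API level would help.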

Peter

>
>
>>> Now in the case where we have an alias that is itself a weak
>>> non-prevailing symbol, how we handle it will, I think, depend on what it is
>>> aliased to:
>>> a) aliased to a weak/linkonce non-prevailing symbol -> handle as
>>> described earlier
>>> b) aliased to a weak_odr/linkonce_odr non-prevailing symbol -> handle as
>>> described earlier
>>> c) aliased to a strong symbol or a prevailing symbol -> convert to
>>> external decl (I think this case is only possible if the alias is a non-odr
>>> weak/linkonce)
>>>
>>> Does that sound right?
>>>
>>
>> I think the logic just needs to depend on the linkage of the alias
>> itself, as I described above.
>>
>
> See my comment on this above.
>
> Thanks,
> Teresa
>
>
>>
>>>> If we're resolving strongalias to f at -O2, that seems like a bug to me.
>>>> We can probably only resolve an alias to the symbol it references if we are
>>>> guaranteed that both symbols will have the same resolution, i.e. we must
>>>> check at least that both symbols have strong or internal linkage. If we
>>>> cared about symbol interposition, we might also want to check that both
>>>> symbols have non-default visibility, but I think that our support for that
>>>> is still a little fuzzy at the moment.
>>>>
>>>
>>> Per your and David's analysis it sounds like this is a bug then - I can
>>> file a bug to track it with the example.
>>>
>>> Regarding the comdat case I mentioned - Peter and I discussed on IRC and
>>> he pointed out that my case was illegal since aliases are by definition in
>>> the same comdat group as the symbol they alias. So in effect I had an
>>> incomplete comdat group.
>>>
>>> Thanks,
>>> Teresa
>>>
>>>
>>>> Thanks,
>>>> Peter
>>>>
>>>> On Fri, Jan 13, 2017 at 2:33 PM, Teresa Johnson <tejohnson at google.com>
>>>> wrote:
>>>>
>>>>> Hi Mehdi, Peter and David (and anyone else who sees this),
>>>>>
>>>>> I've been playing with some examples to handle the weak symbol cases
>>>>> we discussed in IRC earlier this week in the context of D28523. I was going
>>>>> to implement the support for turning aliases into copies in order to enable
>>>>> performing thinLTOResolveWeakForLinkerGUID on both aliases and
>>>>> aliasees, as a first step to being able to drop non-prevailing weak symbols
>>>>> in ThinLTO backends.
>>>>>
>>>>> I was wondering though what happens if we have an alias, which may or
>>>>> may not be weak itself, to a non-odr weak symbol that isn't prevailing. In
>>>>> that case, do we eventually want references via the alias to go to the
>>>>> prevailing copy (in another module), or to the original copy in the alias's
>>>>> module? I looked at some examples without ThinLTO, and am a little
>>>>> confused. Current (non-ThinLTO) behavior in some cases seems to depend on
>>>>> opt level.
>>>>>
>>>>> Example:
>>>>>
>>>>> $ cat weak12main.c
>>>>> extern void test2();
>>>>> int main() {
>>>>>   test2();
>>>>> }
>>>>>
>>>>> $ cat weak1.c
>>>>> #include <stdio.h>
>>>>>
>>>>> void weakalias() __attribute__((weak, alias ("f")));
>>>>> void strongalias() __attribute__((alias ("f")));
>>>>>
>>>>> void f () __attribute__ ((weak));
>>>>> void f()
>>>>> {
>>>>>   printf("In weak1.c:f\n");
>>>>> }
>>>>> void test1() {
>>>>>   printf("Call f() from weak1.c:\n");
>>>>>   f();
>>>>>   printf("Call weakalias() from weak1.c:\n");
>>>>>   weakalias();
>>>>>   printf("Call strongalias() from weak1.c:\n");
>>>>>   strongalias();
>>>>> }
>>>>>
>>>>> $ cat weak2.c
>>>>> #include <stdio.h>
>>>>>
>>>>> void f () __attribute__ ((weak));
>>>>> void f()
>>>>> {
>>>>>   printf("In weak2.c:f\n");
>>>>> }
>>>>> extern void test1();
>>>>> void test2()
>>>>> {
>>>>>   test1();
>>>>>   printf("Call f() from weak2.c\n");
>>>>>   f();
>>>>> }
>>>>>
>>>>> If I link weak1.c before weak2.c, nothing is surprising (we always
>>>>> invoke weak1.c:f at both -O0 and -O2):
>>>>>
>>>>> $ clang weak12main.c weak1.c weak2.c -O0
>>>>> $ a.out
>>>>> Call f() from weak1.c:
>>>>> In weak1.c:f
>>>>> Call weakalias() from weak1.c:
>>>>> In weak1.c:f
>>>>> Call strongalias() from weak1.c:
>>>>> In weak1.c:f
>>>>> Call f() from weak2.c
>>>>> In weak1.c:f
>>>>>
>>>>> $ clang weak12main.c weak1.c weak2.c -O2
>>>>> $ a.out
>>>>> Call f() from weak1.c:
>>>>> In weak1.c:f
>>>>> Call weakalias() from weak1.c:
>>>>> In weak1.c:f
>>>>> Call strongalias() from weak1.c:
>>>>> In weak1.c:f
>>>>> Call f() from weak2.c
>>>>> In weak1.c:f
>>>>>
>>>>> If I instead link weak2.c first, so its copy of f() is prevailing, I
>>>>> still get weak1.c:f for the call via weakalias() (both opt levels), and for
>>>>> strongalias() when building at -O0. At -O2 the compiler replaces the call
>>>>> to strongalias() with a call to f(), so it gets the weak2 copy in that
>>>>> case.
>>>>>
>>>>> $ clang weak12main.c weak2.c weak1.c -O2
>>>>> $ a.out
>>>>> Call f() from weak1.c:
>>>>> In weak2.c:f
>>>>> Call weakalias() from weak1.c:
>>>>> In weak1.c:f
>>>>> Call strongalias() from weak1.c:
>>>>> In weak2.c:f
>>>>> Call f() from weak2.c
>>>>> In weak2.c:f
>>>>>
>>>>> $ clang weak12main.c weak2.c weak1.c -O0
>>>>> $ a.out
>>>>> Call f() from weak1.c:
>>>>> In weak2.c:f
>>>>> Call weakalias() from weak1.c:
>>>>> In weak1.c:f
>>>>> Call strongalias() from weak1.c:
>>>>> In weak1.c:f
>>>>> Call f() from weak2.c
>>>>> In weak2.c:f
>>>>>
>>>>> I'm wondering what the expected/correct behavior is? Depending on what
>>>>> is correct, we need to handle this differently in ThinLTO mode. Let's say
>>>>> weak1.c's copy of f() is not prevailing and I am going to drop it (it needs
>>>>> to be removed completely, not turned into available_externally to ensure it
>>>>> isn't inlined since weak isInterposable). If we want the aliases in weak1.c
>>>>> to reference the original version, then copying is correct (e.g. weakalias
>>>>> and strongalias would each become a copy of weak1.c's f()). If we however
>>>>> want them to resolve to the prevailing copy of f(), then we need to turn
>>>>> the aliases into declarations (external linkage in the case of strongalias
>>>>> and external weak in the case of weakalias?).
>>>>>
>>>>> I also tried the case where f() was in a comdat, because I also need
>>>>> to handle that case in ThinLTO (when f() is not prevailing, drop it from
>>>>> the comdat and remove the comdat from that module). Interestingly, in this
>>>>> case when weak2.c is prevailing, I get the following warning when linking
>>>>> and get a seg fault at runtime:
>>>>>
>>>>> weak1.o:weak1.o:function test1: warning: relocation refers to
>>>>> discarded section
>>>>>
>>>>> Presumably the aliases still refer to the copy in weak1.c, which is in
>>>>> the comdat that gets dropped by the linker. So is it not legal to have an
>>>>> alias to a weak symbol in a comdat (i.e. alias from outside the comdat)? We
>>>>> don't complain in the compiler.
>>>>>
>>>>> Thanks,
>>>>> Teresa
>>>>> --
>>>>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>>>>> 408-460-2413
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> --
>>>> Peter
>>>>
>>>
>>>
>>>
>>> --
>>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>>> 408-460-2413
>>>
>>
>>
>>
>> --
>> --
>> Peter
>>
>
>
>
> --
> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
> 408-460-2413
>

-- 
-- 
Peter