[PATCH] D59709: [ThinLTO] Auto-hide prevailing linkonce_odr only when all copies eligible

Fri Sep 27 16:53:25 PDT 2019

tejohnson marked an inline comment as done.
tejohnson added inline comments.

================
Comment at: llvm/trunk/lib/LTO/LTO.cpp:334
+        S->setCanAutoHide(VI.canAutoHide() &&
+                          !GUIDPreservedSymbols.count(VI.getGUID()));
+      }
----------------
steven_wu wrote:
> tejohnson wrote:
> > steven_wu wrote:
> > > tejohnson wrote:
> > > > steven_wu wrote:
> > > > > tejohnson wrote:
> > > > > > tejohnson wrote:
> > > > > > > steven_wu wrote:
> > > > > > > > The regression I saw is exactly the situation mentioned in the comments but I am not so sure about the conclusion here. For ld64, it can see whether a symbol can be auto hide or not, and it prefers the one that is not auto hide. This means if there are other copies of the symbol that is not autohide outside the summary and not autohide, the one can be autohide should never be prevailing.
> > > > > > > > 
> > > > > > > > Is that the case for other linker?
> > > > > > > > The regression I saw is exactly the situation mentioned in the comments but I am not so sure about the conclusion here.
> > > > > > > 
> > > > > > > By the situation mentioned in the comments I assume you mean the part about symbols in the GUIDPreservedSymbols set because they are visible outside the summary? In your situation, why are there so many symbols in the preserved symbols set? Are there a lot of symbols in native code being linked in?
> > > > > > > 
> > > > > > > I guess it is different in lld since in the case I encountered (in an internal application), the linkonce_odr copy was prevailing, not the weak_odr copy. If ld64 guarantees that the autohide (linkonce_odr) copy will never be prevailing if there is an autohide (weak_odr) copy available somewhere, then we may need to figure out a way to communicate that.
> > > > > > > If ld64 guarantees that the autohide (linkonce_odr) copy will never be prevailing if there is an autohide (weak_odr) copy available somewhere
> > > > > > 
> > > > > > I meant if there is a *non*-autohide (weak_odr) copy available of course...
> > > > > > By the situation mentioned in the comments I assume you mean the part about symbols in the GUIDPreservedSymbols set because they are visible outside the summary? In your situation, why are there so many symbols in the preserved symbols set? Are there a lot of symbols in native code being linked in?
> > > > > > 
> > > > > 
> > > > > Yes. The situation is the code is linking a static c++ library (native code) with linkonce autohide libcxx symbols. Since ld64 decides to coalesce away the copy in the native code and use the version inside the LTO, it has to add it to mustPreserve symbols. Now even both copies can be autohide, it is not autohide because it is in the preserve list.
> > > > > 
> > > > > 
> > > > > > I guess it is different in lld since in the case I encountered (in an internal application), the linkonce_odr copy was prevailing, not the weak_odr copy. If ld64 guarantees that the autohide (linkonce_odr) copy will never be prevailing if there is an autohide (weak_odr) copy available somewhere, then we may need to figure out a way to communicate that.
> > > > > > 
> > > > > 
> > > > > This sounds like a linker bug but I am not familiar with lld LTO pipeline to be sure. Should lld mark weak_odr copy as prevailing? I also would like to understand how lld is using mustPreserve symbols because I thought lld doesn't need it.
> > > > > For ld64, it uses mustPreserve symbol to communicate if the weak/linkonce symbols are prevailing in this case.
> > > > > 
> > > > In the new LTO API, symbols are added to the GUIDPreservedSymbols set if they are visible outside the summary (either in a bitcode file without a summary or a native object).
> > > > 
> > > > We could do better if the linker indicated whether these symbols in objects without summaries were eligible for auto hide, which in the compiler is based both on the linkage type and on it having the global unnamed addr flag - does the linker know both of these for  native objects?
> > > > 
> > > > I'm not an lld expert, but looking back at my original internal discussion with @pcc on this issue, it appears to be a difference in ld64 (Mach-O) semantics and ELF semantics. Here is an excerpt:
> > > > 
> > > > pcc wrote:
> > > > > tejohnson wrote:
> > > > >> Is there some other criteria that should be used to prevent the symbol from being marked Hidden in this case?
> > > > >
> > > > > It seems that the rule should be: if we see at least one weak_odr definition (i.e. explicit instantiation) that should prevent the relaxation from default to hidden. The reason is that the weak_odr makes it possible to have another translation unit that does not implicitly instantiate the function, including a translation unit in another DSO. Similar logic is implemented in ld64: if one symbol is weak_odr unnamed_addr (i.e. non-auto-hide) and the other is linkonce_odr unnamed_addr (i.e. auto-hide), the linker selects the non-auto-hide one.
> > > > > https://github.com/apple-opensource/ld64/blob/master/src/ld/SymbolTable.cpp#L228
> > > > >
> > > > > Unfortunately, this is at odds with ELF semantics where the "most hidden" visibility wins, so I think this means that we would not be able to mark linkonce_odr unnamed_addr symbols as hidden in the non-LTO case, as was suggested at one point on llvm-dev I believe.
> > > > 
> > > > 
> > > > So perhaps the best thing to do in this case is have an interface for the linker to indicate the correct semantics with regard to whether most or least hidden wins?
> > > When you say visible outside the summary, what exactly is the definition of `visible`? I failed the find the relevant in lld that interacts with GUIDPreservedSymbols. For ld64, weak symbols are only in the GUIDPreservedSymbols when it is in the final image, so I am not sure if we share the same definition. If the weak symbols are `visible` outside the summary just to get coalesced away later, it doesn't have to be preserved.
> > > 
> > > I remember there were few other semantics difference in symbol resolution rules. IRLinker follows ELF visibility rules as well. It might be a good idea to clean that up sometimes.
> > > 
> > > If the ELF is choosing the most hidden version, why is the case described in the commit message a problem? Is it because weak_odr beats linkonce_odr before visibility comes into consideration?
> > > When you say visible outside the summary, what exactly is the definition of visible? I failed the find the relevant in lld that interacts with GUIDPreservedSymbols. For ld64, weak symbols are only in the GUIDPreservedSymbols when it is in the final image, so I am not sure if we share the same definition. If the weak symbols are visible outside the summary just to get coalesced away later, it doesn't have to be preserved.
> > 
> > lld doesn't create this set directly, it is done in the new LTO API here:
> > http://llvm-cs.pcc.me.uk/lib/LTO/LTO.cpp#910
> > 
> > And VisibleOutsideSummary is set here:
> > http://llvm-cs.pcc.me.uk/lib/LTO/LTO.cpp#539
> > 
> > VisibleToRegularObj is set by lld if the symbol is used in a regular (native) object
> > InSummary is true of the bitcode object had a summary
> > 
> > > If the ELF is choosing the most hidden version, why is the case described in the commit message a problem? Is it because weak_odr beats linkonce_odr before visibility comes into consideration?
> > 
> > ELF is picking the linkonce_odr symbol (most hidden) as prevailing. If you look at the change in FunctionImport.cpp, before this patch we were unconditionally marking the prevailing linkonce_odr copy as hidden when "promoting" to weak_odr. 
> > 
> > In the case that was failing we were first linking into a (native) shared library. The translation units in the shared library link contained both some implicit instantiations (linkonce_odr) and an explicit instantiation (weak_odr). Without LTO, the weak_odr would have resulted in a non-hidden weak definition in the shared library. Then this shared library was linked with code that contained references to this symbol that were expecting to be resolved by the explicit instantiation. If we instead mark that symbol as hidden in the shared library, those references are no longer satisfied.
> > 
> > ELF is picking the linkonce_odr symbol (most hidden) as prevailing. If you look at the change in FunctionImport.cpp, before this patch we were unconditionally marking the prevailing linkonce_odr copy as hidden when "promoting" to weak_odr.
> > 
> > In the case that was failing we were first linking into a (native) shared library. The translation units in the shared library link contained both some implicit instantiations (linkonce_odr) and an explicit instantiation (weak_odr). Without LTO, the weak_odr would have resulted in a non-hidden weak definition in the shared library. Then this shared library was linked with code that contained references to this symbol that were expecting to be resolved by the explicit instantiation. If we instead mark that symbol as hidden in the shared library, those references are no longer satisfied.
> > 
> 
> I guess I was missing something here for how ELF handles weak links during static link time. It sounds it is correct the the linkonce_odr version is prevailing under ELF rule but I don't understand the reason why the weak_odr version is expected to be exported?
> 
> Here is my understanding for ELF linking and let me know if it is correct or not.
> 
> For linking without LTO:
> * There is no bits in ELF object file to indicate auto hide. linkonce_odr and weak_odr are both weak external symbols so you get a weak copy in the dylib.
> 
> For linking with fullLTO:
> * IRLinker will pick weak_odr linkage, remove unamed_addr, resulting the same output as native
> 
> For linking with thinLTO (no native or fullLTO):
> * autohide disabled because weak_odr is cannot autohide, weak external symbol produced.
> Since every situation produces weak external symbol, would it be easier just make lld mark weak_odr as prevailing (can you tell linkonce_odr from weak_odr in ELF object)?
> 
> Because native object for ELF doesn't have autohide optimization, missing autohide during thinLTO is not a regression.
Sorry I forgot I hadn't responded here. @pcc should be better able to answer some of your specific questions about ELF and lld.

> For linking with thinLTO (no native or fullLTO):
>   - autohide disabled because weak_odr is cannot autohide, weak external symbol produced.

This should also be the same as a non-LTO after the link completes - there is a non-hidden weak symbol in the library (the prevailing linkonce_odr that was "promoted" to weak_odr and now no longer marked hidden). 

> Since every situation produces weak external symbol, would it be easier just make lld mark weak_odr as prevailing (can you tell linkonce_odr from weak_odr in ELF object)?

I am not an expert on ELF semantics, deferring to @pcc here.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59709/new/

https://reviews.llvm.org/D59709