[PATCH] D114841: [lld-macho] Fix duplicate symbols with relocatable objects

Keith Smiley via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jan 28 18:46:07 PST 2022


keith marked 9 inline comments as done.
keith added inline comments.


================
Comment at: lld/MachO/Driver.cpp:353
   case file_magic::bitcode:
-    newFile = make<BitcodeFile>(mbref, "", 0, isLazy);
+    // See comments above for macho_object file type
+    if (InputFile *cachedFile = implicitlyLoadedObjects[path])
----------------
I'm loosely considering not supporting this just to reduce the complexity here in general, since I think this case is much less likely to happen, but I don't have a strong preference, happy to hear thoughts from others


================
Comment at: lld/MachO/Driver.cpp:253
                           bool isExplicit = true, bool isBundleLoader = false) {
+  // Avoid loading archives twice. If the archives are being force-loaded,
+  // loading them twice would create duplicate symbol errors. In the
----------------
oontvoo wrote:
> nit: no longer applying only to archives
I'm not sure what this was referring to, but I believe it was a comment I had in a previous iteration, let me know if you meant something else!


================
Comment at: lld/MachO/Driver.cpp:322-324
   case file_magic::macho_object:
-    newFile = make<ObjFile>(mbref, getModTime(path), "");
+    newFile = loadedObjects[path] = make<ObjFile>(mbref, getModTime(path), "");
     break;
----------------
int3 wrote:
> oontvoo wrote:
> > smeenai wrote:
> > > oontvoo wrote:
> > > > int3 wrote:
> > > > > oontvoo wrote:
> > > > > > int3 wrote:
> > > > > > > keith wrote:
> > > > > > > > int3 wrote:
> > > > > > > > > oontvoo wrote:
> > > > > > > > > > keith wrote:
> > > > > > > > > > > oontvoo wrote:
> > > > > > > > > > > > On a second thought, this is suspicious. Doesn't this mean `%lld foo.o foo.o ` wouldn't yield "duplicate symbols" error anymore?
> > > > > > > > > > > Yea I mentioned this with a top level comment as well, it does break the related test. I need to find a better heuristic for skipping these, since we only want to do it for LC_LINKER_OPTION's / duplicate -framework commands etc (not sure if the latter is handled today or not, especially as it conflicts with -framework foo -weak_framework foo etc).
> > > > > > > > > > > 
> > > > > > > > > > > I was hoping to do this specifically in the LC_LINKER_OPTION logic to avoid changing things here, but the issue is the order of hitting those options vs the command line options depends on the order things are passed, so I couldn't rely on that.
> > > > > > > > > > Ah, I've missed the previous comments. Sorry!
> > > > > > > > > > 
> > > > > > > > > > In any case, I can think of a contrived example where you have the input obj files (standalone or in archives) containing no symbols but contributing to global ctors. In that case, wouldn't we want the side effect of these being loaded individually? 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > I don't think it'd be terribly hard to handle this, we'd just need an extra boolean to `addFile` to indicate if a file is being loaded via `add{Framework,Library}` or if it's being loaded directly via a path. That said... I'm not sure there's much value in handling this edge case. I'm more than happy to just dedup everything and document this behavioral difference in ld64-vs-lld.rst.
> > > > > > > > > 
> > > > > > > > > > I can think of a contrived example where you have the input obj files (standalone or in archives) containing no symbols but contributing to global ctors
> > > > > > > > > 
> > > > > > > > > What does 'contributing to global ctors' mean exactly?
> > > > > > > > FWIW tensorflow is how I discovered this. They produce an object file in a framework and you get corresponding linker options loading it if you include it from swift. So our app doesn't link in that case. On the swift side there is a private flag to exclude the linker option, but that's not ideal
> > > > > > > > I'm not sure there's much value in handling this edge case
> > > > > > > 
> > > > > > > Just to be clear, I meant there's no much value in raising duplicate symbol errors when an object file is specified twice on the command line. We should definitely handle object files in frameworks that get loaded multiple times
> > > > > > > Just to be clear, I meant there's no much value in raising duplicate symbol errors when an object file is specified twice on the command line. 
> > > > > > 
> > > > > > Ok, agreed there's no value in raising duplicate error (that's just the side effect of not loading the file again)
> > > > > > 
> > > > > > > What does 'contributing to global ctors' mean exactly?
> > > > > > 
> > > > > > The "contrived" case I'm talking about is this:
> > > > > > 
> > > > > > ```
> > > > > > 
> > > > > > // main.c
> > > > > > #include <stdio.h>
> > > > > > void a_ctor() {
> > > > > >   // do stuff
> > > > > >   printf("ctor called\n");
> > > > > > }
> > > > > > 
> > > > > > int main() {return 0; }
> > > > > > 
> > > > > > // my_init.s
> > > > > > // This file defines no symbol - but it sticks the _a_ctor into the global ctors list
> > > > > > L_.str:  
> > > > > > 	.section	__DATA,__mod_init_func,mod_init_funcs
> > > > > > 	.p2align	3
> > > > > > 	.quad	_a_ctor
> > > > > > ```
> > > > > > 
> > > > > > Link it ` ld <other args> init.o init.o main.o`
> > > > > > 
> > > > > > Run it:
> > > > > > 
> > > > > > ```
> > > > > > $ ./my_main.out 
> > > > > > ctor called
> > > > > > ctor called
> > > > > > ```
> > > > > > 
> > > > > > With this patch, it'll only execute the ctor once
> > > > > > 
> > > > > > 
> > > > > oh right I see what you mean. yeah seems very edge-case-y...
> > > > In normal applications, this might be an edge case, yes, but it's not uncommon to see all kinds of creative usages of custom init (eg., the sanitizers do it all the time).
> > > > 
> > > > To think of it another way, if we were to propose this change to other ports (eg ELF), do you think they'd accept it? 
> > > > If yes, then ok 🤐 
> > > > If no, then why would MachO do it?  (it's a weird behaviour ... and tbh, I don't think they'd accept it )
> > > > 
> > > > But I've said my piece and it seems your position has the popular vote here :P so ...
> > > > 
> > > > ("this change" specifically being "if you see multiple identical input paths then, only look at one of them")
> > > I believe LLD COFF already deduplicates input files (but I also think they're just emulating Microsoft's linker's behavior there).
> > That's fair.
> > Feel free to revert to the simpler version. :)
> @oontvoo I think you make good points, but I also think that the rare builds relying on this can easily restructure themselves to not have to load the same file multiple times.
> 
> Moreover, LLD-ELF already deviates from bfd in several ways, and I'm not sure this additional discrepancy would be that egregious.
Sorry to resurrect this one, but I think with my current implementation of tracking implicit vs explicit, we get the best of both worlds here, I tested the ctors example above and it gets called twice as expected, but duplicate symbols still get reported as expected as well. Let me know if you have any concerns with it at this point though!


================
Comment at: lld/test/MachO/framework-object.ll:1
+; REQUIRES: x86
+; RUN: rm -rf %t; split-file %s %t
----------------
int3 wrote:
> not sure this test deserves its own file... seems like we could fold it into framework.s and lc-linker-option.ll
Happy to, what's the line for a new file or not? I guess I was thinking separate was nicer just because files with a ton of unrelated RUNs gets a bit harder to navigate IMO. But also not having a clear name for this one did feel a bit weird


================
Comment at: lld/test/MachO/framework-object.ll:5
+; RUN: mkdir -p %t/Foo.framework
+; RUN: llc --filetype=obj %t/framework.ll -o %t/Foo.framework/Foo
+; RUN: llc --filetype=obj %t/load-framework-foo.ll -o %t/main
----------------
oontvoo wrote:
> llvm-mc
> 
Are you saying I should rewrite this to use asm for this case instead? Or is there some version of that command that can do this same conversion? AFAICT if the source is either, I have to use multiple tools to get an object one time, and bitcode the other time.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114841/new/

https://reviews.llvm.org/D114841



More information about the llvm-commits mailing list