[PATCH] D140592: [lld-macho] Skip re-loading archive if already loaded

Wed Dec 28 12:04:45 PST 2022

thevinster added a comment.

In D140592#4018461 <https://reviews.llvm.org/D140592#4018461>, @thakis wrote:

> Imho it'd be nicer to add this to the "list of places where lld is different from ld64". From what I can tell, this isn't a _huge_ problem in practice (we haven't needed it until now), and there's some value in both sensible semantics and in having behavior that's consistent with the other llds too.

I hear you on the sensible semantics, but I disagree that we need to have behavior that's consistent with the other lld ports. From what I can see, there are many things that the Mach-O port has introduced that differs from ELF and COFF ports (e.g. string alignment, async map writing, etc.) that at this point, I would even consider all three linkers to be distinct. Plus, it deviates from the goal of lld being a "drop-in" replacement. While I don't have strong feelings of whether this specific behavior needs to be implemented in lld, I'm curious to know where you draw the line when a behavior should deviate from ld64. In regards to whether this is a huge problem in practice, I don't think any of us here can make great arguments since everyone focuses on making their builds work. In your case, maybe this isn't a "huge problem", but in our case, it is.

================
Comment at: lld/MachO/InputFiles.cpp:2110

+  if (!skipCache && !seen.insert(mb->getBuffer()).second)
+    return Error::success();
----------------
int3 wrote:
> thevinster wrote:
> > thevinster wrote:
> > > thakis wrote:
> > > > We're doing this based on buffer contents? Where in ld64's source does it do that? That seems like very surprising semantics (and kinda bad for perf).
> > > I don't know if there will be any significant perf regressions. I'll double check that on my end first before saying anything else. From what I can tell, it looks like ld64 hashes based on the contents of the [[ https://github.com/apple-oss-distributions/ld64/blob/main/src/ld/parsers/archive_file.cpp#L99-L111 | Entry ]] class. I agree that these are surprising semantics (especially the behavior of force loading two of the same object files from an archive), but this strange behavior is also implicitly depended upon by some of our builds. Whether or not this is the correct implementation, I'm merely trying to follow ld64's behavior. As far as implementation is concerned, this is the path of least resistance that'll solve our current duplicate symbol issue. I'm open to suggestions on how else this can be done (I've tried approaches where I only load the symbol if it hasn't been loaded but that ran into many edge cases). 
> > > 
> > > 
> > Benchmarking chromium framework, 
> > ```
> >            base           diff           difference (95% CI)
> > sys_time   2.050 ± 0.028  2.037 ± 0.027  [  -1.2% ..   -0.0%]
> > user_time  4.550 ± 0.040  4.589 ± 0.040  [  +0.5% ..   +1.2%]
> > wall_time  7.054 ± 0.067  7.084 ± 0.067  [  +0.0% ..   +0.8%]
> > samples    45             42
> > ```
> > 
> > This is a very slight regression, and personally, I think it's worth the hit if it means us being even more compatible with ld64's behaviors.
> Do you have a code pointer to where the `Entry` class is getting hashed in ld64?
If I'm understanding correctly, I believe [[ https://github.com/apple-oss-distributions/ld64/blob/main/src/ld/parsers/archive_file.cpp#L357 | this method ]]  is where the check happens whether an archive member has already been loaded. In particular, `_instantiatedEntries` (https://github.com/apple-oss-distributions/ld64/blob/main/src/ld/parsers/archive_file.cpp#L121) is a map of an Entry to the MemberState. 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140592/new/

https://reviews.llvm.org/D140592