[PATCH] D140592: [lld-macho] Skip re-loading archive if already loaded

Jez Ng via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 28 18:21:03 PST 2022


int3 added a comment.

So I'm kinda with @thakis on this. I would prefer we avoid hashing contents (assuming this is what ld64 is actually doing...)

> I'm curious to know where you draw the line when a behavior should deviate from ld64

@thakis would probably say that he doesn't like the async map file stuff either ;)

IMO the async map file stuff is different because

1. it's a strict implementation detail -- there's no observable change aside from perf
2. though the perf impact of hashing archives in the chromium_framework build is small, this seems like the kind of thing that could surprise us down the line. hashing entire files is in general Not Cheap, and it seems quite possible that we'll one day encounter a large archive member that causes perf issues. We *could* try to hack around it by introducing a flag such as `--no-hash-archive-member=`, but that's super ugly

Or in other words, the async map file has an ugly implementation, but the final behavior is easy enough to reason about / build on top of. Hashing archive contents has a relatively simple implementation, but the resulting behavior feels off...



================
Comment at: lld/MachO/InputFiles.cpp:2110
 
+  if (!skipCache && !seen.insert(mb->getBuffer()).second)
+    return Error::success();
----------------
thevinster wrote:
> int3 wrote:
> > thevinster wrote:
> > > thevinster wrote:
> > > > thakis wrote:
> > > > > We're doing this based on buffer contents? Where in ld64's source does it do that? That seems like very surprising semantics (and kinda bad for perf).
> > > > I don't know if there will be any significant perf regressions. I'll double check that on my end first before saying anything else. From what I can tell, it looks like ld64 hashes based on the contents of the [[ https://github.com/apple-oss-distributions/ld64/blob/main/src/ld/parsers/archive_file.cpp#L99-L111 | Entry ]] class. I agree that these are surprising semantics (especially the behavior of force loading two of the same object files from an archive), but this strange behavior is also implicitly depended upon by some of our builds. Whether or not this is the correct implementation, I'm merely trying to follow ld64's behavior. As far as implementation is concerned, this is the path of least resistance that'll solve our current duplicate symbol issue. I'm open to suggestions on how else this can be done (I've tried approaches where I only load the symbol if it hasn't been loaded but that ran into many edge cases). 
> > > > 
> > > > 
> > > Benchmarking chromium framework, 
> > > ```
> > >            base           diff           difference (95% CI)
> > > sys_time   2.050 ± 0.028  2.037 ± 0.027  [  -1.2% ..   -0.0%]
> > > user_time  4.550 ± 0.040  4.589 ± 0.040  [  +0.5% ..   +1.2%]
> > > wall_time  7.054 ± 0.067  7.084 ± 0.067  [  +0.0% ..   +0.8%]
> > > samples    45             42
> > > ```
> > > 
> > > This is a very slight regression, and personally, I think it's worth the hit if it means us being even more compatible with ld64's behaviors.
> > Do you have a code pointer to where the `Entry` class is getting hashed in ld64?
> If I'm understanding correctly, I believe [[ https://github.com/apple-oss-distributions/ld64/blob/main/src/ld/parsers/archive_file.cpp#L357 | this method ]]  is where the check happens whether an archive member has already been loaded. In particular, `_instantiatedEntries` (https://github.com/apple-oss-distributions/ld64/blob/main/src/ld/parsers/archive_file.cpp#L121) is a map of an Entry to the MemberState. 
`_instantiatedEntries` is a vanilla `std::map` w/o a custom comparator (https://github.com/apple-oss-distributions/ld64/blob/main/src/ld/parsers/archive_file.cpp#L121), so I don't think it's hashing the contents of `Entry`, just the pointer value itself


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140592/new/

https://reviews.llvm.org/D140592



More information about the llvm-commits mailing list