[lld] [lld][MachO] Multi-threaded i/o. Twice as fast linking a large project. (PR #147134)

John Holdsworth via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 7 09:33:13 PDT 2025


================
@@ -282,11 +284,83 @@ static void saveThinArchiveToRepro(ArchiveFile const *file) {
           ": Archive::children failed: " + toString(std::move(e)));
 }
 
-static InputFile *addFile(StringRef path, LoadType loadType,
-                          bool isLazy = false, bool isExplicit = true,
-                          bool isBundleLoader = false,
-                          bool isForceHidden = false) {
-  std::optional<MemoryBufferRef> buffer = readFile(path);
+class DeferredFile {
+public:
+  DeferredFile(StringRef path, bool isLazy, MemoryBufferRef buffer)
+      : path(path), isLazy(isLazy), buffer(buffer) {}
+  StringRef path;
+  bool isLazy;
+  MemoryBufferRef buffer;
+};
+using DeferredFiles = std::vector<DeferredFile>;
+
+// Most input files have been mapped but not yet paged in.
+// This code forces the page-ins on multiple threads so
+// the process is not stalled waiting on disk buffer i/o.
+void multiThreadedPageInBackground(const DeferredFiles &deferred) {
+  static const size_t pageSize = Process::getPageSizeEstimate();
+  static size_t totalBytes = 0;
+  std::atomic_int index = 0;
+
+  parallelFor(0, config->readThreads, [&](size_t I) {
----------------
johnno1962 wrote:

@drodriguez, it seems to me we're circling around two related remaining disagreements: the use of parallelFor, and the type of option (a boolean vs. an approximate number of threads for proactive paging).

Let me unpack the first decision, as I am convinced the code follows my understanding of the requirement and is not that difficult to understand. The requirement is to perform a large number of very simple operations efficiently. To use an analogy: we have a supermarket where 8000 people want to buy a toothbrush. The solution is not to have 8000 tills (threads), nor is it to have 8000 different cashiers opening and closing a till for each request (spinning up a thread per operation). The most efficient approach is to have a limited number of tills and have each customer take a numbered ticket as they arrive; customers are then called one at a time by index, as in a post office. If supermarkets worked this way you wouldn't have the stress of deciding which queue to join ahead of time, nor would the process take any longer than absolutely necessary. This is my understanding of the use of parallelFor and the atomics. It is efficient because the overhead of the mutex or atomic metering out each ticket to a till/queue/thread is extremely low and is not critically dependent on the number of threads.
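To make the ticket analogy concrete, here is a minimal sketch of the pattern, using plain std::thread in place of llvm's parallelFor (which the PR actually uses); the helper name forEachIndexed and the callback are illustrative stand-ins, not code from the PR:

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// The "ticket" pattern: a fixed number of tills (threads) share one atomic
// ticket dispenser. Each worker repeatedly takes the next index until the
// work runs out, so no thread ever sits idle while work remains.
template <typename Fn>
void forEachIndexed(size_t count, unsigned numThreads, Fn fn) {
  std::atomic<size_t> nextTicket{0};
  std::vector<std::thread> tills;
  tills.reserve(numThreads);
  for (unsigned t = 0; t < numThreads; ++t)
    tills.emplace_back([&] {
      // fetch_add hands out each ticket exactly once across all threads.
      for (size_t i = nextTicket.fetch_add(1); i < count;
           i = nextTicket.fetch_add(1))
        fn(i); // e.g. touch each page of deferred[i].buffer to force page-in
    });
  for (auto &till : tills)
    till.join();
}
```

The cost per work item is a single atomic increment, which is why the overhead stays low and largely independent of the thread count.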

There may be an algorithm somewhere in LLVM that does exactly this, but I'd rather keep control: when we tried delegating thread management before, it was 50% slower and not, I believe, any easier to understand.

Any chance we can reach agreement on this before we move on to the exact nature of the option? My position there is that I would prefer not to recycle an existing option that does who knows what. Being able to specify the number of threads independently for this feature is a feature in itself, and it gives us a value for the maximum number of threads parallelFor will use. I'm about to be away from my computer for two weeks, and I'd really appreciate it if we could get nearer to landing this PR before it develops a conflict.

https://github.com/llvm/llvm-project/pull/147134


More information about the llvm-commits mailing list