[lld] [lld][MachO]Multi-threaded i/o. Twice as fast linking a large project. (PR #147134)

Daniel Rodríguez Troitiño via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 7 15:02:52 PDT 2025


================
@@ -282,11 +284,83 @@ static void saveThinArchiveToRepro(ArchiveFile const *file) {
           ": Archive::children failed: " + toString(std::move(e)));
 }
 
-static InputFile *addFile(StringRef path, LoadType loadType,
-                          bool isLazy = false, bool isExplicit = true,
-                          bool isBundleLoader = false,
-                          bool isForceHidden = false) {
-  std::optional<MemoryBufferRef> buffer = readFile(path);
+class DeferredFile {
+public:
+  DeferredFile(StringRef path, bool isLazy, MemoryBufferRef buffer)
+      : path(path), isLazy(isLazy), buffer(buffer) {}
+  StringRef path;
+  bool isLazy;
+  MemoryBufferRef buffer;
+};
+using DeferredFiles = std::vector<DeferredFile>;
+
+// Most input files have been mapped but not yet paged in.
+// This code forces the page-ins on multiple threads so
+// the process is not stalled waiting on disk buffer i/o.
+void multiThreadedPageInBackground(const DeferredFiles &deferred) {
+  static const size_t pageSize = Process::getPageSizeEstimate();
+  static size_t totalBytes = 0;
+  std::atomic_int index = 0;
+
+  parallelFor(0, config->readThreads, [&](size_t I) {
----------------
drodriguez wrote:

If you want to use this approach to `parallelFor` because you really think it is more efficient in every case, I think the documentation should be modified. When one is passing `--read-threads=X`, I would expect X threads, but the current implementation can use fewer threads (because `-threads` is lower). If you want to keep this approach, the argument name should not lie about what it represents (call it "workers", call it "parallelism", call it whatever except "threads").

```
auto preloadDeferredFile = [&](const DeferredFile &deferredFile) {
  const StringRef &buff = deferredFile.buffer.getBuffer();
  if (buff.size() > largeArchive)
    return;
#ifndef NDEBUG
  totalBytes += buff.size();
  numDeferredFilesTouched += 1;
#endif

  // Reference all of the file's mmap'd pages to load them into memory.
  for (const char *page = buff.data(), *end = page + buff.size();
       page < end; page += pageSize)
    LLVM_ATTRIBUTE_UNUSED volatile char t = *page;
};
#if LLVM_ENABLE_THREADS
{ // Create a scope so the TaskGroup destructor waits for the spawned tasks.
  std::atomic_size_t index = 0;
  llvm::parallel::TaskGroup taskGroup;
  for (int w = 0; w < config->readWorkers; w++)
    taskGroup.spawn([&index, &preloadDeferredFile, &deferred]() {
      while (true) {
        size_t localIndex = index.fetch_add(1);
        if (localIndex >= deferred.size())
          break;
        preloadDeferredFile(deferred[localIndex]);
      }
    });
}
#else
// Not sure if you want to preload in this case.
// for (const DeferredFile &deferredFile : deferred)
//   preloadDeferredFile(deferredFile);
#endif
```

It is more complicated code, mostly because one has to deal with the possibility that LLVM is being compiled without thread support, but by not using `parallelFor` we are indicating to whoever reads this code that this is not a simple `parallelFor` in which each item is processed individually. We are not abusing `parallelFor` to build our own logic on top.

But, this is my opinion, if any other reviewer says "I am fine with this usage of `parallelFor`" feel free to ignore me.

https://github.com/llvm/llvm-project/pull/147134


More information about the llvm-commits mailing list