[lld] [lld][MachO] Multi-threaded I/O. Twice as fast linking a large project. (PR #147134)
John Holdsworth via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 10 09:09:50 PDT 2025
================
@@ -282,11 +284,117 @@ static void saveThinArchiveToRepro(ArchiveFile const *file) {
": Archive::children failed: " + toString(std::move(e)));
}
-static InputFile *addFile(StringRef path, LoadType loadType,
-                          bool isLazy = false, bool isExplicit = true,
-                          bool isBundleLoader = false,
-                          bool isForceHidden = false) {
-  std::optional<MemoryBufferRef> buffer = readFile(path);
+struct DeferredFile {
+  StringRef path;
+  bool isLazy;
+  MemoryBufferRef buffer;
+};
+using DeferredFiles = std::vector<DeferredFile>;
+
+class SerialBackgroundQueue {
+  std::deque<std::function<void()>> queue;
+  std::thread *running;
+  std::mutex mutex;
+
+public:
+  void queueWork(std::function<void()> work) {
+    mutex.lock();
+    if (running && queue.empty()) {
+      mutex.unlock();
+      running->join();
+      mutex.lock();
+      delete running;
+      running = nullptr;
+    }
+
+    if (work) {
+      queue.emplace_back(std::move(work));
+      if (!running)
+        running = new std::thread([&]() {
+          while (true) {
+            mutex.lock();
+            if (queue.empty()) {
+              mutex.unlock();
+              break;
+            }
+            auto work = std::move(queue.front());
+            mutex.unlock();
+            work();
+            mutex.lock();
+            queue.pop_front();
+            mutex.unlock();
+          }
+        });
+    }
+    mutex.unlock();
+  }
+};
+
+// Most input files have been mapped but not yet paged in.
+// This code forces the page-ins on multiple threads so
+// the process is not stalled waiting on disk buffer i/o.
+void multiThreadedPageInBackground(DeferredFiles &deferred) {
----------------
johnno1962 wrote:
Looking only at macOS, I've switched the code in multiThreadedPageInBackground() to use madvise(), and the patch below is as fast as this PR while being much more conventional. I/O is concentrated where it should be according to Activity Monitor, so I'd recommend a follow-up PR switching to madvise(). Let me know if you'd like me to pick that up.

It's faster if the madvise() calls are performed in the background, as they take seconds to complete for the 8000+8000 memory regions in the Chrome link, and it also seems slightly faster to parallelise them across multiple "workers" (a rough standalone sketch of that follows the diff). Mapped files also seem more likely to remain in memory for a second, "warm" link.
```
diff --git a/lld/MachO/Driver.cpp b/lld/MachO/Driver.cpp
index 87d47d47d562..63fa48707b52 100644
--- a/lld/MachO/Driver.cpp
+++ b/lld/MachO/Driver.cpp
@@ -52,6 +52,7 @@
 #include "llvm/TargetParser/Host.h"
 #include "llvm/TextAPI/Architecture.h"
 #include "llvm/TextAPI/PackedVersion.h"
+#include <sys/mman.h>

 using namespace llvm;
 using namespace llvm::MachO;
@@ -336,6 +337,7 @@ public:
 void multiThreadedPageInBackground(DeferredFiles &deferred) {
   static const size_t pageSize = Process::getPageSizeEstimate();
   static const size_t largeArchive = 10 * 1024 * 1024;
+#undef NDEBUG
 #ifndef NDEBUG
   using namespace std::chrono;
   std::atomic_int numDeferedFilesTouched = 0;
@@ -352,6 +354,9 @@ void multiThreadedPageInBackground(DeferredFiles &deferred) {
     numDeferedFilesTouched += 1;
 #endif

+    madvise((void *)buff.data(), buff.size(), MADV_WILLNEED);
+    return;
+
     // Reference all file's mmap'd pages to load them into memory.
     for (const char *page = buff.data(), *end = page + buff.size(); page < end;
          page += pageSize)
```
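As a rough illustration of the "background madvise() with a few workers" idea above, a standalone sketch might look like the following. This is not lld code and not what the PR or the patch above does: Region, numWorkers and adviseRegionsInBackground are made-up names, and the buffers are assumed to be page-aligned, as mmap-returned addresses are.
```
#include <sys/mman.h>

#include <cstddef>
#include <thread>
#include <vector>

// One mmap'd input file; `data` is assumed page-aligned, as mmap guarantees.
struct Region {
  const char *data;
  size_t size;
};

// Hint the kernel to start paging the regions in, fanning the madvise()
// calls out over a small pool of background threads. The caller must keep
// `regions` alive and join the returned threads before unmapping anything.
std::vector<std::thread>
adviseRegionsInBackground(const std::vector<Region> &regions,
                          unsigned numWorkers = 4) {
  std::vector<std::thread> workers;
  for (unsigned i = 0; i < numWorkers; ++i)
    workers.emplace_back([&regions, i, numWorkers] {
      // Each worker strides over the region list, so no locking is needed.
      for (size_t j = i; j < regions.size(); j += numWorkers)
        madvise((void *)regions[j].data, regions[j].size, MADV_WILLNEED);
    });
  return workers;
}
```
The linker would kick this off right after mapping its inputs and join the workers (or hand the join off to something like SerialBackgroundQueue) before unmapping anything; MADV_WILLNEED is only a prefetch hint, so correctness never depends on when, or whether, the page-ins complete.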
https://github.com/llvm/llvm-project/pull/147134