[lld] [lld][MachO] Multi-threaded I/O. Twice as fast linking a large project. (PR #147134)
Daniel Rodríguez Troitiño via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 17 12:39:33 PDT 2025
================
@@ -282,11 +284,85 @@ static void saveThinArchiveToRepro(ArchiveFile const *file) {
": Archive::children failed: " + toString(std::move(e)));
}
-static InputFile *addFile(StringRef path, LoadType loadType,
- bool isLazy = false, bool isExplicit = true,
- bool isBundleLoader = false,
- bool isForceHidden = false) {
- std::optional<MemoryBufferRef> buffer = readFile(path);
+class DeferredFile {
+public:
+ StringRef path;
+ bool isLazy;
+ MemoryBufferRef buffer;
+};
+using DeferredFiles = std::vector<DeferredFile>;
+
+// Most input files have been mapped but not yet paged in.
+// This code forces the page-ins on multiple threads so
+// the process is not stalled waiting on disk buffer i/o.
+void multiThreadedPageInBackground(const DeferredFiles &deferred) {
+ static size_t pageSize = Process::getPageSizeEstimate(), totalBytes;
+ static std::mutex mutex;
+ size_t index = 0;
+
+ parallelFor(0, config->readThreads, [&](size_t I) {
+ while (true) {
+ mutex.lock();
+ if (index >= deferred.size()) {
+ mutex.unlock();
+ return;
+ }
+ const StringRef &buff = deferred[index].buffer.getBuffer();
+ totalBytes += buff.size();
+ index += 1;
+ mutex.unlock();
+
+ // Reference each page to load it into memory.
+ for (const char *page = buff.data(), *end = page + buff.size();
+ page < end; page += pageSize)
+ volatile char t = *page;
+ }
+ });
+
+ if (getenv("LLD_MULTI_THREAD_PAGE"))
+ llvm::dbgs() << "multiThreadedPageIn " << totalBytes << "/"
+ << deferred.size() << "\n";
+}
+
+static void multiThreadedPageIn(const DeferredFiles &deferred) {
+ static std::deque<DeferredFiles *> queue;
+ static std::thread *running;
+ static std::mutex mutex;
+
+ mutex.lock();
+ if (running && (queue.empty() || deferred.empty())) {
+ running->join();
+ delete running;
+ running = nullptr;
+ }
+
+ if (!deferred.empty()) {
+ queue.emplace_back(new DeferredFiles(deferred));
+ if (!running)
+ running = new std::thread([&]() {
+ while (true) {
+ mutex.lock();
+ if (queue.empty()) {
+ mutex.unlock();
+ return;
+ }
+ DeferredFiles *deferred = queue.front();
+ queue.pop_front();
+ mutex.unlock();
+ multiThreadedPageInBackground(*deferred);
+ delete deferred;
+ }
+ });
+ }
+ mutex.unlock();
+}
----------------
drodriguez wrote:
I think introducing more locking only makes the code more complicated to reason about. I think the extra lock might avoid the problem I was pointing, but I cannot guarantee that it will not introduce anything else (and I cannot find a example of a problem either, which doesn't mean doesn't exist). In my experience more complex code, specially when multi-threading, is normally more prone to problems.
One new thing I noticed is the `join` of the thread when the `queue` is empty. If the `queue` is empty at this point, the inner thread has either finished (or is about to finish), or it is going to check `queue.empty()` and then finish. You can avoid this `join` in both of those cases.
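To illustrate the point about the per-batch `join`: a single long-lived worker thread drained through a condition variable never needs to be joined between batches, only once at shutdown. The sketch below is illustrative only (the `BackgroundQueue` name and its API are not from the PR), assuming jobs are enqueued from one producer, as in the linker's load path.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical single-worker queue: producers push jobs and never block on a
// join; the destructor drains remaining work and joins exactly once.
class BackgroundQueue {
  std::mutex mtx;
  std::condition_variable cv;
  std::deque<std::function<void()>> work;
  bool done = false;
  std::thread worker; // declared last so the other members exist before it runs

public:
  BackgroundQueue() : worker([this] { run(); }) {}

  void push(std::function<void()> job) {
    {
      std::lock_guard<std::mutex> lock(mtx);
      work.push_back(std::move(job));
    }
    cv.notify_one();
  }

  // Blocks until all queued jobs have run, then stops the worker.
  ~BackgroundQueue() {
    {
      std::lock_guard<std::mutex> lock(mtx);
      done = true;
    }
    cv.notify_one();
    worker.join();
  }

private:
  void run() {
    std::unique_lock<std::mutex> lock(mtx);
    while (true) {
      cv.wait(lock, [this] { return done || !work.empty(); });
      if (work.empty())
        return; // done was set and nothing is left to drain
      auto job = std::move(work.front());
      work.pop_front();
      lock.unlock();
      job(); // run outside the lock so producers are never blocked
      lock.lock();
    }
  }
};
```

The worker sleeps on the condition variable whenever the queue is empty, so there is no polling and no thread teardown/restart between batches.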
Getting this concurrency right is complicated, and that's why I would prefer you to use the higher-order constructs provided by LLVM that are proven to work, even if they are slightly slower than your current results. Even if they are slower, they are an improvement over the current situation. Once we have something that is known to be safe from deadlocks, one can iterate further on performance. That's my 2 cents.
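As one example of trading the explicit lock for something simpler to reason about: the mutex in `multiThreadedPageInBackground` only guards an index and a byte counter, which a lock-free `std::atomic` can replace outright. This is a minimal sketch, not the PR's code; `pageInBuffers` and the use of plain `std::string` buffers in place of `MemoryBufferRef` are assumptions for the sake of a self-contained example, and the real code would use `parallelFor` rather than raw `std::thread`.

```cpp
#include <atomic>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Touch one byte per page of every buffer, distributing buffers across
// numThreads workers via an atomic index instead of a mutex-guarded one.
// Returns the total number of bytes paged in.
static size_t pageInBuffers(const std::vector<std::string> &buffers,
                            unsigned numThreads) {
  const size_t pageSize = 4096; // stand-in for Process::getPageSizeEstimate()
  std::atomic<size_t> nextIndex{0};
  std::atomic<size_t> totalBytes{0};
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < numThreads; ++t)
    workers.emplace_back([&] {
      // fetch_add hands each worker a unique buffer; no lock to reason about.
      for (size_t i = nextIndex.fetch_add(1); i < buffers.size();
           i = nextIndex.fetch_add(1)) {
        volatile char sink = 0;
        for (size_t off = 0; off < buffers[i].size(); off += pageSize)
          sink = buffers[i][off]; // reference each page to fault it in
        (void)sink;
        totalBytes += buffers[i].size();
      }
    });
  for (auto &w : workers)
    w.join();
  return totalBytes;
}
```

The atomic removes the lock/unlock pair around the shared index entirely, so the only remaining synchronization is the final `join`, which is much easier to audit for deadlocks.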
https://github.com/llvm/llvm-project/pull/147134
More information about the llvm-commits mailing list