[lld] [lld][MachO]Multi-threaded i/o. Twice as fast linking a large project. (PR #147134)

John Holdsworth via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 9 22:38:20 PDT 2025


johnno1962 wrote:

I wonder if I could push back ever so gently on the mindset "we just need to parallelise all the things". For my example task, linking the main Chrome dll, the second of two consecutive links takes only about 7 seconds on my machine, which is a measure of the actual on-CPU processing in the task absent disk I/O. Without the optimisation I suggest we're a long way from that, with a baseline of about 25 seconds for the first link, and the difference seems to be a very bad I/O strategy. At present, the linker initially maps all files and then starts to process them. Input is read in by a page fault when code encounters a particular part of a file in memory. The problem is that this introduces a delay while the host accesses the disk, during which the CPU is not doing anything before the page is returned and the CPU moves on to cause the next page fault, and so on. The result is these long baseline elapsed times and the very low disk and CPU utilisation you can see in Activity Monitor.

The code is well structured and it was easy to modify it to map all files and then start a thread that scans aggressively through the memory of all files, touching every page. The difference is that the new approach has multiple threads doing this, so while one thread is waiting on a disk access, the other threads are causing the next page faults, and it can all be performed in parallel. Using this strategy, disk and CPU utilisation are much higher, as the process is not spending the bulk of its time stalled waiting for page faults. The background input-file scan is independent of processing and acts as a hint to the OS that the program would like its input files read in as quickly as possible. This is the basis of this suggested PR.

At present, link times for my Chrome task are down to about 12-13 seconds, which represents a roughly 50/50 split between the I/O of 13 GB of input files at 2 GB per second (6.5 seconds) and the ~7 seconds of actual CPU work I mentioned before. The alternative of parallelising processing does improve the situation, as the different threads interleave page faults as I described above, but it is a little like treating the symptoms without identifying the root cause.

https://github.com/llvm/llvm-project/pull/147134


More information about the llvm-commits mailing list