[lld] da68d21 - [ELF] Cap parallel::strategy to 16 threads when --threads= is unspecified

Wed Apr 12 13:13:44 PDT 2023

Author: Fangrui Song
Date: 2023-04-12T13:13:38-07:00
New Revision: da68d2164efcc1f5e57f090e2ae2219056b120a0

URL: https://github.com/llvm/llvm-project/commit/da68d2164efcc1f5e57f090e2ae2219056b120a0
DIFF: https://github.com/llvm/llvm-project/commit/da68d2164efcc1f5e57f090e2ae2219056b120a0.diff

LOG: [ELF] Cap parallel::strategy to 16 threads when --threads= is unspecified

When --threads= is unspecified, we set it to
`parallel::strategy.compute_thread_count()`, which uses
sched_getaffinity (Linux)/cpuset_getaffinity (FreeBSD)/std::thread::hardware_concurrency (others).
With extensive testing on many machines (many configurations from
{aarch64,x86-64} x {Linux,FreeBSD,Windows} x allocators(native,mimalloc,rpmalloc) combinations)
with varying workloads, we discovered that when the concurrency is larger than
16, the linking process is slower than using --threads=16 due to parallelism
overhead outweighs optimizations. This is particularly harmful for machines with
many cores or when the link job competes with other jobs.

Cap parallel::strategy when --threads= is unspecified.
For some workloads changing the concurrency from 8 to 16 has nearly no improvement.

--thinlto-jobs= is unchanged since ThinLTO backend compiles are embarrassingly
parallel.

Link: https://discourse.llvm.org/t/avoidable-overhead-from-threading-by-default/69160

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D147493

Added: 
    

Modified: 
    lld/ELF/Driver.cpp

Removed: 
    


################################################################################
diff  --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index 8cc2d2005e390..5099f1cbe96a6 100644

--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -1416,8 +1416,12 @@ static void readConfigs(opt::InputArgList &args) {
     config->mllvmOpts.emplace_back(arg->getValue());
   }
 
+  config->threadCount = parallel::strategy.compute_thread_count();
+
   // --threads= takes a positive integer and provides the default value for
-  // --thinlto-jobs=.
+  // --thinlto-jobs=. If unspecified, cap the number of threads since
+  // overhead outweighs optimization for used parallel algorithms for the
+  // non-LTO parts.
   if (auto *arg = args.getLastArg(OPT_threads)) {
     StringRef v(arg->getValue());
     unsigned threads = 0;
@@ -1426,10 +1430,12 @@ static void readConfigs(opt::InputArgList &args) {
             arg->getValue() + "'");
     parallel::strategy = hardware_concurrency(threads);
     config->thinLTOJobs = v;
+  } else if (config->threadCount > 16) {
+    log("set maximum concurrency to 16, specify --threads= to change");
+    parallel::strategy = hardware_concurrency(16);
   }
   if (auto *arg = args.getLastArg(OPT_thinlto_jobs_eq))
     config->thinLTOJobs = arg->getValue();
-  config->threadCount = parallel::strategy.compute_thread_count();
 
   if (config->ltoPartitions == 0)
     error("--lto-partitions: number of threads must be > 0");