[lld] a8788de - [ELF] Cap parallel::strategy to 16 threads when --threads= is unspecified
Fangrui Song via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 20 12:17:31 PDT 2023
Author: Fangrui Song
Date: 2023-04-20T12:17:26-07:00
New Revision: a8788de1c3f3c8c3a591bd3aae2acee1b43b229a
URL: https://github.com/llvm/llvm-project/commit/a8788de1c3f3c8c3a591bd3aae2acee1b43b229a
DIFF: https://github.com/llvm/llvm-project/commit/a8788de1c3f3c8c3a591bd3aae2acee1b43b229a.diff
LOG: [ELF] Cap parallel::strategy to 16 threads when --threads= is unspecified
When --threads= is unspecified, we set it to
`parallel::strategy.compute_thread_count()`, which uses
sched_getaffinity (Linux)/cpuset_getaffinity (FreeBSD)/std::thread::hardware_concurrency (others).
With extensive testing on many machines (many configurations from
{aarch64,x86-64} x {Linux,FreeBSD,Windows} x allocators(native,mimalloc,rpmalloc) combinations)
with varying workloads, we discovered that when the concurrency is larger than
16, the linking process is slower than using --threads=16 because parallelism
overhead outweighs the optimizations. This is particularly harmful for machines with
many cores or when the link job competes with other jobs.
Cap parallel::strategy when --threads= is unspecified.
For some workloads, changing the concurrency from 8 to 16 yields almost no improvement.
--thinlto-jobs= is unchanged since ThinLTO backend compiles are embarrassingly
parallel.
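
The selection rule described above can be sketched in isolation. This is a
simplified model, not the lld code: chooseThreadCount and kDefaultThreadCap are
hypothetical names, and the detected count is passed in as a plain integer
rather than obtained from parallel::strategy.compute_thread_count().

```cpp
#include <algorithm>
#include <optional>
#include <thread>

// Hypothetical constant mirroring the cap of 16 described in the commit message.
constexpr unsigned kDefaultThreadCap = 16;

// Hypothetical helper (not part of lld): returns the thread count to use.
// `requested` models an explicit --threads=N; `detected` models the
// concurrency reported by the platform.
unsigned chooseThreadCount(std::optional<unsigned> requested,
                           unsigned detected) {
  if (requested)
    return *requested; // an explicit request is honored and never capped
  if (detected == 0)
    detected = 1; // std::thread::hardware_concurrency() may return 0
  return std::min(detected, kDefaultThreadCap);
}

// Convenience overload that queries the standard library for the
// detected concurrency.
unsigned chooseThreadCount(std::optional<unsigned> requested) {
  return chooseThreadCount(requested, std::thread::hardware_concurrency());
}
```

In this model a 64-way machine with no explicit request gets 16 threads, an
8-way machine keeps 8, and an explicit request for 32 stays at 32. The cap only
affects the default, so users who benefit from more threads can still opt in.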
Link: https://discourse.llvm.org/t/avoidable-overhead-from-threading-by-default/69160
Reviewed By: peter.smith, andrewng
Differential Revision: https://reviews.llvm.org/D147493
Added:
Modified:
lld/ELF/Driver.cpp
Removed:
################################################################################
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index c540f573aaef9..79f16a281df9b 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -1421,7 +1421,9 @@ static void readConfigs(opt::InputArgList &args) {
}
// --threads= takes a positive integer and provides the default value for
- // --thinlto-jobs=.
+ // --thinlto-jobs=. If unspecified, cap the number of threads since
+ // overhead outweighs optimization for used parallel algorithms for the
+ // non-LTO parts.
if (auto *arg = args.getLastArg(OPT_threads)) {
StringRef v(arg->getValue());
unsigned threads = 0;
@@ -1430,6 +1432,9 @@ static void readConfigs(opt::InputArgList &args) {
arg->getValue() + "'");
parallel::strategy = hardware_concurrency(threads);
config->thinLTOJobs = v;
+ } else if (parallel::strategy.compute_thread_count() > 16) {
+ log("set maximum concurrency to 16, specify --threads= to change");
+ parallel::strategy = hardware_concurrency(16);
}
if (auto *arg = args.getLastArg(OPT_thinlto_jobs_eq))
config->thinLTOJobs = arg->getValue();