[PATCH] D25452: [LTO] Split the options for ThinLTO jobs and Regular LTO partitions

Mon Oct 10 17:01:00 PDT 2016

mehdi_amini added inline comments.

================
Comment at: ELF/Driver.cpp:487
+    error("--lto-partitions: number of threads must be > 0");
+  Config->ThinLtoJobs = getInteger(Args, OPT_thinlto_jobs, 1);
+  if (Config->ThinLtoJobs == 0)
----------------
pcc wrote:
> mehdi_amini wrote:
> > pcc wrote:
> > > I think this would result in a default parallelism level of 1. Can we avoid passing a ThinBackend if the user did not specify a parallelism level?
> > Doesn't it mean that if the user does not specify `--thinlto-jobs` on the command line, ThinLTO won't work? I think it would even lead to the LTO library crashing if the bitcode has been generated with ThinLTO? Is there a test that shows what happen in this case? (I don't see any, but I didn't look closely)
> > The default should be something like `std::hardware_concurrency`.
> No, if the ThinBackend is not supplied, LTO will create one for you. See: http://llvm-cs.pcc.me.uk/lib/LTO/LTO.cpp#229
> 
> I think we will eventually want to replace the `thread::hardware_concurrency()` default with something else that will lead to better utilisation of the hardware. In Chromium at least we've observed that `hardware_concurrency` leads to poor utilisation on machines with a large number of cores (https://bugs.chromium.org/p/chromium/issues/detail?id=645295#c20). So I'd prefer to keep any default out of the clients.
> 
> You're right, we should have a test that does not pass a `--thinlto-jobs` flag. @davide can you please add one?
> No, if the ThinBackend is not supplied, LTO will create one for you. See: http://llvm-cs.pcc.me.uk/lib/LTO/LTO.cpp#229

OK! Great :)

> I think we will eventually want to replace the thread::hardware_concurrency() default with something else that will lead to better utilisation of the hardware. In Chromium at least we've observed that hardware_concurrency leads to poor utilisation on machines with a large number of cores (https://bugs.chromium.org/p/chromium/issues/detail?id=645295#c20). So I'd prefer to keep any default out of the clients.

Yes, Apple's clang is using the number of physical cores because of that, but my rational was to limit the memory consumption at a time where debug info where even more costly than now for ThinLTO.
So indeed we plan to add support for this in LLVM.

On the other hand, David and Teresa benchmarks building Chromium on a 16-cores Xeon machine (32 hyper threaded cores) and they reported 2m18s for the ThinLTO plugin with 32 threads vs  3m38s with 32 threads. So your link is puzzling...

https://reviews.llvm.org/D25452