[PATCH] D25452: [LTO] Split the options for ThinLTO jobs and Regular LTO partitions

Teresa Johnson via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 10 20:22:06 PDT 2016


On Mon, Oct 10, 2016 at 5:01 PM, Mehdi AMINI via llvm-commits
<llvm-commits at lists.llvm.org> wrote:

> mehdi_amini added inline comments.
>
>
> ================
> Comment at: ELF/Driver.cpp:487
> +    error("--lto-partitions: number of partitions must be > 0");
> +  Config->ThinLtoJobs = getInteger(Args, OPT_thinlto_jobs, 1);
> +  if (Config->ThinLtoJobs == 0)
> ----------------
> pcc wrote:
> > mehdi_amini wrote:
> > > pcc wrote:
> > > > I think this would result in a default parallelism level of 1. Can
> > > > we avoid passing a ThinBackend if the user did not specify a parallelism
> > > > level?
> > > Doesn't it mean that if the user does not specify `--thinlto-jobs` on
> > > the command line, ThinLTO won't work? I think it would even lead to the LTO
> > > library crashing if the bitcode has been generated with ThinLTO? Is there a
> > > test that shows what happens in this case? (I don't see any, but I didn't
> > > look closely)
> > > The default should be something like `std::thread::hardware_concurrency`.
> > No, if the ThinBackend is not supplied, LTO will create one for you.
> > See: http://llvm-cs.pcc.me.uk/lib/LTO/LTO.cpp#229
> >
> > I think we will eventually want to replace the
> > `thread::hardware_concurrency()` default with something else that will
> > lead to better utilisation of the hardware. In Chromium at least we've
> > observed that `hardware_concurrency` leads to poor utilisation on machines
> > with a large number of cores
> > (https://bugs.chromium.org/p/chromium/issues/detail?id=645295#c20).
> > So I'd prefer to keep any default out of the clients.
> >
> > You're right, we should have a test that does not pass a
> > `--thinlto-jobs` flag. @davide can you please add one?
> > No, if the ThinBackend is not supplied, LTO will create one for you.
> > See: http://llvm-cs.pcc.me.uk/lib/LTO/LTO.cpp#229
>
> OK! Great :)
>
> > I think we will eventually want to replace the
> > thread::hardware_concurrency() default with something else that will lead
> > to better utilisation of the hardware. In Chromium at least we've observed
> > that hardware_concurrency leads to poor utilisation on machines with a
> > large number of cores
> > (https://bugs.chromium.org/p/chromium/issues/detail?id=645295#c20).
> > So I'd prefer to keep any default out of the clients.
>
> Yes, Apple's clang is using the number of physical cores because of that,
> but my rationale was to limit memory consumption at a time when debug
> info was even more costly for ThinLTO than it is now.
> So indeed we plan to add support for this in LLVM.
>
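
(For the record, what pcc describes means a client like lld can simply not
supply a ThinBackend and let LTO pick its own default. A minimal sketch of
that from the client side; lto::LTO and createInProcessThinBackend are the
real LTO API names, while the surrounding driver glue is illustrative:

  // Sketch only: pass a ThinBackend to lto::LTO only when the user asked
  // for a specific parallelism level; a default-constructed (empty)
  // ThinBackend makes LTO create its in-process default (LTO.cpp#229).
  lto::Config Conf;         // codegen configuration, filled in elsewhere
  lto::ThinBackend Backend; // empty == "let LTO decide"
  if (Args.hasArg(OPT_thinlto_jobs))
    Backend = lto::createInProcessThinBackend(Config->ThinLtoJobs);
  auto LtoObj = llvm::make_unique<lto::LTO>(
      std::move(Conf), std::move(Backend), Config->LtoPartitions);
)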

I looked into this a few weeks ago and had a lot of trouble finding
a reliable way to compute the number of physical cores on X86, at least
on Linux. There is a way to get this from "lscpu -p", but I haven't
found a good way to compute it from source code that isn't very
architecture/implementation-dependent. Mehdi indicated that macOS has a
method for accessing this info that he is using for Xcode.
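
To make that concrete, here is a rough sketch of the kind of code involved
(the macOS sysctl key is the documented one; the Linux side parses
/proc/cpuinfo and is exactly the sort of implementation-dependent approach
I'd like to avoid):

  #include <fstream>
  #include <set>
  #include <string>
  #include <thread>
  #include <utility>
  #if defined(__APPLE__)
  #include <sys/sysctl.h>
  #endif

  // Sketch only: count physical cores, falling back to logical cores.
  static unsigned getPhysicalCores() {
  #if defined(__APPLE__)
    int Count = 0;
    size_t Len = sizeof(Count);
    if (sysctlbyname("hw.physicalcpu", &Count, &Len, nullptr, 0) == 0)
      return Count;
  #elif defined(__linux__)
    // Hyperthreads share a (physical id, core id) pair in /proc/cpuinfo,
    // so the number of distinct pairs is the number of physical cores.
    std::ifstream Info("/proc/cpuinfo");
    std::set<std::pair<int, int>> Cores;
    int PhysId = -1;
    std::string Line;
    while (std::getline(Info, Line)) {
      if (Line.compare(0, 11, "physical id") == 0)
        PhysId = std::stoi(Line.substr(Line.find(':') + 1));
      else if (Line.compare(0, 7, "core id") == 0)
        Cores.emplace(PhysId, std::stoi(Line.substr(Line.find(':') + 1)));
    }
    if (!Cores.empty())
      return Cores.size();
  #endif
    return std::thread::hardware_concurrency();
  }

The /proc/cpuinfo fields aren't guaranteed to look the same across
architectures (they differ on ARM, for instance), which is exactly the
portability problem above.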


>
> On the other hand, David and Teresa benchmarked building Chromium on a
> 16-core Xeon machine (32 hyperthreaded cores), and they reported 2m18s for
> the ThinLTO plugin with 32 threads vs 3m38s with 32 threads. So your link
> is puzzling...
>

From the follow-on discussion it looks like you were looking at the cc_unittest
numbers, and the numbers we collected above didn't include the thin link,
so they aren't too far off IIUC.



>
>
> https://reviews.llvm.org/D25452



-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413