[llvm-dev] LLD: time to enable --threads by default
Mehdi Amini via llvm-dev
llvm-dev at lists.llvm.org
Thu Nov 17 09:50:41 PST 2016
> On Nov 17, 2016, at 9:41 AM, Rui Ueyama via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On Thu, Nov 17, 2016 at 6:12 AM, Teresa Johnson via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On Thu, Nov 17, 2016 at 4:11 AM, Rafael Espíndola via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> > Sounds like threading isn't beneficial much beyond the second CPU...
> > Maybe blindly creating one thread per core isn't the best plan...
> parallel.h is pretty simplistic at the moment. Currently it creates
> one per SMT. One per core and being lazy about it would probably be a
> good thing, but threading is already beneficial, and improving
> parallel.h would be a welcome improvement.
> Instead of using std::thread::hardware_concurrency (which is one per SMT), you may be interested in using the facility I added for setting default ThinLTO backend parallelism so that one per physical core is created, llvm::heavyweight_hardware_concurrency() (see D25585 and r284390). The name is meant to indicate that this is the concurrency that should be used for heavier weight tasks (that may use a lot of memory e.g.).
> Sorry for my ignorance, but what's the point of running the same number of threads as the number of physical cores instead of HT virtual cores? If we can get better throughput by not running more than one thread per physical core, it feels like HT is a useless technology.
It depends on the use-case: with ThinLTO we scale linearly with the number of physical cores. When you get over the number of physical cores you still get some improvements, but that’s no longer linear.
Profitability is a tradeoff: for example, if each of your tasks is very memory intensive, you may not want to overcommit the cores, so as to keep a higher ratio of available memory per physical core.
To take some numbers as an example: if your average user has an 8GB machine with 4 cores (8 virtual cores with HT), and you know that each of your parallel tasks consumes 1.5GB of memory on average, then having 4 parallel worker threads process your tasks will lead to a peak memory of 6GB, while having 8 parallel threads will lead to a peak memory of 12GB and the machine will start to swap.
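As a rough sketch of that arithmetic (the helper names peak_memory_gb and memory_bounded_workers are made up for illustration, they are not LLVM APIs, and the 1.5GB per task figure is just the example number above):

```cpp
#include <algorithm>
#include <cassert>

// Peak memory in GB if `workers` tasks run at the same time and each
// one uses `gb_per_task` on average.
constexpr double peak_memory_gb(unsigned workers, double gb_per_task) {
  return workers * gb_per_task;
}

// Hypothetical helper: the largest number of workers whose combined
// footprint fits in `ram_gb`, capped at `max_threads` and never below 1.
inline unsigned memory_bounded_workers(unsigned max_threads, double ram_gb,
                                       double gb_per_task) {
  assert(max_threads > 0 && gb_per_task > 0.0);
  unsigned fit = static_cast<unsigned>(ram_gb / gb_per_task);
  return std::clamp(fit, 1u, max_threads);
}
```

With 8GB of RAM and 1.5GB per task, memory_bounded_workers(8, 8.0, 1.5) gives 5, so even on the 8 virtual core machine only 5 workers fit before swapping; capping at the 4 physical cores stays comfortably inside that limit.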
Another consideration is that having the linker spawn threads behind the back of the build system isn’t great: the build system is supposed to exploit the parallelism. Now if it spawns 10 linker jobs in parallel, how many threads are competing for the hardware?
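To put a number on that question, here is a small sketch. Both function names are hypothetical, and the even split in threads_per_job_budget is only one possible policy I am assuming for illustration, not something LLD or any build system does today:

```cpp
#include <algorithm>
#include <cassert>

// Total threads competing for the hardware when the build system runs
// `jobs` linker processes and each one spawns `threads_per_job` threads.
constexpr unsigned total_threads(unsigned jobs, unsigned threads_per_job) {
  return jobs * threads_per_job;
}

// Hypothetical policy: split the hardware threads evenly across the
// concurrent jobs, giving each job at least one thread.
inline unsigned threads_per_job_budget(unsigned hardware_threads,
                                       unsigned jobs) {
  assert(jobs > 0);
  return std::max(1u, hardware_threads / jobs);
}
```

With 10 linker jobs each creating one thread per virtual core on the 8 virtual core machine from the example, total_threads(10, 8) is 80 threads for 8 hardware threads, whereas an even split would leave each job with a single thread.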
So, HT is not useless, but it is not universally applicable or universally efficient in the same way.
Hope it makes sense!