[llvm-dev] LLD: time to enable --threads by default

Rui Ueyama via llvm-dev llvm-dev at lists.llvm.org
Thu Nov 17 10:00:43 PST 2016


On Thu, Nov 17, 2016 at 9:50 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:

>
> On Nov 17, 2016, at 9:41 AM, Rui Ueyama via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> On Thu, Nov 17, 2016 at 6:12 AM, Teresa Johnson via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>>
>>
>> On Thu, Nov 17, 2016 at 4:11 AM, Rafael Espíndola via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> > Sounds like threading isn't beneficial much beyond the second CPU...
>>> > Maybe blindly creating one thread per core isn't the best plan...
>>>
>>> parallel.h is pretty simplistic at the moment. Currently it creates
>>> one thread per SMT. One per core, created lazily, would probably be a
>>> good thing, but threading is already beneficial, and improving
>>> parallel.h would be a welcome improvement.
>>>
>>
>> Instead of using std::thread::hardware_concurrency (which is one per
>> SMT), you may be interested in using the facility I added for setting the
>> default ThinLTO backend parallelism so that one thread per physical core is
>> created: llvm::heavyweight_hardware_concurrency() (see D25585 and
>> r284390). The name is meant to indicate that this is the concurrency that
>> should be used for heavier-weight tasks (e.g. tasks that may use a lot of memory).
>>
>
> Sorry for my ignorance, but what's the point of running the same number of
> threads as the number of physical cores instead of HT virtual cores? If we
> can get better throughput by not running more than one thread per physical
> core, it feels like HT is a useless technology.
>
>
> It depends on the use case: with ThinLTO we scale linearly with the number
> of physical cores. When you go over the number of physical cores you still
> get some improvement, but it is no longer linear.
> Profitability is a tradeoff question: for example, if each of your tasks
> is very memory-intensive, you may not want to overcommit the cores and
> shrink the amount of available memory per worker.
>
> To take some numbers as an example: if your average user has an 8GB machine
> with 4 cores (8 virtual cores with HT), and you know that each of your
> parallel tasks consumes 1.5GB of memory on average, then having 4 parallel
> worker threads to process your tasks will lead to a peak memory of 6GB,
> while having 8 parallel threads will lead to a peak memory of 12GB, and the
> machine will start to swap.
>
> Another consideration is that having the linker spawn threads behind the
> back of the build system isn’t great: the build system is supposed to
> exploit the parallelism. Now if it spawns 10 linker jobs in parallel, how
> many threads are competing for the hardware?
>
> So, HT is not useless, but it is not universally applicable or universally
> efficient in the same way.
>
> Hope it makes sense!
>

Thank you for the explanation! That makes sense.

Unlike ThinLTO, each thread in LLD consumes a very small amount of memory
(probably just a few megabytes), so that's not a problem for me. At the
final stage of linking, we spawn threads to copy section contents and apply
relocations, and I guess that causes a lot of memory traffic because it is
basically memcpy'ing input files into an output file, so memory bandwidth
could be a limiting factor there. But I do not see a reason to limit the
number of threads to the number of physical cores. For LLD, it seems like we
can just spawn as many threads as HT provides.
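To make the two thread-count policies being discussed concrete, here is a
minimal sketch. This is not LLD's actual parallel.h; the commented-out call
is the facility Teresa mentioned (llvm::heavyweight_hardware_concurrency()
from llvm/Support/Threading.h), assuming its 2016-era interface that returns
an unsigned count.

  #include <thread>
  // #include "llvm/Support/Threading.h" // for llvm::heavyweight_hardware_concurrency()

  // One worker per hardware thread (counting SMT/HT siblings); this is what
  // parallel.h effectively does today via std::thread::hardware_concurrency().
  unsigned workersPerHardwareThread() {
    unsigned N = std::thread::hardware_concurrency();
    return N ? N : 1; // hardware_concurrency() may return 0 if it cannot tell
  }

  // One worker per physical core, intended for heavier-weight (memory-hungry)
  // tasks such as ThinLTO backends. (Interface assumed as of D25585/r284390.)
  // unsigned workersPerPhysicalCore() {
  //   return llvm::heavyweight_hardware_concurrency();
  // }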
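And a rough, standalone illustration of the kind of work done in that final
copy stage: splitting a memcpy across worker threads, which is why memory
bandwidth rather than core count tends to be the bottleneck. This is only a
sketch, not LLD's code.

  #include <cstddef>
  #include <cstring>
  #include <thread>
  #include <vector>

  // Copy `size` bytes from `src` to `dst`, splitting the range across
  // `nthreads` workers. Each worker writes a disjoint slice, so no locking
  // is needed; the work is essentially pure memory traffic.
  void parallelCopy(char *dst, const char *src, std::size_t size,
                    unsigned nthreads) {
    if (nthreads == 0)
      nthreads = 1;
    std::vector<std::thread> workers;
    std::size_t chunk = size / nthreads;
    for (unsigned i = 0; i < nthreads; ++i) {
      std::size_t begin = i * chunk;
      std::size_t end = (i + 1 == nthreads) ? size : begin + chunk;
      workers.emplace_back(
          [=] { std::memcpy(dst + begin, src + begin, end - begin); });
    }
    for (std::thread &t : workers)
      t.join();
  }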


More information about the llvm-dev mailing list