<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Nov 16, 2016 at 12:44 PM, Rui Ueyama via llvm-dev <span dir="ltr">&lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>LLD supports multi-threading, and it seems to be working well, as you can see in a recent <a href="http://llvm.org/viewvc/llvm-project?view=revision&revision=287140" target="_blank">result</a>. In short, LLD runs 30% faster with the --threads option and more than 50% faster if you are using --build-id (your mileage may vary depending on your computer). However, I don't think most users even know about that, because --threads is not a default option.</div><div><br></div><div>I'm thinking of enabling --threads by default. We now have real users, and they'll be happy about the performance boost.</div><div><br></div><div>Any concerns?</div><div><br></div><div>I can't think of any problems with that, but I want to write a few notes about it:</div><div><br></div><div> - We still need to focus on single-thread performance rather than multi-threaded performance, because it is hard to make a slow program faster just by using more threads.</div><div><br></div><div> - We shouldn't do "too clever" things with threads. Currently, we are using multiple threads in only two places, both of which are highly parallelizable by nature (namely, copying and applying relocations for each input section, and computing the build-id hash). We are using parallel_for_each, and that is very simple and easy to understand. 
I believe this was the right design choice, and I don't think we want to have something like the workqueues/tasks in GNU gold, for example.</div></div></blockquote><div><br></div><div>Sorry for the late response.</div><div><br></div><div>Copying and applying relocations is actually not as parallelizable as you would imagine in current LLD. The reason is that mutating the kernel's VA map is implicitly serialized (this happens any time there is a minor page fault, i.e. the first time you touch a page of an mmap'd input). Since threads share the same VA, they are all serialized on it. Separate processes are needed to avoid this overhead (note that the separate processes would still have the same output file mapped, so, at least with fixed partitioning, there is no need for complex IPC).</div><div><br></div><div>For `ld.lld -O0` on a Mac host, I measured &lt;1 GB/s copying speed, even though the machine I was running on had about 50 GB/s of DRAM bandwidth. The VA overhead is therefore on the order of a 50x slowdown for this copy operation in this extreme case, and Amdahl's law indicates that adding more threads will give practically no speedup for it. I've also DTrace'd this and saw massive contention on the VA lock. Linux will be better, but no matter how good it is, it is still a serialization point, and Amdahl's law will limit your speedup significantly.</div><div><br></div><div>-- Sean Silva</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><br></div><div> - Run benchmarks with --no-threads if you are not focusing on multi-thread performance.</div><div><br></div></div>
<br></blockquote></div><br></div></div>
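<div>The Amdahl's law argument in the reply can be made concrete with a rough calculation. The sketch below is illustrative only: it assumes the copy runs at about 1 GB/s against about 50 GB/s of DRAM bandwidth (the figures quoted in the message) and, as a simplifying assumption not stated in the message, that all of the extra time is spent serialized on the VA lock. The function name <code>amdahl_speedup</code> is hypothetical.</div>

```python
def amdahl_speedup(serial_fraction: float, threads: int) -> float:
    """Amdahl's law: only the non-serial part of the work scales with threads."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)


# Figures quoted in the message (rounded): ~1 GB/s measured, ~50 GB/s DRAM.
measured_gbps = 1.0
dram_gbps = 50.0

# Assumption: everything beyond the raw memory-copy time is serialized
# on the kernel's VA lock, so only 1/50 of the time can run in parallel.
serial_fraction = 1.0 - measured_gbps / dram_gbps

for threads in (1, 2, 4, 8, 64):
    speedup = amdahl_speedup(serial_fraction, threads)
    print(f"{threads:>2} threads: {speedup:.3f}x")
```

<div>Under these assumptions the speedup stays at roughly 1.02x no matter how many threads are added, which is the "practically no speedup" the message describes. A platform with a cheaper page-fault path would have a smaller serial fraction, and the same formula then gives a correspondingly higher ceiling.</div>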