<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Mar 17, 2015 at 12:30 PM, Rui Ueyama <span dir="ltr"><<a href="mailto:ruiu@google.com" target="_blank">ruiu@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="">On Mon, Mar 16, 2015 at 8:29 PM, Shankar Easwaran <span dir="ltr"><<a href="mailto:shankare@codeaurora.org" target="_blank">shankare@codeaurora.org</a>></span> wrote:<br></span><span class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Author: shankare<br>
Date: Mon Mar 16 22:29:32 2015<br>
New Revision: 232460<br>
<br>
URL: <a href="http://llvm.org/viewvc/llvm-project?rev=232460&view=rev" target="_blank">http://llvm.org/viewvc/llvm-project?rev=232460&view=rev</a><br>
Log:<br>
[ELF] Use parallel_for_each for writing.<br>
<br>
This changes improves performance of lld, when self-hosting lld, when compared<br>
with the bfd linker. BFD linker on average takes 8 seconds in elapsed time.<br>
lld takes 3 seconds elapased time average. Without this change, lld takes ~5<br>
seconds average. The runtime comparisons were done on a release build and<br>
measured by running linking thrice.<br>
<br>
lld self-host without the change<br>
----------------------------------<br>
real    0m3.196s<br>
user    0m4.580s<br>
sys     0m0.832s<br>
<br>
lld self-host with lld<br>
-----------------------<br>
user    0m3.024s<br>
user    0m3.252s<br>
sys     0m0.796s<br></blockquote><div><br></div></span><div>The above results don't look real output of "time" command.</div><div><br></div><div>If it's real, it's too good to be true, assuming the first line of the second result is "real" instead of "user".</div><div><br></div><div>"real" is wall clock time from process start to process exit. "user" is CPU time consumed by the process in user mode (if a process is multi-threaded, it can be larger than real).</div><div><br></div><div>Your result shows significant improvement in user time. Which means you have significantly reduced the amount of processing time to do the same thing compared to before. However, because this change didn't change algorithm, but just execute them in parallel, it couldn't happen.</div><div><br></div><div>Something's not correct.</div></div></div></div></blockquote><div><br></div><div>I also noticed a similar thing in Davide's flame graphs in `[LLVMdev] On LLD performance`. It's not impossible though. E.g. doing it multithreaded allows utilizing the caches of multiple cores, which could result in a significant reduction in the time the CPU is waiting on memory and thus improves user CPU time.</div><div><br></div><div>But I agree that it does make me skeptical/want a closer investigation.<br></div><div><br></div><div>-- Sean Silva</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>I appreciate your effort to make LLD faster, but we need to be careful about benchmark results. If we don't measure improvements accurately, it's easy to make an "optimization" that makes things slower.</div><div><br></div><div>Another important thing is to disbelieve what you do when you optimize something and measure its effect. It sometimes happen that I believe something is going to improve performance 100% sure but it actually wouldn't.</div><div><div class="h5"><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
time taken to build lld with bfd<br>
--------------------------------<br>
real    0m8.419s<br>
user    0m7.748s<br>
sys     0m0.632s<br>
<br>
Modified:<br>
    lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h<br>
    lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h<br>
<br>
Modified: lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h?rev=232460&r1=232459&r2=232460&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h?rev=232460&r1=232459&r2=232460&view=diff</a><br>
==============================================================================<br>
--- lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h (original)<br>
+++ lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h Mon Mar 16 22:29:32 2015<br>
@@ -586,8 +586,10 @@ std::error_code OutputELFWriter<ELFT>::w<br>
   _elfHeader->write(this, _layout, *buffer);<br>
   _programHeader->write(this, _layout, *buffer);<br>
<br>
-  for (auto section : _layout.sections())<br>
-    section->write(this, _layout, *buffer);<br>
+  auto sections = _layout.sections();<br>
+  parallel_for_each(<br>
+      sections.begin(), sections.end(),<br>
+      [&](Chunk<ELFT> *section) { section->write(this, _layout, *buffer); });<br>
   writeTask.end();<br>
<br>
   ScopedTask commitTask(getDefaultDomain(), "ELF Writer commit to disk");<br>
<br>
Modified: lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h?rev=232460&r1=232459&r2=232460&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h?rev=232460&r1=232459&r2=232460&view=diff</a><br>
==============================================================================<br>
--- lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h (original)<br>
+++ lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h Mon Mar 16 22:29:32 2015<br>
@@ -234,17 +234,17 @@ public:<br>
   /// routine gets called after the linker fixes up the virtual address<br>
   /// of the section<br>
   virtual void assignVirtualAddress(uint64_t addr) override {<br>
-    for (auto &ai : _atoms) {<br>
+    parallel_for_each(_atoms.begin(), _atoms.end(), [&](AtomLayout *ai) {<br>
       ai->_virtualAddr = addr + ai->_fileOffset;<br>
-    }<br>
+    });<br>
   }<br>
<br>
   /// \brief Set the file offset of each Atom in the section. This routine<br>
   /// gets called after the linker fixes up the section offset<br>
   void assignFileOffsets(uint64_t offset) override {<br>
-    for (auto &ai : _atoms) {<br>
+    parallel_for_each(_atoms.begin(), _atoms.end(), [&](AtomLayout *ai) {<br>
       ai->_fileOffset = offset + ai->_fileOffset;<br>
-    }<br>
+    });<br>
   }<br>
<br>
   /// \brief Find the Atom address given a name, this is needed to properly<br>
<br>
<br>
_______________________________________________<br>
llvm-commits mailing list<br>
<a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>
</blockquote></div></div></div><br></div></div>
<br>_______________________________________________<br>
llvm-commits mailing list<br>
<a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>
<br></blockquote></div><br></div></div>