<div dir="ltr">Why don't you just run it many more times?</div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Mar 17, 2015 at 3:20 PM, Shankar Easwaran <span dir="ltr"><<a href="mailto:shankare@codeaurora.org" target="_blank">shankare@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Would running the same experiment on different Unixes, or linking the same object files on Windows, give more information?<br>

<br>

How many data points do you usually collect?<br>

<br>

Shankar Easwaran<div><div class="h5"><br>

<br>

On 3/17/2015 5:10 PM, Rui Ueyama wrote:<br>

</div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">

I reformatted your results here. As you can see, the S/N is too low. We<br>

probably cannot conclude anything from only four data points.<br>

<br>

LLD with patch<br>

4.16user 0.80system 0:03.06elapsed 162%CPU (0avgtext+0avgdata<br>

7174160maxresident)k<br>

3.94user 0.86system 0:02.93elapsed 163%CPU (0avgtext+0avgdata<br>

7175808maxresident)k<br>

4.36user 1.05system 0:03.08elapsed 175%CPU (0avgtext+0avgdata<br>

7176320maxresident)k<br>

4.17user 0.72system 0:02.93elapsed 166%CPU (0avgtext+0avgdata<br>

7175120maxresident)k<br>

<br>

LLD without patch<br>

4.49user 0.92system 0:03.32elapsed 162%CPU (0avgtext+0avgdata<br>

7179984maxresident)k<br>

4.12user 0.83system 0:03.22elapsed 154%CPU (0avgtext+0avgdata<br>

7172704maxresident)k<br>

4.38user 0.90system 0:03.14elapsed 168%CPU (0avgtext+0avgdata<br>

7175600maxresident)k<br>

4.20user 0.79system 0:03.08elapsed 161%CPU (0avgtext+0avgdata<br>

7174864maxresident)k<br>
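<br>
One way to put a number on the S/N for the eight runs above is to compare the difference in mean "user" time with the run-to-run spread. The following is only a sketch in C++: the names summarize, welchT, kUserWithPatch and kUserWithoutPatch are made up for illustration, and Welch's t statistic is one possible choice of measure, not something that was used in this thread.<br>

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Summary statistics for one set of timing samples, in seconds.
struct Summary {
  double mean;
  double stddev; // sample standard deviation (n - 1 in the denominator)
};

// Computes the mean and sample standard deviation.
// Assumes at least two samples.
Summary summarize(const std::vector<double> &samples) {
  const double n = static_cast<double>(samples.size());
  double sum = 0.0;
  for (double s : samples)
    sum += s;
  const double mean = sum / n;

  double squares = 0.0;
  for (double s : samples)
    squares += (s - mean) * (s - mean);
  return {mean, std::sqrt(squares / (n - 1.0))};
}

// Welch's t statistic: difference of the means divided by the standard
// error of that difference. Values near zero mean "mostly noise".
double welchT(const std::vector<double> &a, const std::vector<double> &b) {
  const Summary sa = summarize(a);
  const Summary sb = summarize(b);
  const double se =
      std::sqrt(sa.stddev * sa.stddev / static_cast<double>(a.size()) +
                sb.stddev * sb.stddev / static_cast<double>(b.size()));
  return (sa.mean - sb.mean) / se;
}

// "user" seconds copied from the eight runs listed above.
const std::vector<double> kUserWithPatch = {4.16, 3.94, 4.36, 4.17};
const std::vector<double> kUserWithoutPatch = {4.49, 4.12, 4.38, 4.20};
```

Calling welchT(kUserWithoutPatch, kUserWithPatch) gives the difference of the two means divided by its standard error. The closer that value is to zero, the harder it is to tell the difference apart from noise, and more runs shrink the standard error.<br>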

<br>

<br>

On Tue, Mar 17, 2015 at 2:57 PM, Shankar Easwaran <<a href="mailto:shankare@codeaurora.org" target="_blank">shankare@codeaurora.org</a>><br>

wrote:<br>

<br>

</div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5">

I measured this again with 4 tries, just to make sure, and I see a few<br>

results identical to what I measured before:<br>

<br></div></div>

*Raw data below :-*<div><div class="h5"><br>

<br>

LLD Try With Patch #1<br>

4.16user 0.80system 0:03.06elapsed 162%CPU (0avgtext+0avgdata<br>

7174160maxresident)k<br>

LLD Try Without Patch #1<br>

4.49user 0.92system 0:03.32elapsed 162%CPU (0avgtext+0avgdata<br>

7179984maxresident)k<br>

BFD Try #1<br>

7.81user 0.68system 0:08.53elapsed 99%CPU (0avgtext+0avgdata<br>

3230416maxresident)k<br>

LLD Try With Patch #2<br>

3.94user 0.86system 0:02.93elapsed 163%CPU (0avgtext+0avgdata<br>

7175808maxresident)k<br>

LLD Try Without Patch #2<br>

4.12user 0.83system 0:03.22elapsed 154%CPU (0avgtext+0avgdata<br>

7172704maxresident)k<br>

BFD Try #2<br>

7.78user 0.75system 0:08.57elapsed 99%CPU (0avgtext+0avgdata<br>

3230416maxresident)k<br>

LLD Try With Patch #3<br>

4.36user 1.05system 0:03.08elapsed 175%CPU (0avgtext+0avgdata<br>

7176320maxresident)k<br>

LLD Try Without Patch #3<br>

4.38user 0.90system 0:03.14elapsed 168%CPU (0avgtext+0avgdata<br>

7175600maxresident)k<br>

BFD Try #3<br>

7.78user 0.64system 0:08.46elapsed 99%CPU (0avgtext+0avgdata<br>

3230416maxresident)k<br>

LLD Try With Patch #4<br>

4.17user 0.72system 0:02.93elapsed 166%CPU (0avgtext+0avgdata<br>

7175120maxresident)k<br>

LLD Try Without Patch #4<br>

4.20user 0.79system 0:03.08elapsed 161%CPU (0avgtext+0avgdata<br>

7174864maxresident)k<br>

BFD Try #4<br>

7.77user 0.66system 0:08.46elapsed 99%CPU (0avgtext+0avgdata<br>

3230416maxresident)k<br>

<br></div></div>

*Questions :-*<span class=""><br>

<br>

As Rui mentions, I don't know why the user time is higher without the patch.<br>

Is there any method to verify this?<br>

Could this be because of user threads instead of kernel threads?<br>

<br>

Shankar Easwaran<br>

<br>

<br>

On 3/17/2015 3:35 PM, Shankar Easwaran wrote:<br>

<br>

Yes, this is true. The file that I read into the commit message held logs of several runs,<br>

and manually removing the extra ones left two "user" lines.<br>

<br>

But the result itself is still valid. I can re-measure the time taken<br>

though.<br>

<br>

Shankar Easwaran<br>

<br>

On 3/17/2015 2:30 PM, Rui Ueyama wrote:<br>

<br>

On Mon, Mar 16, 2015 at 8:29 PM, Shankar Easwaran<br></span>

<<a href="mailto:shankare@codeaurora.org" target="_blank">shankare@codeaurora.org</a>><div><div class="h5"><br>

wrote:<br>

<br>

Author: shankare<br>

Date: Mon Mar 16 22:29:32 2015<br>

New Revision: 232460<br>

<br>

URL: <a href="http://llvm.org/viewvc/llvm-project?rev=232460&view=rev" target="_blank">http://llvm.org/viewvc/llvm-project?rev=232460&view=rev</a><br>

Log:<br>

[ELF] Use parallel_for_each for writing.<br>

<br>

This change improves the performance of lld when self-hosting lld, compared<br>

with the bfd linker. The BFD linker takes 8 seconds of elapsed time on<br>

average. lld takes 3 seconds of elapsed time on average. Without this change,<br>

lld takes ~5 seconds on average. The runtime comparisons were done on a<br>

release build and measured by running the link three times.<br>

<br>

lld self-host without the change<br>

----------------------------------<br>

real    0m3.196s<br>

user    0m4.580s<br>

sys     0m0.832s<br>

<br>

lld self-host with lld<br>

-----------------------<br>

user    0m3.024s<br>

user    0m3.252s<br>

sys     0m0.796s<br>

<br>

  The above results don't look like real output of the "time" command.<br>

<br>

If they are real, they are too good to be true, assuming the first line of the<br>

second result is "real" instead of "user".<br>

<br>

"real" is wall clock time from process start to process exit. "user" is<br>

CPU<br>

time consumed by the process in user mode (if a process is multi-threaded,<br>

it can be larger than real).<br>

<br>

Your result shows a significant improvement in user time, which means you<br>

have significantly reduced the amount of processing needed to do the same<br>

thing compared to before. However, because this change didn't change the<br>

algorithm but just executes the same work in parallel, that couldn't happen.<br>

<br>

Something's not correct.<br>

<br>

I appreciate your effort to make LLD faster, but we need to be careful<br>

about benchmark results. If we don't measure improvements accurately, it's<br>

easy to make an "optimization" that makes things slower.<br>

<br>

Another important thing is to be skeptical of your own work when you optimize<br>

something and measure its effect. It sometimes happens that I am 100% sure<br>

something is going to improve performance, but it actually<br>

doesn't.<br>

<br>

time taken to build lld with bfd<br>

--------------------------------<br>

real    0m8.419s<br>

user    0m7.748s<br>

sys     0m0.632s<br>

<br>

Modified:<br>

      lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h<br>

      lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h<br>

<br>

Modified: lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h<br>

URL:<br>

<br>

<a href="http://llvm.org/viewvc/llvm-project/lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h?rev=232460&r1=232459&r2=232460&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h?rev=232460&r1=232459&r2=232460&view=diff</a><br>

<br>

==============================================================================<br>

<br>

--- lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h (original)<br>

+++ lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h Mon Mar 16 22:29:32 2015<br>

@@ -586,8 +586,10 @@ std::error_code OutputELFWriter<ELFT>::w<br>

     _elfHeader->write(this, _layout, *buffer);<br>

     _programHeader->write(this, _layout, *buffer);<br>

<br>

-  for (auto section : _layout.sections())<br>

-    section->write(this, _layout, *buffer);<br>

+  auto sections = _layout.sections();<br>

+  parallel_for_each(<br>

+      sections.begin(), sections.end(),<br>

+      [&](Chunk<ELFT> *section) { section->write(this, _layout, *buffer);<br>

});<br>

     writeTask.end();<br>

<br>

     ScopedTask commitTask(getDefaultDomain(), "ELF Writer commit to<br>

disk");<br>

<br>

Modified: lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h<br>

URL:<br>

<br>

<a href="http://llvm.org/viewvc/llvm-project/lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h?rev=232460&r1=232459&r2=232460&view=diff" target="_blank">http://llvm.org/viewvc/llvm-project/lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h?rev=232460&r1=232459&r2=232460&view=diff</a><br>

<br>

==============================================================================<br>

<br>

--- lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h (original)<br>

+++ lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h Mon Mar 16 22:29:32 2015<br>

@@ -234,17 +234,17 @@ public:<br>

     /// routine gets called after the linker fixes up the virtual address<br>

     /// of the section<br>

     virtual void assignVirtualAddress(uint64_t addr) override {<br>

-    for (auto &ai : _atoms) {<br>

+    parallel_for_each(_atoms.begin(), _atoms.end(), [&](AtomLayout *ai) {<br>

         ai->_virtualAddr = addr + ai->_fileOffset;<br>

-    }<br>

+    });<br>

     }<br>

<br>

     /// \brief Set the file offset of each Atom in the section. This<br>

routine<br>

     /// gets called after the linker fixes up the section offset<br>

     void assignFileOffsets(uint64_t offset) override {<br>

-    for (auto &ai : _atoms) {<br>

+    parallel_for_each(_atoms.begin(), _atoms.end(), [&](AtomLayout *ai) {<br>

         ai->_fileOffset = offset + ai->_fileOffset;<br>

-    }<br>

+    });<br>

     }<br>

<br>

     /// \brief Find the Atom address given a name, this is needed to<br>

properly<br>

<br>

<br>

_______________________________________________<br>

llvm-commits mailing list<br>

<a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

<br>

<br>

<br>

<br>

<br>

--<br>

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation<br>

<br>

<br>

</div></div></blockquote></blockquote><div class="HOEnZb"><div class="h5">

<br>

<br>


<br>

</div></div></blockquote></div><br></div>