<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-cite-prefix"><br>

      I measured this again, four runs of each configuration, just to
      make sure, and I see a few results identical to what I measured
      before:<br>

      <br>

      <u><b>Raw data below :-</b></u><br>

      <br>

      LLD Try With Patch #1<br>

      4.16user 0.80system 0:03.06elapsed 162%CPU (0avgtext+0avgdata

      7174160maxresident)k<br>

      LLD Try Without Patch #1<br>

      4.49user 0.92system 0:03.32elapsed 162%CPU (0avgtext+0avgdata

      7179984maxresident)k<br>

      BFD Try #1<br>

      7.81user 0.68system 0:08.53elapsed 99%CPU (0avgtext+0avgdata

      3230416maxresident)k<br>

      LLD Try With Patch #2<br>

      3.94user 0.86system 0:02.93elapsed 163%CPU (0avgtext+0avgdata

      7175808maxresident)k<br>

      LLD Try Without Patch #2<br>

      4.12user 0.83system 0:03.22elapsed 154%CPU (0avgtext+0avgdata

      7172704maxresident)k<br>

      BFD Try #2<br>

      7.78user 0.75system 0:08.57elapsed 99%CPU (0avgtext+0avgdata

      3230416maxresident)k<br>

      LLD Try With Patch #3<br>

      4.36user 1.05system 0:03.08elapsed 175%CPU (0avgtext+0avgdata

      7176320maxresident)k<br>

      LLD Try Without Patch #3<br>

      4.38user 0.90system 0:03.14elapsed 168%CPU (0avgtext+0avgdata

      7175600maxresident)k<br>

      BFD Try #3<br>

      7.78user 0.64system 0:08.46elapsed 99%CPU (0avgtext+0avgdata

      3230416maxresident)k<br>

      LLD Try With Patch #4<br>

      4.17user 0.72system 0:02.93elapsed 166%CPU (0avgtext+0avgdata

      7175120maxresident)k<br>

      LLD Try Without Patch #4<br>

      4.20user 0.79system 0:03.08elapsed 161%CPU (0avgtext+0avgdata

      7174864maxresident)k<br>

      BFD Try #4<br>

      7.77user 0.66system 0:08.46elapsed 99%CPU (0avgtext+0avgdata

      3230416maxresident)k<br>

      <br>
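      As a rough way to compare the runs above, the sketch below averages
      the user, system, and elapsed times per configuration and recomputes
      CPU utilisation as (user + sys) / elapsed, which matches the %CPU
      column in the raw output. The tuples are copied by hand from the raw
      data; the names RUNS and summarize are illustrative only and not
      part of any tool.<br>

```python
from statistics import mean

# (user, sys, elapsed) in seconds, copied by hand from the four runs above.
RUNS = {
    "lld with patch": [
        (4.16, 0.80, 3.06), (3.94, 0.86, 2.93),
        (4.36, 1.05, 3.08), (4.17, 0.72, 2.93),
    ],
    "lld without patch": [
        (4.49, 0.92, 3.32), (4.12, 0.83, 3.22),
        (4.38, 0.90, 3.14), (4.20, 0.79, 3.08),
    ],
    "bfd": [
        (7.81, 0.68, 8.53), (7.78, 0.75, 8.57),
        (7.78, 0.64, 8.46), (7.77, 0.66, 8.46),
    ],
}


def summarize(runs):
    """Return mean user, sys, elapsed, and CPU utilisation for a list of runs."""
    user = mean(r[0] for r in runs)
    sys_ = mean(r[1] for r in runs)
    elapsed = mean(r[2] for r in runs)
    # Utilisation above 1.0 means more than one core was busy on average.
    return {"user": user, "sys": sys_, "elapsed": elapsed,
            "cpu": (user + sys_) / elapsed}


if __name__ == "__main__":
    for name, runs in RUNS.items():
        s = summarize(runs)
        print(f"{name:18s} user={s['user']:.2f}s sys={s['sys']:.2f}s "
              f"elapsed={s['elapsed']:.2f}s cpu={s['cpu']:.0%}")
```

      On these numbers the mean elapsed time comes out at about 3.00 s
      with the patch, 3.19 s without it, and 8.5 s for BFD, while the mean
      user time is about 4.16 s with the patch and 4.30 s without.<br>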

      <u><b>Questions :-</b></u><br>

      <br>

      As Rui mentions, I don't know why the user time is higher without
      the patch. Are there any methods to verify this?<br>
      Could this be because of user threads instead of kernel threads?

      <b><br>

      </b><br>

      Shankar Easwaran<br>

      <br>

      On 3/17/2015 3:35 PM, Shankar Easwaran wrote:<br>

    </div>

    <blockquote cite="mid:5508902F.9090900@codeaurora.org" type="cite">Yes,
      this is true. The file that I read into the commit message held
      logs from several runs, and manually removing the extra ones left
      two user lines.

      <br>

      <br>

      But the result itself still holds. I can re-measure the time
      taken, though.

      <br>

      <br>

      Shankar Easwaran

      <br>

      <br>

      On 3/17/2015 2:30 PM, Rui Ueyama wrote:

      <br>

      <blockquote type="cite">On Mon, Mar 16, 2015 at 8:29 PM, Shankar

        Easwaran <a class="moz-txt-link-rfc2396E" href="mailto:shankare@codeaurora.org">&lt;shankare@codeaurora.org&gt;</a>

        <br>

        wrote:

        <br>

        <br>

        <blockquote type="cite">Author: shankare

          <br>

          Date: Mon Mar 16 22:29:32 2015

          <br>

          New Revision: 232460

          <br>

          <br>

          URL:

          <a class="moz-txt-link-freetext" href="http://llvm.org/viewvc/llvm-project?rev=232460&view=rev">http://llvm.org/viewvc/llvm-project?rev=232460&view=rev</a>

          <br>

          Log:

          <br>

          [ELF] Use parallel_for_each for writing.

          <br>

          <br>

          This change improves the performance of lld when self-hosting
          lld, compared with the bfd linker. The BFD linker takes 8
          seconds of elapsed time on average.
          <br>
          lld takes 3 seconds of elapsed time on average. Without this
          change, lld takes ~5 seconds on average. The runtime
          comparisons were done on a release build and measured by
          running the link three times.

          <br>

          <br>

          lld self-host without the change

          <br>

          ----------------------------------

          <br>

          real    0m3.196s

          <br>

          user    0m4.580s

          <br>

          sys     0m0.832s

          <br>

          <br>

          lld self-host with lld

          <br>

          -----------------------

          <br>

          user    0m3.024s

          <br>

          user    0m3.252s

          <br>

          sys     0m0.796s

          <br>

          <br>

        </blockquote>

        The above results don't look like real output of the "time" command.

        <br>

        <br>

        If it's real, it's too good to be true, assuming the first line

        of the

        <br>

        second result is "real" instead of "user".

        <br>

        <br>

        "real" is wall clock time from process start to process exit.

        "user" is CPU

        <br>

        time consumed by the process in user mode (if a process is

        multi-threaded,

        <br>

        it can be larger than real).

        <br>

        <br>

        Your result shows a significant improvement in user time, which
        means you have significantly reduced the amount of processing
        time needed to do the same thing.
        <br>
        However, because this change didn't change the algorithm but
        just executes the same work in parallel, that couldn't happen.

        <br>

        <br>

        Something's not correct.

        <br>

        <br>
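        The sanity check described above can be sketched in a few lines:
        add user and sys time for each measurement and see how much total
        CPU time changed. The figures are the ones from the commit
        message, reading the first "user" line of the second block as
        "real" (an assumption, as noted above); the function name
        cpu_time_change is illustrative only.
        <br>

```python
def cpu_time_change(before, after):
    """Relative change in total CPU time (user + sys) between two samples."""
    cpu_before = before["user"] + before["sys"]
    cpu_after = after["user"] + after["sys"]
    return (cpu_after - cpu_before) / cpu_before


# Seconds, taken from the commit message. The "real" value of the second
# sample is an assumption: the log labels that line "user".
without_change = {"real": 3.196, "user": 4.580, "sys": 0.832}
with_change = {"real": 3.024, "user": 3.252, "sys": 0.796}

change = cpu_time_change(without_change, with_change)
print(f"CPU time changed by {change:+.1%}")  # prints "CPU time changed by -25.2%"
```

        A drop of about a quarter in total CPU time, from a change that
        only runs existing loops in parallel, is what makes these figures
        suspect. A plausible result would show lower real time with
        roughly unchanged or slightly higher user + sys.
        <br>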

        I appreciate your effort to make LLD faster, but we need to be

        careful

        <br>

        about benchmark results. If we don't measure improvements

        accurately, it's

        <br>

        easy to make an "optimization" that makes things slower.

        <br>

        <br>

        Another important thing is to be skeptical of your own work when
        you optimize something and measure its effect. It sometimes
        happens that I am 100% sure something is going to improve
        performance, but it actually doesn't.

        <br>

        <br>

        time taken to build lld with bfd

        <br>

        <blockquote type="cite">--------------------------------

          <br>

          real    0m8.419s

          <br>

          user    0m7.748s

          <br>

          sys     0m0.632s

          <br>

          <br>

          Modified:

          <br>

               lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h

          <br>

               lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h

          <br>

          <br>

          Modified: lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h

          <br>

          URL:

          <br>

<a class="moz-txt-link-freetext" href="http://llvm.org/viewvc/llvm-project/lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h?rev=232460&r1=232459&r2=232460&view=diff">http://llvm.org/viewvc/llvm-project/lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h?rev=232460&r1=232459&r2=232460&view=diff</a>

          <br>

          <br>

==============================================================================

          <br>

          --- lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h

          (original)

          <br>

          +++ lld/trunk/lib/ReaderWriter/ELF/OutputELFWriter.h Mon Mar

          16 22:29:32

          <br>

          2015

          <br>

          @@ -586,8 +586,10 @@ std::error_code

          OutputELFWriter&lt;ELFT&gt;::w

          <br>

              _elfHeader->write(this, _layout, *buffer);

          <br>

              _programHeader->write(this, _layout, *buffer);

          <br>

          <br>

          -  for (auto section : _layout.sections())

          <br>

          -    section->write(this, _layout, *buffer);

          <br>

          +  auto sections = _layout.sections();

          <br>

          +  parallel_for_each(

          <br>

          +      sections.begin(), sections.end(),

          <br>

          +      [&](Chunk&lt;ELFT&gt; *section) {

          section->write(this, _layout, *buffer);

          <br>

          });

          <br>

              writeTask.end();

          <br>

          <br>

              ScopedTask commitTask(getDefaultDomain(), "ELF Writer

          commit to disk");

          <br>

          <br>

          Modified: lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h

          <br>

          URL:

          <br>

<a class="moz-txt-link-freetext" href="http://llvm.org/viewvc/llvm-project/lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h?rev=232460&r1=232459&r2=232460&view=diff">http://llvm.org/viewvc/llvm-project/lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h?rev=232460&r1=232459&r2=232460&view=diff</a>

          <br>

          <br>

==============================================================================

          <br>

          --- lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h (original)

          <br>

          +++ lld/trunk/lib/ReaderWriter/ELF/SectionChunks.h Mon Mar 16

          22:29:32 2015

          <br>

          @@ -234,17 +234,17 @@ public:

          <br>

              /// routine gets called after the linker fixes up the

          virtual address

          <br>

              /// of the section

          <br>

              virtual void assignVirtualAddress(uint64_t addr) override

          {

          <br>

          -    for (auto &ai : _atoms) {

          <br>

          +    parallel_for_each(_atoms.begin(), _atoms.end(),

          [&](AtomLayout *ai) {

          <br>

                  ai->_virtualAddr = addr + ai->_fileOffset;

          <br>

          -    }

          <br>

          +    });

          <br>

              }

          <br>

          <br>

              /// \brief Set the file offset of each Atom in the

          section. This routine

          <br>

              /// gets called after the linker fixes up the section

          offset

          <br>

              void assignFileOffsets(uint64_t offset) override {

          <br>

          -    for (auto &ai : _atoms) {

          <br>

          +    parallel_for_each(_atoms.begin(), _atoms.end(),

          [&](AtomLayout *ai) {

          <br>

                  ai->_fileOffset = offset + ai->_fileOffset;

          <br>

          -    }

          <br>

          +    });

          <br>

              }

          <br>

          <br>

              /// \brief Find the Atom address given a name, this is

          needed to

          <br>

          properly

          <br>

          <br>

          <br>

          _______________________________________________

          <br>

          llvm-commits mailing list

          <br>

          <a class="moz-txt-link-abbreviated" href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a>

          <br>

          <a class="moz-txt-link-freetext" href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a>

          <br>

          <br>

        </blockquote>

      </blockquote>

      <br>

      <br>

    </blockquote>

    <br>

    <br>

    <pre class="moz-signature" cols="72">-- 

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation</pre>

  </body>

</html>