<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Mar 13, 2015 at 12:35 PM, Shankar Easwaran <span dir="ltr"><<a href="mailto:shankare@codeaurora.org" target="_blank">shankare@codeaurora.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 3/13/2015 1:59 PM, Rafael Espíndola wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


I will do a run with --merge-strings. This should probably be the default, to match other ELF linkers.<br>


</blockquote>


Trying --merge-strings with today's trunk, I got:<br>
<br>
* .comment got 77,797 bytes smaller.<br>
* .rodata got 9,394,257 bytes smaller.<br>


</blockquote></span>


We can significantly improve string-merging performance by delaying the merge until after the sections/atoms have been garbage collected; currently we do it very early, in the reader.<br></blockquote><div><br></div><div>From my experience, it's very hard to predict whether a change will significantly improve performance just by looking at the code and thinking about it.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


Are you using oprofile to get these stats?<div class="HOEnZb"><div class="h5"><br>


<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<br>


Comparing with gold, .comment now has the same size and .rodata is 55,021 bytes bigger.<br>


<br>


Amusingly, merging strings seems to make lld a bit faster. With<br>


today's files I got:<br>


<br>


lld:<br>


------------------------------------------------------------------------------------------<br>
<br>
        1985.256427      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.07% )<br>
              1,152      context-switches          #    0.580 K/sec<br>
                  0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )<br>
            199,309      page-faults               #    0.100 M/sec<br>
      5,970,383,833      cycles                    #    3.007 GHz                      ( +-  0.07% )<br>
      3,413,740,580      stalled-cycles-frontend   #   57.18% frontend cycles idle     ( +-  0.12% )<br>
    <not supported>      stalled-cycles-backend<br>
      6,240,156,987      instructions              #    1.05  insns per cycle<br>
                                                   #    0.55  stalled cycles per insn  ( +-  0.01% )<br>
      1,293,186,347      branches                  #  651.395 M/sec                    ( +-  0.01% )<br>
         26,687,288      branch-misses             #    2.06% of all branches          ( +-  0.00% )<br>
<br>
        1.987125976 seconds time elapsed                                               ( +-  0.07% )<br>
------------------------------------------------------------------------------------------<br>


lld --merge-strings:<br>


<br>


------------------------------------------------------------------------------------------<br>
        1912.735291      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.10% )<br>
              1,152      context-switches          #    0.602 K/sec<br>
                  0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )<br>
            187,916      page-faults               #    0.098 M/sec                    ( +-  0.00% )<br>
      5,749,920,058      cycles                    #    3.006 GHz                      ( +-  0.04% )<br>
      3,250,485,516      stalled-cycles-frontend   #   56.53% frontend cycles idle     ( +-  0.07% )<br>
    <not supported>      stalled-cycles-backend<br>
      5,987,870,976      instructions              #    1.04  insns per cycle<br>
                                                   #    0.54  stalled cycles per insn  ( +-  0.00% )<br>
      1,250,773,036      branches                  #  653.919 M/sec                    ( +-  0.00% )<br>
         27,922,489      branch-misses             #    2.23% of all branches          ( +-  0.00% )<br>
<br>
        1.914565005 seconds time elapsed                                               ( +-  0.10% )<br>
------------------------------------------------------------------------------------------<br>


<br>


<br>


gold:<br>


<br>


------------------------------------------------------------------------------------------<br>
        1000.132594      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.01% )<br>
                  0      context-switches          #    0.000 K/sec<br>
                  0      cpu-migrations            #    0.000 K/sec<br>
             77,836      page-faults               #    0.078 M/sec<br>
      3,002,431,314      cycles                    #    3.002 GHz                      ( +-  0.01% )<br>
      1,404,393,569      stalled-cycles-frontend   #   46.78% frontend cycles idle     ( +-  0.02% )<br>
    <not supported>      stalled-cycles-backend<br>
      4,110,576,101      instructions              #    1.37  insns per cycle<br>
                                                   #    0.34  stalled cycles per insn  ( +-  0.00% )<br>
        869,160,761      branches                  #  869.046 M/sec                    ( +-  0.00% )<br>
         15,691,670      branch-misses             #    1.81% of all branches          ( +-  0.00% )<br>
<br>
        1.001044905 seconds time elapsed                                               ( +-  0.01% )<br>
------------------------------------------------------------------------------------------<br>


<br>
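The relative differences between the three runs can be checked with a few lines of Python. This is only an illustrative check: the constants are the "seconds time elapsed" values copied from the perf output of the three runs (lld, lld --merge-strings, gold), and the helper names percent_faster and slowdown are made up for this example.<br>

```python
# "seconds time elapsed" values copied from the perf output of the three runs.
LLD = 1.987125976        # lld
LLD_MERGE = 1.914565005  # lld --merge-strings
GOLD = 1.001044905       # gold


def percent_faster(before: float, after: float) -> float:
    """Relative reduction in elapsed time, in percent."""
    return (before - after) / before * 100.0


def slowdown(t: float, baseline: float) -> float:
    """How many times as long t takes compared with baseline."""
    return t / baseline


if __name__ == "__main__":
    print(f"--merge-strings saves {percent_faster(LLD, LLD_MERGE):.1f}% of lld's time")
    print(f"lld vs gold: {slowdown(LLD, GOLD):.2f}x")
    print(f"lld --merge-strings vs gold: {slowdown(LLD_MERGE, GOLD):.2f}x")
```

With these numbers it prints a saving of about 3.7% from --merge-strings, and shows lld taking about 1.99x as long as gold (1.91x with --merge-strings) on this input.<br>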


I have attached the run.sh script I used to collect the numbers.<br>


<br>


Cheers,<br>


Rafael<br>


</blockquote>


<br>


<br></div></div><div class="HOEnZb"><div class="h5">


-- <br>


Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation<br>


<br>


</div></div></blockquote></div><br></div></div>
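The suggestion quoted above, merging strings only after sections/atoms have been garbage collected instead of in the reader, can be modeled with a small Python sketch. Everything in it is hypothetical: MergeSection, its live flag, and the section names do not correspond to lld's actual data structures. The model deduplicates identical strings into one NUL-terminated table and counts hash-table lookups as a rough proxy for merging work. It shows where saved work could come from, not how much time lld would actually save, which (as the reply above points out) has to be measured.<br>

```python
from dataclasses import dataclass


@dataclass
class MergeSection:
    """Hypothetical model of an input section holding mergeable C strings."""
    name: str
    strings: list[str]
    live: bool = True  # hypothetical result of a --gc-sections style pass


def merge_strings(sections):
    """Deduplicate identical strings into one NUL-terminated string table.

    Returns (table, offsets, lookups): the merged bytes, a map from each
    unique string to its offset in the table, and the number of hash-table
    lookups performed, used here as a rough proxy for merging work.
    """
    table = bytearray()
    offsets = {}
    lookups = 0
    for sec in sections:
        for s in sec.strings:
            lookups += 1
            if s not in offsets:
                offsets[s] = len(table)
                table += s.encode() + b"\0"
    return bytes(table), offsets, lookups


def live_only(sections):
    """Keep only the sections that survived garbage collection."""
    return [sec for sec in sections if sec.live]


# Illustrative input: three string sections, one of which GC would discard.
SECTIONS = [
    MergeSection(".rodata.str1.1 (a.o)", ["hello", "world"]),
    MergeSection(".rodata.str1.1 (b.o)", ["hello", "unused"], live=False),
    MergeSection(".rodata.str1.1 (c.o)", ["world", "lld"]),
]

if __name__ == "__main__":
    # Merging in the reader: every string is hashed, live or not.
    _, _, early = merge_strings(SECTIONS)
    # Merging after GC: only strings from live sections are hashed.
    table, offsets, late = merge_strings(live_only(SECTIONS))
    print("lookups when merging early:", early)    # 6
    print("lookups when merging after GC:", late)  # 4
    print(table, offsets)
```

In this toy input the dead section's strings are never hashed when merging happens after GC. A real linker that merges early may still drop dead strings later, so the model makes no claim about output size, only about how much merging work is done.<br>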