<div dir="ltr">It's not strange. Making something parallel doesn't always make it run faster; often it makes things even slower. That's exactly why I emphasized the importance of accurate benchmarking. (Note that these numbers come from linking Clang. You might see different results with other programs.)<div><br></div><div>Rafael, this is the ELF writer. Unless you cross-link ELF executables on Windows, this code is not executed on Windows.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Mar 18, 2015 at 9:32 AM, Rafael Espíndola <span dir="ltr"><<a href="mailto:rafael.espindola@gmail.com" target="_blank">rafael.espindola@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">As with anything threading related, it might also be worth<br>


benchmarking it on Windows.<br>


<div class="HOEnZb"><div class="h5"><br>


On 18 March 2015 at 12:31, Shankar Easwaran <<a href="mailto:shankare@codeaurora.org">shankare@codeaurora.org</a>> wrote:<br>


> It looks like these are the right numbers and, strangely, I don't see a huge<br>


> advantage from the patch that writes output sections in<br>


> parallel.<br>


><br>


><br>


> On 3/18/2015 11:23 AM, Rafael Espíndola wrote:<br>


>><br>


>> On 18 March 2015 at 12:14, Shankar Easwaran <<a href="mailto:shankare@codeaurora.org">shankare@codeaurora.org</a>><br>


>> wrote:<br>


>>><br>


>>> Does this repeat with the same numbers across similar tries?<br>


>><br>


>> The "-r 20" tells perf to do 20 runs. Repeating the entire thing as a<br>


>> sanity check, I got:<br>


>><br>


>><br>


>> master:<br>
>>         1850.315854      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.20% )<br>
>>               1,246      context-switches          #    0.673 K/sec<br>
>>                   0      cpu-migrations            #    0.000 K/sec                  ( +-100.00% )<br>
>>             191,223      page-faults               #    0.103 M/sec                  ( +-  0.00% )<br>
>>       5,570,279,746      cycles                    #    3.010 GHz                    ( +-  0.08% )<br>
>>       3,076,652,220      stalled-cycles-frontend   #   55.23% frontend cycles idle     ( +-  0.15% )<br>
>>     <not supported>      stalled-cycles-backend<br>
>>       6,061,467,442      instructions              #    1.09  insns per cycle<br>
>>                                                    #    0.51  stalled cycles per insn  ( +-  0.00% )<br>
>>       1,262,014,047      branches                  #  682.053 M/sec                  ( +-  0.00% )<br>
>>          26,526,169      branch-misses             #    2.10% of all branches          ( +-  0.00% )<br>
>><br>
>>         1.852094924 seconds time elapsed                                          ( +-  0.20% )<br>


>><br>


>> master minus your patch:<br>
>><br>
>>         1837.986418      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.01% )<br>
>>               1,170      context-switches          #    0.637 K/sec<br>
>>                   0      cpu-migrations            #    0.000 K/sec<br>
>>             191,225      page-faults               #    0.104 M/sec                  ( +-  0.00% )<br>
>>       5,517,484,340      cycles                    #    3.002 GHz                    ( +-  0.01% )<br>
>>       3,036,583,530      stalled-cycles-frontend   #   55.04% frontend cycles idle     ( +-  0.02% )<br>
>>     <not supported>      stalled-cycles-backend<br>
>>       6,004,436,870      instructions              #    1.09  insns per cycle<br>
>>                                                    #    0.51  stalled cycles per insn  ( +-  0.00% )<br>
>>       1,250,685,716      branches                  #  680.465 M/sec                  ( +-  0.00% )<br>
>>          26,539,486      branch-misses             #    2.12% of all branches          ( +-  0.00% )<br>
>><br>
>>         1.839759787 seconds time elapsed                                          ( +-  0.01% )<br>


>><br>


>><br>


>> Cheers,<br>


>> Rafael<br>


>><br>


><br>


><br>


> --<br>


> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by<br>


> the Linux Foundation<br>


><br>


</div></div></blockquote></div><br></div>