[lld] r232460 - [ELF] Use parallel_for_each for writing.

Wed Mar 18 09:38:48 PDT 2015

It's not strange. Making something parallel doesn't always make it run
faster; often it makes things even slower. That is exactly why I
emphasized the importance of accurate benchmarking. (Note that these
numbers come from linking Clang. You might see different results with
other programs.)
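
To put a number on "slower", here is a minimal sketch of the arithmetic,
using only the figures from the perf output quoted below ("master" is with
the patch, "master minus your patch" is without). The dictionary keys and
the helper name percent_change are made up for illustration; they are not
part of perf or lld.

```python
# Figures copied from the quoted "perf stat -r 20" output in this thread.
with_patch = {
    "task_clock_msec": 1850.315854,
    "cycles": 5_570_279_746,
    "instructions": 6_061_467_442,
    "elapsed_sec": 1.852094924,
}
without_patch = {
    "task_clock_msec": 1837.986418,
    "cycles": 5_517_484_340,
    "instructions": 6_004_436_870,
    "elapsed_sec": 1.839759787,
}


def percent_change(new, old):
    """Relative change of `new` against `old`, in percent."""
    return (new - old) / old * 100.0


# Positive values mean the patched build did more work or took longer.
changes = {
    name: percent_change(with_patch[name], without_patch[name])
    for name in with_patch
}

for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

By this measure the patched linker takes roughly 0.7% longer in wall-clock
time and executes roughly 1% more cycles and instructions. That is a small
regression, not a speedup, for this particular workload.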

Rafael, it's the ELF writer. Unless you cross-link ELF executables on
Windows, this code is never executed on Windows.

On Wed, Mar 18, 2015 at 9:32 AM, Rafael Espíndola <
rafael.espindola at gmail.com> wrote:

> As with anything threading related, it might also be worth
> benchmarking it on Windows.
>
> On 18 March 2015 at 12:31, Shankar Easwaran <shankare at codeaurora.org>
> wrote:
> > It looks like these are the right numbers. Strange, I don't see a huge
> > advantage from the patch, which tries to write output sections in
> > parallel.
> >
> >
> > On 3/18/2015 11:23 AM, Rafael Espíndola wrote:
> >>
> >> On 18 March 2015 at 12:14, Shankar Easwaran <shankare at codeaurora.org>
> >> wrote:
> >>>
> >>> Does this repeat with the same numbers across similar tries?
> >>
> >> The "-r 20" tells perf to do 20 runs. Repeating the entire thing as a
> >> sanity check, I got
> >>
> >>
> >> master:
> >>         1850.315854      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.20% )
> >>               1,246      context-switches          #    0.673 K/sec
> >>                   0      cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
> >>             191,223      page-faults               #    0.103 M/sec                    ( +-  0.00% )
> >>       5,570,279,746      cycles                    #    3.010 GHz                      ( +-  0.08% )
> >>       3,076,652,220      stalled-cycles-frontend   #   55.23% frontend cycles idle     ( +-  0.15% )
> >>     <not supported>      stalled-cycles-backend
> >>       6,061,467,442      instructions              #    1.09  insns per cycle
> >>                                                    #    0.51  stalled cycles per insn  ( +-  0.00% )
> >>       1,262,014,047      branches                  #  682.053 M/sec                    ( +-  0.00% )
> >>          26,526,169      branch-misses             #    2.10% of all branches          ( +-  0.00% )
> >>
> >>         1.852094924 seconds time elapsed                                               ( +-  0.20% )
> >>
> >> master minus your patch:
> >>
> >>         1837.986418      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.01% )
> >>               1,170      context-switches          #    0.637 K/sec
> >>                   0      cpu-migrations            #    0.000 K/sec
> >>             191,225      page-faults               #    0.104 M/sec                    ( +-  0.00% )
> >>       5,517,484,340      cycles                    #    3.002 GHz                      ( +-  0.01% )
> >>       3,036,583,530      stalled-cycles-frontend   #   55.04% frontend cycles idle     ( +-  0.02% )
> >>     <not supported>      stalled-cycles-backend
> >>       6,004,436,870      instructions              #    1.09  insns per cycle
> >>                                                    #    0.51  stalled cycles per insn  ( +-  0.00% )
> >>       1,250,685,716      branches                  #  680.465 M/sec                    ( +-  0.00% )
> >>          26,539,486      branch-misses             #    2.12% of all branches          ( +-  0.00% )
> >>
> >>         1.839759787 seconds time elapsed                                               ( +-  0.01% )
> >>
> >>
> >> Cheers,
> >> Rafael
> >>
> >
> >
> > --
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> > hosted by the Linux Foundation
> >
>