[lld] r232460 - [ELF] Use parallel_for_each for writing.
Rui Ueyama
ruiu at google.com
Wed Mar 18 09:38:48 PDT 2015
It's not strange. Making something parallel doesn't always make it run
faster. Often it makes things even slower. That's exactly why I
emphasized the importance of accurate benchmarks. (Note that this is the
result of linking Clang. You might see different results with other
programs.)
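To make that point concrete, here is a back-of-the-envelope cost model. It is an illustration only, not a model of lld's ELF writer. It assumes a run has a serial part plus a part that can be split evenly across threads, and that each worker thread adds a fixed start-up and synchronization cost. The function names and all numbers are hypothetical. When the parallelizable part is small relative to that overhead, the "parallel" version comes out slower.

```python
def sequential_time(serial_ms: float, parallel_ms: float) -> float:
    """Run time when the parallelizable work also runs on a single thread."""
    return serial_ms + parallel_ms


def parallel_time(serial_ms: float, parallel_ms: float,
                  threads: int, overhead_per_thread_ms: float) -> float:
    """Amdahl-style estimate with a fixed cost per worker thread.

    The serial part is unchanged, the parallelizable part is split evenly
    across `threads`, and each thread adds `overhead_per_thread_ms` for
    start-up and synchronization (a deliberately crude, hypothetical assumption).
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return serial_ms + parallel_ms / threads + overhead_per_thread_ms * threads


# Hypothetical numbers: a small parallelizable phase inside a long run.
seq = sequential_time(1000.0, 20.0)         # 1020.0
par = parallel_time(1000.0, 20.0, 4, 10.0)  # 1000 + 5 + 40 = 1045.0
print(f"sequential: {seq} ms, parallel: {par} ms")
```

With a large parallelizable phase the same model favors threads: parallel_time(100.0, 800.0, 4, 10.0) is 340.0 ms against 900.0 ms sequentially. The overhead term is hard to guess in advance, which is why only a measurement settles the question.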
Rafael, it's the ELF writer. Unless you cross-link ELF executables on
Windows, this code is never executed on Windows.
On Wed, Mar 18, 2015 at 9:32 AM, Rafael Espíndola <
rafael.espindola at gmail.com> wrote:
> As with anything threading-related, it might also be worth
> benchmarking it on Windows.
>
> On 18 March 2015 at 12:31, Shankar Easwaran <shankare at codeaurora.org>
> wrote:
> > It looks like these are the right numbers. Strange, I don't see a big
> > advantage from the patch that parallelizes writing of the output
> > sections.
> >
> >
> > On 3/18/2015 11:23 AM, Rafael Espíndola wrote:
> >>
> >> On 18 March 2015 at 12:14, Shankar Easwaran <shankare at codeaurora.org>
> >> wrote:
> >>>
> >>> Does this reproduce with the same numbers across repeated runs?
> >>
> >> The "-r 20" tells perf to do 20 runs. Repeating the entire thing as a
> >> sanity check, I got
> >>
> >>
> >> master:
> >>     1850.315854 task-clock (msec)       #   0.999 CPUs utilized           ( +-  0.20% )
> >>           1,246 context-switches        #   0.673 K/sec
> >>               0 cpu-migrations          #   0.000 K/sec                   ( +-100.00% )
> >>         191,223 page-faults             #   0.103 M/sec                   ( +-  0.00% )
> >>   5,570,279,746 cycles                  #   3.010 GHz                     ( +-  0.08% )
> >>   3,076,652,220 stalled-cycles-frontend #  55.23% frontend cycles idle    ( +-  0.15% )
> >> <not supported> stalled-cycles-backend
> >>   6,061,467,442 instructions            #    1.09 insns per cycle
> >>                                         #    0.51 stalled cycles per insn ( +-  0.00% )
> >>   1,262,014,047 branches                # 682.053 M/sec                   ( +-  0.00% )
> >>      26,526,169 branch-misses           #   2.10% of all branches         ( +-  0.00% )
> >>
> >>     1.852094924 seconds time elapsed                                      ( +-  0.20% )
> >>
> >> master minus your patch:
> >>
> >>     1837.986418 task-clock (msec)       #   0.999 CPUs utilized           ( +-  0.01% )
> >>           1,170 context-switches        #   0.637 K/sec
> >>               0 cpu-migrations          #   0.000 K/sec
> >>         191,225 page-faults             #   0.104 M/sec                   ( +-  0.00% )
> >>   5,517,484,340 cycles                  #   3.002 GHz                     ( +-  0.01% )
> >>   3,036,583,530 stalled-cycles-frontend #  55.04% frontend cycles idle    ( +-  0.02% )
> >> <not supported> stalled-cycles-backend
> >>   6,004,436,870 instructions            #    1.09 insns per cycle
> >>                                         #    0.51 stalled cycles per insn ( +-  0.00% )
> >>   1,250,685,716 branches                # 680.465 M/sec                   ( +-  0.00% )
> >>      26,539,486 branch-misses           #   2.12% of all branches         ( +-  0.00% )
> >>
> >>     1.839759787 seconds time elapsed                                      ( +-  0.01% )
> >>
> >>
> >> Cheers,
> >> Rafael
> >>
> >
> >
> > --
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> > hosted by the Linux Foundation
> >
>
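One way to read the two perf runs quoted above is to compare the difference in mean elapsed time with the reported spread. The sketch below treats perf's "( +- x% )" figure as a relative uncertainty on each mean and combines the two in quadrature, which assumes the two series of runs are independent. The helper name compare is hypothetical. Rafael's "master" includes the patch; "master minus your patch" has it reverted. On these numbers the build with the patch is about 12 ms (0.67%) slower, a little over three times the combined uncertainty of about 3.7 ms.

```python
import math


def compare(mean_a, rel_err_a_pct, mean_b, rel_err_b_pct):
    """Return (mean_a - mean_b, combined uncertainty) in the units of the means.

    Each relative uncertainty (in percent) is converted to an absolute one,
    and the two are combined in quadrature, which assumes the two series
    of runs are independent.
    """
    err_a = mean_a * rel_err_a_pct / 100.0
    err_b = mean_b * rel_err_b_pct / 100.0
    return mean_a - mean_b, math.hypot(err_a, err_b)


# Elapsed times (seconds) and "( +- x% )" figures from the perf output quoted above.
with_patch = (1.852094924, 0.20)     # "master" (includes the patch)
without_patch = (1.839759787, 0.01)  # "master minus your patch"

diff, err = compare(*with_patch, *without_patch)
print(f"difference: {diff * 1000:.1f} ms +- {err * 1000:.1f} ms "
      f"({100 * diff / without_patch[0]:.2f}% slower with the patch)")
```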
More information about the llvm-commits mailing list