[lld] r232460 - [ELF] Use parallel_for_each for writing.
Sean Silva
chisophugis at gmail.com
Wed Mar 18 17:35:42 PDT 2015
Actually I'm wondering if we're doing *anything* in parallel, since perf is
reporting "0.999 CPUs utilized".
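
For reference, perf's "CPUs utilized" figure is the task-clock time divided by the wall-clock time elapsed. Below is a minimal sketch that recomputes it from the numbers quoted further down; the function name `cpus_utilized` is made up for illustration and is not part of perf or lld.

```python
def cpus_utilized(task_clock_msec: float, elapsed_sec: float) -> float:
    """Ratio of CPU time consumed to wall-clock time elapsed.

    A value near 1.0 means that, on average, only one core was busy.
    """
    return task_clock_msec / (elapsed_sec * 1000.0)


# Figures copied from the two "perf stat -r 20" reports quoted in this thread.
master = cpus_utilized(1850.315854, 1.852094924)
without_patch = cpus_utilized(1837.986418, 1.839759787)

print(f"master:               {master:.3f}")
print(f"master minus patch:   {without_patch:.3f}")
```

Both ratios round to 0.999, matching what perf reports, so neither build is keeping more than one core busy on average.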
-- Sean Silva
On Wed, Mar 18, 2015 at 9:38 AM, Rui Ueyama <ruiu at google.com> wrote:
> It's not strange. Making something parallel doesn't always make it run
> faster. Often it makes things even slower. That is exactly why I
> emphasized the importance of accurate benchmarks. (Note that this is the
> result of linking Clang. You might see different results with other
> programs.)
>
> Rafael, it's the ELF writer. Unless you cross link ELF executables on
> Windows, this piece of code is not executed on Windows.
>
> On Wed, Mar 18, 2015 at 9:32 AM, Rafael Espíndola <
> rafael.espindola at gmail.com> wrote:
>
>> As with anything threading related, it might also be worth
>> benchmarking it on Windows.
>>
>> On 18 March 2015 at 12:31, Shankar Easwaran <shankare at codeaurora.org>
>> wrote:
>> > It looks like these are the right numbers. Strange, I don't see a huge
>> > advantage from the patch that parallelizes writing the output
>> > sections.
>> >
>> >
>> > On 3/18/2015 11:23 AM, Rafael Espíndola wrote:
>> >>
>> >> On 18 March 2015 at 12:14, Shankar Easwaran <shankare at codeaurora.org>
>> >> wrote:
>> >>>
>> >>> Do you get the same numbers across repeated runs?
>> >>
>> >> The "-r 20" tells perf to do 20 runs. Repeating the entire thing as a
>> >> sanity check, I got:
>> >>
>> >>
>> >> master:
>> >>       1850.315854 task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.20% )
>> >>             1,246 context-switches          #    0.673 K/sec
>> >>                 0 cpu-migrations            #    0.000 K/sec                    ( +-100.00% )
>> >>           191,223 page-faults               #    0.103 M/sec                    ( +-  0.00% )
>> >>     5,570,279,746 cycles                    #    3.010 GHz                      ( +-  0.08% )
>> >>     3,076,652,220 stalled-cycles-frontend   #   55.23% frontend cycles idle     ( +-  0.15% )
>> >>   <not supported> stalled-cycles-backend
>> >>     6,061,467,442 instructions              #    1.09  insns per cycle
>> >>                                             #    0.51  stalled cycles per insn  ( +-  0.00% )
>> >>     1,262,014,047 branches                  #  682.053 M/sec                    ( +-  0.00% )
>> >>        26,526,169 branch-misses             #    2.10% of all branches          ( +-  0.00% )
>> >>
>> >>       1.852094924 seconds time elapsed                                          ( +-  0.20% )
>> >>
>> >> master minus your patch:
>> >>
>> >>       1837.986418 task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.01% )
>> >>             1,170 context-switches          #    0.637 K/sec
>> >>                 0 cpu-migrations            #    0.000 K/sec
>> >>           191,225 page-faults               #    0.104 M/sec                    ( +-  0.00% )
>> >>     5,517,484,340 cycles                    #    3.002 GHz                      ( +-  0.01% )
>> >>     3,036,583,530 stalled-cycles-frontend   #   55.04% frontend cycles idle     ( +-  0.02% )
>> >>   <not supported> stalled-cycles-backend
>> >>     6,004,436,870 instructions              #    1.09  insns per cycle
>> >>                                             #    0.51  stalled cycles per insn  ( +-  0.00% )
>> >>     1,250,685,716 branches                  #  680.465 M/sec                    ( +-  0.00% )
>> >>        26,539,486 branch-misses             #    2.12% of all branches          ( +-  0.00% )
>> >>
>> >>       1.839759787 seconds time elapsed                                          ( +-  0.01% )
>> >>
>> >>
>> >> Cheers,
>> >> Rafael
>> >>
>> >
>> >
>> > --
>> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>> > hosted by the Linux Foundation
>> >
>>
>
>