<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi Junmo,<br>
<br>
I tried your patch on top of r254864 on a Juno board, running
on Cortex-A57.<br>
I see the following results:<br>
<br>
<table style="max-width: 100%; border-collapse: collapse;
border-spacing: 0px; color: rgb(51, 51, 51); font-family:
'Helvetica Neue', Helvetica, Arial, sans-serif; font-style:
normal; font-variant: normal; font-weight: normal; letter-spacing:
normal; line-height: 20px; orphans: auto; text-align: start;
text-indent: 0px; text-transform: none; white-space: normal;
widows: 1; word-spacing: 0px; -webkit-text-stroke-width: 0px;
font-size: 9pt; border: 1px solid black; background-color:
rgb(255, 255, 255);">
<tbody>
<tr>
<th style="color: rgb(102, 102, 102); cursor: default;
text-align: center; font-weight: bold; font-family: Verdana;
padding: 5px 5px 5px 8px; background-color: rgb(238, 238,
238);" width="500">Performance Regressions - Execution Time</th>
<th style="color: rgb(102, 102, 102); cursor: default;
text-align: center; font-weight: bold; font-family: Verdana;
padding: 5px 5px 5px 8px; background-color: rgb(238, 238,
238);">Δ</th>
</tr>
</tbody><tbody class="searchable">
<tr>
<td class="benchmark-name" style="padding: 5px 5px 5px 8px;"><a
href="http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.170=3"
style="color: rgb(0, 136, 204); text-decoration: none;">lnt.MultiSource/Benchmarks/Ptrdist/yacr2/yacr2</a></td>
<td style="padding: 5px 5px 5px 8px; background-color:
rgb(255, 132, 132);">9.17%</td>
</tr>
<tr>
<td class="benchmark-name" style="padding: 5px 5px 5px 8px;"><a
href="http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.264=3"
style="color: rgb(0, 136, 204); text-decoration: none;">lnt.SingleSource/Benchmarks/Shootout-C++/ackermann</a></td>
<td style="padding: 5px 5px 5px 8px; background-color:
rgb(255, 139, 139);">8.02%</td>
</tr>
<tr>
<td class="benchmark-name" style="padding: 5px 5px 5px 8px;"><a
href="http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.149=3"
style="color: rgb(0, 136, 204); text-decoration: none;">lnt.MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1</a></td>
<td style="padding: 5px 5px 5px 8px; background-color:
rgb(255, 163, 163);">4.78%</td>
</tr>
<tr>
<td class="benchmark-name" style="padding: 5px 5px 5px 8px;"><a
href="http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.176=3"
style="color: rgb(0, 136, 204); text-decoration: none;">spec.cpu2006.ref.445_gobmk</a></td>
<td style="padding: 5px 5px 5px 8px; background-color:
rgb(255, 195, 195);">1.84%</td>
</tr>
<tr>
<td class="benchmark-name" style="padding: 5px 5px 5px 8px;"><a
href="http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.94=3"
style="color: rgb(0, 136, 204); text-decoration: none;">spec.cpu2006.ref.483_xalancbmk</a></td>
<td style="padding: 5px 5px 5px 8px; background-color:
rgb(255, 197, 197);">1.75%</td>
</tr>
<tr>
<td class="benchmark-name" style="padding: 5px 5px 5px 8px;"><a
href="http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.294=3"
style="color: rgb(0, 136, 204); text-decoration: none;">spec.cpu2006.ref.471_omnetpp</a></td>
<td style="padding: 5px 5px 5px 8px; background-color:
rgb(255, 202, 202);">1.43%</td>
</tr>
<tr>
<td class="benchmark-name" style="padding: 5px 5px 5px 8px;"><a
href="http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.337=3"
style="color: rgb(0, 136, 204); text-decoration: none;">spec.cpu2000.ref.253_perlbmk</a></td>
<td style="padding: 5px 5px 5px 8px; background-color:
rgb(255, 206, 206);">1.22%</td>
</tr>
<tr>
<td class="benchmark-name" style="padding: 5px 5px 5px 8px;"><a
href="http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.135=3"
style="color: rgb(0, 136, 204); text-decoration: none;">lnt.SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm</a></td>
<td style="padding: 5px 5px 5px 8px; background-color:
rgb(255, 208, 208);">1.10%</td>
</tr>
</tbody>
</table>
<br>
<table style="max-width: 100%; border-collapse: collapse;
border-spacing: 0px; color: rgb(51, 51, 51); font-family:
'Helvetica Neue', Helvetica, Arial, sans-serif; font-style:
normal; font-variant: normal; font-weight: normal; letter-spacing:
normal; line-height: 20px; orphans: auto; text-align: start;
text-indent: 0px; text-transform: none; white-space: normal;
widows: 1; word-spacing: 0px; -webkit-text-stroke-width: 0px;
font-size: 9pt; border: 1px solid black; background-color:
rgb(255, 255, 255);">
<tbody>
<tr>
<th style="color: rgb(102, 102, 102); cursor: default;
text-align: center; font-weight: bold; font-family: Verdana;
padding: 5px 5px 5px 8px; background-color: rgb(238, 238,
238);" width="500">Performance Improvements - Execution Time</th>
<th style="color: rgb(102, 102, 102); cursor: default;
text-align: center; font-weight: bold; font-family: Verdana;
padding: 5px 5px 5px 8px; background-color: rgb(238, 238,
238);">Δ</th>
</tr>
</tbody><tbody class="searchable">
<tr>
<td class="benchmark-name" style="padding: 5px 5px 5px 8px;"><a
href="http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.15=3"
style="color: rgb(0, 136, 204); text-decoration: none;">lnt.MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan</a></td>
<td style="padding: 5px 5px 5px 8px; background-color: rgb(75,
255, 75);">-23.07%</td>
</tr>
<tr>
<td class="benchmark-name" style="padding: 5px 5px 5px 8px;"><a
href="http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.40=3"
style="color: rgb(0, 136, 204); text-decoration: none;">lnt.SingleSource/Benchmarks/Shootout/sieve</a></td>
<td style="padding: 5px 5px 5px 8px; background-color:
rgb(130, 255, 130);">-9.50%</td>
</tr>
<tr>
<td class="benchmark-name" style="padding: 5px 5px 5px 8px;"><a
href="http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.9=3"
style="color: rgb(0, 136, 204); text-decoration: none;">lnt.SingleSource/Benchmarks/BenchmarkGame/nsieve-bits</a></td>
<td style="padding: 5px 5px 5px 8px; background-color:
rgb(144, 255, 144);">-7.26%</td>
</tr>
<tr>
<td class="benchmark-name" style="padding: 5px 5px 5px 8px;"><a
href="http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.316=3"
style="color: rgb(0, 136, 204); text-decoration: none;">lnt.SingleSource/Benchmarks/BenchmarkGame/recursive</a></td>
<td style="padding: 5px 5px 5px 8px; background-color:
rgb(176, 255, 176);">-3.42%</td>
</tr>
<tr>
<td class="benchmark-name" style="padding: 5px 5px 5px 8px;"><a
href="http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.235=3"
style="color: rgb(0, 136, 204); text-decoration: none;">spec.cpu2006.ref.433_milc</a></td>
<td style="padding: 5px 5px 5px 8px; background-color:
rgb(208, 255, 208);">-1.12%</td>
</tr>
</tbody>
</table>
<br>
While there are a few big jumps in the test-suite, I think the
regressions show that this is not<br>
uniformly a performance improvement.<br>
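<br>
As a rough sketch of how the deltas in the two tables above could be folded
into one composite figure with a geometric mean (the method mentioned in the
quoted message below): the helper name geometric_mean_ratio and the shortened
benchmark names are illustrative assumptions, and the input covers only the
benchmarks listed in the tables, not the whole test-suite.<br>
<pre>
```python
import math

# Execution-time deltas (percent) copied from the two tables above.
# Positive means slower with the patch, negative means faster.
# Benchmark names are shortened here for readability (an assumption).
deltas_percent = {
    "yacr2": 9.17,
    "ackermann": 8.02,
    "enc-pc1": 4.78,
    "445_gobmk": 1.84,
    "483_xalancbmk": 1.75,
    "471_omnetpp": 1.43,
    "253_perlbmk": 1.22,
    "symm": 1.10,
    "automotive-susan": -23.07,
    "sieve": -9.50,
    "nsieve-bits": -7.26,
    "recursive": -3.42,
    "433_milc": -1.12,
}


def geometric_mean_ratio(percent_deltas):
    """Geometric mean of the time ratios implied by percentage deltas."""
    # A delta of +9.17% corresponds to a time ratio of 1.0917.
    ratios = [1.0 + d / 100.0 for d in percent_deltas]
    if not ratios:
        raise ValueError("need at least one delta")
    # Average in log space, then map back to a ratio.
    log_sum = sum(math.log(r) for r in ratios)
    return math.exp(log_sum / len(ratios))


composite = geometric_mean_ratio(deltas_percent.values())
print(f"composite change: {(composite - 1.0) * 100.0:+.2f}%")
```
</pre>
A single composite number like this hides individual regressions such as
yacr2, which is why the per-benchmark tables still matter.<br>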
<br>
Thanks,<br>
<br>
Kristof<br>
<br>
<div class="moz-cite-prefix">On 11/12/2015 07:43, Junmo Park via
llvm-commits wrote:<br>
</div>
<blockquote
cite="mid:fc3056832c9ccc3fee3cb37ad037e62a@localhost.localdomain"
type="cite">
<pre wrap="">flyingforyou added a comment.
Thanks Zhaoshi.
I've just run a number of benchmarks, including test-suite, on Juno (Cortex-A57); there were many improvements and some regressions.
The test-suite performance results show a 1.33% improvement and a 0.78% regression.
The composite benchmark result is computed using the geometric mean.
Actually, I found some regressions after r234846 was merged.
url: <a class="moz-txt-link-freetext" href="http://reviews.llvm.org/D8994">http://reviews.llvm.org/D8994</a>
After that commit was merged, @hfinkel uploaded a new commit, r237947.
</pre>
<blockquote type="cite">
<pre wrap="">On X86 (and similar OOO cores) unrolling is very limited, and even if the runtime unrolling is otherwise profitable, the expense of a division to compute the trip count could greatly outweigh the benefits. On the A2, we unroll a lot, and the benefits of unrolling are more significant (seeing a 5x or 6x speedup is not uncommon), so we're more able to tolerate the expense, on average, of adivision to compute the trip count.
</pre>
</blockquote>
<pre wrap="">
I totally agree with this comment. Most AArch64 cores have a hardware divider, including for floating point, so I think we can have more unrolling opportunities.
<a class="moz-txt-link-freetext" href="http://reviews.llvm.org/D15408">http://reviews.llvm.org/D15408</a>
_______________________________________________
llvm-commits mailing list
<a class="moz-txt-link-abbreviated" href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a>
</pre>
</blockquote>
<br>
</body>
</html>