<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
Hi Evandro,
<div class=""><br class="">
</div>
<div class="">For now, I'm only looking at AArch32, not AArch64.</div>
<div class="">Indeed, we could also perform in-order scheduling for -mcpu=generic on AArch64. Cortex-A53 indeed seems to be the best/only choice available.</div>
<div class="">But before making that change, that'll require another round of lots of benchmarking.</div>
<div class=""><br class="">
</div>
<div class="">So in summary: I'll put the idea on my backlog, but I probably won't have time to get all the benchmarking done in the very near future.</div>
<div class=""><br class="">
</div>
<div class="">Thanks,</div>
<div class=""><br class="">
</div>
<div class="">Kristof</div>
<div class=""><br class="">
<div>
<blockquote type="cite" class="">
<div class="">On 1 Jun 2017, at 22:23, Evandro Menezes <<a href="mailto:e.menezes@samsung.com" class="">e.menezes@samsung.com</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class=""><span style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">Hi,
Kristof.</span><br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
<br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
<span style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">It
sounds like a good plan, but one thing is not clear to me from your<span class="Apple-converted-space"> </span></span><br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
<span style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">post.
Which pipeline model will be used for AArch64, A53's (i.e., none)?</span><br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
<br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
<span style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">Thank
you,</span><br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
<br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
<span style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">--<span class="Apple-converted-space"> </span></span><br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
<span style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">Evandro
Menezes</span><br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
<br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
<span style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">On
06/01/2017 01:37 AM, Kristof Beyls wrote:</span><br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
<blockquote type="cite" style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">
Thanks to everyone for giving their feedback!<br class="">
I saw pretty unanimous support for making -mcpu=generic the default<span class="Apple-converted-space"> </span><br class="">
and making -mcpu=generic schedule for an in-order CPU (Cortex-A8 in<span class="Apple-converted-space"> </span><br class="">
this case).<br class="">
I'll be making those changes shortly.<br class="">
<br class="">
I think the comments also make clear that it's less obvious whether<br class="">
we'd want -mcpu=native to become the default. It's probably good for<br class="">
some use cases, but really not good for others. I won't be<br class="">
making that change, nor advocating for it.<br class="">
<br class="">
Thanks!<br class="">
<br class="">
Kristof<br class="">
<br class="">
<br class="">
<blockquote type="cite" class="">On 31 May 2017, at 17:57, Stephen Hines <<a href="mailto:srhines@google.com" class="">srhines@google.com</a><span class="Apple-converted-space"> </span><br class="">
<<a href="mailto:srhines@google.com" class="">mailto:srhines@google.com</a>>> wrote:<br class="">
<br class="">
Wow, these are some fantastic results! Android is definitely in favor<span class="Apple-converted-space"> </span><br class="">
of fixing the defaults, so this proposal looks great from our<span class="Apple-converted-space"> </span><br class="">
perspective.<br class="">
<br class="">
Thanks,<br class="">
Steve<br class="">
<br class="">
On Wed, May 31, 2017 at 5:57 AM, Kristof Beyls <<a href="mailto:Kristof.Beyls@arm.com" class="">Kristof.Beyls@arm.com</a>> wrote:<br class="">
<br class="">
*Motivation*<br class="">
<br class="">
At the moment, when targeting armv7a, clang defaults to generating<br class="">
code as if -mcpu=cortex-a8 were specified.<br class="">
When targeting armv8a, it defaults to generating code as if<br class="">
-mcpu=cortex-a53 were specified.<br class="">
<br class="">
This leads to surprising code generation: the compiler optimizes<br class="">
for a specific micro-architecture, whereas the user probably<br class="">
intended to generate code that is "blended" for all the cores<br class="">
implementing the requested architecture. One example of a user<br class="">
being surprised like this is<br class="">
<a href="https://bugs.llvm.org//show_bug.cgi?id=27219" class="">https://bugs.llvm.org//show_bug.cgi?id=27219</a>, where vmla<br class="">
instructions are not generated because of a Cortex-A8-specific<br class="">
micro-architectural behaviour, even though the user didn't<br class="">
ask to optimize specifically for Cortex-A8.<br class="">
<br class="">
It would be much cleaner conceptually if clang defaulted to<br class="">
-mcpu=generic when no specific CPU is specified.<br class="">
<br class="">
*What is the impact of this change on execution speed?*<br class="">
*<br class="">
*<br class="">
I think the main reason to be hesitant about changing the default CPU<br class="">
for ARM to -mcpu=generic is the potential impact on the performance<br class="">
of the generated code.<br class="">
*<br class="">
*<br class="">
I've measured quite a wide selection of benchmarks with this<br class="">
change, on the following cores: Cortex-A9, Cortex-A53,<br class="">
Cortex-A57, Cortex-A72.<br class="">
<br class="">
The impact on execution speed for each core when using<br class="">
-march=armv7a, after changing the default CPU from cortex-a8 to<br class="">
generic, is as follows.<br class="">
A positive number means a speedup; a negative number means a<br class="">
slowdown. These are geomean results over 350 programs coming<br class="">
from benchmark suites such as the LLVM test-suite, SPEC2000, SPEC2006<br class="">
and a range of proprietary suites.<br class="">
<br class="">
Cortex-A9: 0.96%<br class="">
Cortex-A53: -0.64%<br class="">
Cortex-A57: 1.04%<br class="">
Cortex-A72: 1.17%<br class="">
<br class="">
The impact on execution speed for each core when using<br class="">
-march=armv8a, after changing the default CPU from cortex-a53 to<br class="">
generic:<br class="">
<br class="">
(Cortex-A9 is an armv7a core, so it can't execute armv8a binaries)<br class="">
Cortex-A53: -0.09%<br class="">
Cortex-A57: -0.12%<br class="">
Cortex-A72: 0.03%<br class="">
<br class="">
*Should we enable scheduling for an in-order core even for<br class="">
-mcpu=generic?*<br class="">
*<br class="">
*<br class="">
The measurements above show that the biggest negative<br class="">
impact is with -march=armv7a on Cortex-A53: -0.64%.<br class="">
It seems that the in-order Cortex-A53 core loses quite a bit<br class="">
of performance when the instructions aren't scheduled, which is<br class="">
to be expected.<br class="">
Therefore, I also experimented with scheduling instructions<br class="">
according to the Cortex-A8 pipeline model even for<br class="">
-mcpu=generic, to find out whether scheduling instructions for<br class="">
an in-order core is better than not scheduling them at all.<br class="">
<br class="">
Measurement results:<br class="">
<br class="">
-march=armv7a<br class="">
<br class="">
Cortex-A9: 1.57% (up from 0.96%)<br class="">
Cortex-A53: 0.47% (up from -0.64%)<br class="">
Cortex-A57: 1.74% (up from 1.04%)<br class="">
Cortex-A72: 1.72% (up from 1.17%)<br class="">
<br class="">
-march=armv8a (Note that there isn't a pipeline model for<br class="">
Cortex-A53 in the 32-bit ARM backend):<br class="">
<br class="">
(Cortex-A9 is an armv7a core, so it can't execute armv8a binaries)<br class="">
Cortex-A53: 0.49% (up from -0.09%)<br class="">
Cortex-A57: 0.09% (up from -0.12%)<br class="">
Cortex-A72: 0.20% (up from 0.03%)<br class="">
<br class="">
Conclusion: for all the in-order and out-of-order cores I<br class="">
measured, it's beneficial to schedule instructions using<br class="">
the Cortex-A8 pipeline model in combination with -mcpu=generic.<br class="">
<br class="">
<br class="">
Taking into account the above measurements, my conclusions are:<br class="">
1. We should make -mcpu=generic the default CPU for -march=armv7a<br class="">
and -march=armv8a, rather than Cortex-A8 and Cortex-A53 respectively.<br class="">
2. We probably want to let the compiler schedule instructions<br class="">
using the Cortex-A8 pipeline model for -mcpu=generic, since it<br class="">
gives a bit of speedup on all cores tested.<br class="">
<br class="">
Do people agree with these conclusions?<br class="">
Any objections against implementing this?<br class="">
Is there any other potential impact this may have that I forgot to<br class="">
consider above?<br class="">
<br class="">
Thanks,<br class="">
<br class="">
Kristof</blockquote>
</blockquote>
</div>
</blockquote>
</div>
<br class="">
</div>
</body>
</html>