<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

</head>

<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">

Hi Evandro,

<div class=""><br class="">

</div>

<div class="">For now, I'm only looking at AArch32, not AArch64.</div>

<div class="">Indeed, we could also perform in-order scheduling for -mcpu=generic on AArch64. Cortex-A53 seems to be the best, and only, choice available.</div>

<div class="">But making that change would first require another round of extensive benchmarking.</div>

<div class=""><br class="">

</div>

<div class="">So in summary: I'll put the idea on my backlog, but I probably won't have time to get all the benchmarking done in the very near future.</div>

<div class=""><br class="">

</div>

<div class="">Thanks,</div>

<div class=""><br class="">

</div>

<div class="">Kristof</div>

<div class=""><br class="">

<div>

<blockquote type="cite" class="">

<div class="">On 1 Jun 2017, at 22:23, Evandro Menezes &lt;<a href="mailto:e.menezes@samsung.com" class="">e.menezes@samsung.com</a>&gt; wrote:</div>

<br class="Apple-interchange-newline">

<div class=""><span style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">Hi,

 Kristof.</span><br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">

<br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">

<span style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">It

 sounds like a good plan, but one thing is not clear to me from your<span class="Apple-converted-space"> </span></span><br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">

<span style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">post.

  Which pipeline model will be used for AArch64, A53's (i.e., none)?</span><br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">

<br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">

<span style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">Thank

 you,</span><br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">

<br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">

<span style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">--<span class="Apple-converted-space"> </span></span><br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">

<span style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">Evandro

 Menezes</span><br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">

<br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">

<span style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">On

 06/01/2017 01:37 AM, Kristof Beyls wrote:</span><br style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">

<blockquote type="cite" style="font-family: Menlo-Regular; font-size: 11px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">

Thanks to everyone for giving their feedback!<br class="">

I saw pretty unanimous support for making -mcpu=generic the default<span class="Apple-converted-space"> </span><br class="">

and making -mcpu=generic schedule for an in-order CPU (Cortex-A8 in<span class="Apple-converted-space"> </span><br class="">

this case).<br class="">

I'll be making those changes shortly.<br class="">

<br class="">

I think the comments also make clear that it's less obvious whether<span class="Apple-converted-space"> </span><br class="">

we'd want -mcpu=native to become a default. It's probably good for<span class="Apple-converted-space"> </span><br class="">

some use cases, but really not good for other use cases. I won't be<span class="Apple-converted-space"> </span><br class="">

making that change, nor advocating for it.<br class="">

<br class="">

Thanks!<br class="">

<br class="">

Kristof<br class="">

<br class="">

<br class="">

<blockquote type="cite" class="">On 31 May 2017, at 17:57, Stephen Hines &lt;<a href="mailto:srhines@google.com" class="">srhines@google.com</a>&gt; wrote:<br class="">

<br class="">

Wow, these are some fantastic results! Android is definitely in favor<span class="Apple-converted-space"> </span><br class="">

of fixing the defaults, so this proposal looks great from our<span class="Apple-converted-space"> </span><br class="">

perspective.<br class="">

<br class="">

Thanks,<br class="">

Steve<br class="">

<br class="">

On Wed, May 31, 2017 at 5:57 AM, Kristof Beyls &lt;<a href="mailto:Kristof.Beyls@arm.com" class="">Kristof.Beyls@arm.com</a>&gt; wrote:<br class="">

<br class="">

   *Motivation*<br class="">

<br class="">

   At the moment, when targeting armv7a, clang defaults to generating<br class="">

   code as if -mcpu=cortex-a8 was specified.<br class="">

   When targeting armv8a, it defaults to generating code as if<br class="">

   -mcpu=cortex-a53 was specified.<br class="">

<br class="">

   This leads to surprising code generation: the compiler optimizes<br class="">

   for a specific micro-architecture, whereas the user probably<br class="">

   intended to generate code that is "blended" for all the cores<br class="">

   implementing the requested architecture. One<br class="">

   example of a user being surprised like this is at<br class="">

   <a href="https://bugs.llvm.org//show_bug.cgi?id=27219" class="">https://bugs.llvm.org//show_bug.cgi?id=27219</a>, where vmla's are<br class="">

   not produced to optimize for a Cortex-A8-specific<br class="">

   micro-architectural behaviour, even though the user didn't<br class="">

   request to optimize specifically for Cortex-A8.<br class="">

<br class="">

   It would be much cleaner conceptually if clang defaulted to<br class="">

   -mcpu=generic when no specific cpu is specified.<br class="">

<br class="">

   *What is the impact of this change on execution speed?*<br class="">

<br class="">

   I think the main reason to be hesitant to change the default CPU<br class="">

   for ARM to -mcpu=generic is the potential impact on performance<br class="">

   of generated code.<br class="">

<br class="">

   I've measured quite a wide selection of benchmarks with this<br class="">

   change, on the following cores: Cortex-A9, Cortex-A53,<br class="">

   Cortex-A57, Cortex-A72.<br class="">

<br class="">

   Impact on execution speed, for each core, when using<br class="">

   -march=armv7a, after changing the default cpu from cortex-a8 to<br class="">

   generic is as follows.<br class="">

   A positive number means speedup, a negative number means<br class="">

   slow-down. These are the geomean results over 350 programs coming<br class="">

   from benchmark suites such as the test-suite, SPEC2000, SPEC2006<br class="">

   and a range of proprietary suites.<br class="">

<br class="">

   Cortex-A9: 0.96%<br class="">

   Cortex-A53: -0.64%<br class="">

   Cortex-A57: 1.04%<br class="">

   Cortex-A72: 1.17%<br class="">
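<br class="">

   As an aside on how to read these percentages: the exact computation isn't spelled out in this mail, but a geomean speedup like the ones above is commonly derived from per-program run times roughly as in the Python sketch below. The function name and the run times are hypothetical, chosen only for illustration; they are not the measured data.<br class="">

<pre class="">
```python
import math


def geomean_speedup_percent(baseline_times, new_times):
    """Geometric-mean speedup of new over baseline, as a percentage."""
    if len(baseline_times) != len(new_times) or not baseline_times:
        raise ValueError("need two non-empty lists of equal length")
    # Per-program speedup ratio: above 1.0 means the new build is faster.
    ratios = [old / new for old, new in zip(baseline_times, new_times)]
    # Geometric mean computed as the exponential of the mean of the logs.
    geomean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
    return (geomean - 1.0) * 100.0


# Hypothetical run times in seconds (NOT the measurements reported here).
baseline = [10.0, 20.0, 5.0]  # e.g. built with the old default cpu
generic = [9.8, 20.4, 4.9]    # e.g. built with -mcpu=generic
print(f"{geomean_speedup_percent(baseline, generic):+.2f}%")
```
</pre>

   Averaging the logarithms of the per-program ratios means that a 2x speedup on one program and a 2x slow-down on another cancel out exactly, which a plain arithmetic mean of the ratios would not do.<br class="">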

<br class="">

   Impact on execution speed, for each core, when using<br class="">

   -march=armv8a, after changing the default cpu from cortex-a53 to<br class="">

   generic:<br class="">

<br class="">

   (Cortex-A9 is an armv7a core, so can't execute armv8a binaries)<br class="">

   Cortex-A53: -0.09%<br class="">

   Cortex-A57: -0.12%<br class="">

   Cortex-A72: 0.03%<br class="">

<br class="">

   *Should we enable scheduling for an in-order core even for<br class="">

   -mcpu=generic?*<br class="">

<br class="">

   The above measurements show that the biggest negative impact<br class="">

   is seen with -march=armv7a on Cortex-A53: -0.64%.<br class="">

   It seems that the in-order Cortex-A53 core is losing quite a bit<br class="">

   of performance when the instructions aren't scheduled - which is<br class="">

   to be expected.<br class="">

   Therefore, I also experimented with letting instructions be<br class="">

   scheduled according to the Cortex-A8 pipeline model, even for<br class="">

   -mcpu=generic, trying to figure out if it's beneficial to<br class="">

   schedule instructions for an in-order core rather than not trying<br class="">

   to schedule them at all, for -mcpu=generic.<br class="">

<br class="">

   Measurement results:<br class="">

<br class="">

   -march=armv7a<br class="">

<br class="">

   Cortex-A9: 1.57% (up from 0.96%)<br class="">

   Cortex-A53: 0.47% (up from -0.64%)<br class="">

   Cortex-A57: 1.74% (up from 1.04%)<br class="">

   Cortex-A72: 1.72% (up from 1.17%)<br class="">

<br class="">

   -march=armv8a (Note that there isn't a pipeline model for<br class="">

   Cortex-A53 in the 32-bit ARM backend):<br class="">

<br class="">

   (Cortex-A9 is an armv7a core, so can't execute armv8a binaries)<br class="">

   Cortex-A53: 0.49% (up from -0.09%)<br class="">

   Cortex-A57: 0.09% (up from -0.12%)<br class="">

   Cortex-A72: 0.20% (up from 0.03%)<br class="">

<br class="">

   Conclusion: for all the in-order and out-of-order cores I<br class="">

   measured, it's beneficial to get the instructions scheduled using<br class="">

   the Cortex-A8 pipeline model in combination with -mcpu=generic.<br class="">

<br class="">

<br class="">

   Taking into account the above measurements, my conclusions are:<br class="">

   1. We should make -mcpu=generic the default cpu, rather than<br class="">

   Cortex-A8 for -march=armv7a and Cortex-A53 for -march=armv8a.<br class="">

   2. We probably want to let the compiler schedule instructions<br class="">

   using the Cortex-A8 pipeline model for -mcpu=generic, since it<br class="">

   gives a bit of speedup on all cores tested.<br class="">

<br class="">

   Do people agree with these conclusions?<br class="">

   Any objections against implementing this?<br class="">

   Any other potential impact this may have that I forgot to<br class="">

   consider above?<br class="">

<br class="">

   Thanks,<br class="">

<br class="">

   Kristof</blockquote>

</blockquote>

</div>

</blockquote>

</div>

<br class="">

</div>

</body>

</html>