[llvm-dev] [RFC] Making -mcpu=generic the default for ARM armv7a and armv8a rather than -mcpu=cortex-a8 or -mcpu=cortex-a53

Wed May 31 08:57:28 PDT 2017

Wow, these are some fantastic results! Android is definitely in favor of
fixing the defaults, so this proposal looks great from our perspective.

Thanks,
Steve

On Wed, May 31, 2017 at 5:57 AM, Kristof Beyls <Kristof.Beyls at arm.com>
wrote:

> *Motivation*
>
> At the moment, when targeting armv7a, clang defaults to generate code as
> if -mcpu=cortex-a8 was specified.
> When targeting armv8a, it defaults to generate code as if -mcpu=cortex-a53
> was specified.
>
> This leads to surprising code generation: the compiler optimizes for a
> specific micro-architecture, whereas the user probably intended to
> generate code that is "blended" for all the cores implementing the
> requested architecture. One example of a user being surprised like this is
> at https://bugs.llvm.org//show_bug.cgi?id=27219, where vmla's are not
> produced to optimize for a Cortex-A8-specific micro-architectural
> behaviour, even though the user didn't request to optimize specifically for
> Cortex-A8.
>
> It would be much cleaner conceptually if clang defaulted to
> -mcpu=generic when no specific cpu is specified.
>
> *What is the impact of this change on execution speed?*
>
> I think the main reason to be hesitant to change the default CPU for ARM
> to -mcpu=generic is the potential impact on performance of generated code.
>
> I've measured quite a wide selection of benchmarks with this change, on
> the following cores: Cortex-A9, Cortex-A53, Cortex-A57, Cortex-A72.
>
> Impact on execution speed, for each core, when using -march=armv7a, after
> changing the default cpu from cortex-a8 to generic is as follows.
> A positive number means a speedup; a negative number means a slow-down. These
> are the geomean results over 350 programs coming from benchmark suites such
> as the test-suite, SPEC2000, SPEC2006 and a range of proprietary suites.
>
> Cortex-A9: 0.96%
> Cortex-A53: -0.64%
> Cortex-A57: 1.04%
> Cortex-A72: 1.17%
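
The email doesn't say exactly how the geomean was computed. Below is a minimal sketch of one common convention, offered as an assumption: take each program's speedup ratio as baseline time divided by new time, compute the geometric mean of those ratios, and report it as a percentage. A positive result is a speedup and a negative one a slow-down, matching the sign convention above. The function name and the input lists are hypothetical.

```python
import math


def geomean_speedup_pct(baseline_times, new_times):
    """Geometric-mean speedup of new_times over baseline_times, in percent.

    Assumed convention (not stated in the RFC): per-program speedup is
    baseline / new, so values > 1 mean the new configuration is faster.
    """
    if not baseline_times or len(baseline_times) != len(new_times):
        raise ValueError("need two non-empty lists of equal length")
    # Average in log space: the geomean of ratios is exp(mean(log(ratio))).
    log_sum = sum(math.log(b / n) for b, n in zip(baseline_times, new_times))
    return (math.exp(log_sum / len(baseline_times)) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical run times (seconds) for three programs, old vs. new default.
    print(geomean_speedup_pct([10.0, 20.0, 5.0], [9.9, 20.2, 4.9]))
```

Working in log space keeps one program with a large ratio from dominating the result. This is the usual reason for preferring a geomean to an arithmetic mean over hundreds of programs.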
>
> Impact on execution speed, for each core, when using -march=armv8a, after
> changing the default cpu from cortex-a53 to generic:
>
> (Cortex-A9 is an armv7a core, so can't execute armv8a binaries)
> Cortex-A53: -0.09%
> Cortex-A57: -0.12%
> Cortex-A72: 0.03%
>
> *Should we enable scheduling for an in-order core even for -mcpu=generic?*
>
> The above measurements show that the biggest negative impact is with
> -march=armv7a on Cortex-A53: -0.64%.
> It seems that the in-order Cortex-A53 core is losing quite a bit of
> performance when the instructions aren't scheduled - which is to be
> expected.
> Therefore, I also experimented with letting instructions be scheduled
> according to the Cortex-A8 pipeline model, even for -mcpu=generic, trying
> to figure out if it's beneficial to schedule instructions for an in-order
> core rather than not trying to schedule them at all, for -mcpu=generic.
>
> Measurement results:
>
> -march=armv7a
>
> Cortex-A9: 1.57% (up from 0.96%)
> Cortex-A53: 0.47% (up from -0.64%)
> Cortex-A57: 1.74% (up from 1.04%)
> Cortex-A72: 1.72% (up from 1.17%)
>
> -march=armv8a (Note that there isn't a pipeline model for Cortex-A53 in
> the 32-bit ARM backend):
>
> (Cortex-A9 is an armv7a core, so can't execute armv8a binaries)
> Cortex-A53: 0.49% (up from -0.09%)
> Cortex-A57: 0.09% (up from -0.12%)
> Cortex-A72: 0.20% (up from 0.03%)
>
> Conclusion: for all the in-order and out-of-order cores I measured, it's
> beneficial to get the instructions scheduled using the Cortex-A8 pipeline
> model in combination with -mcpu=generic.
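
The conclusion above can be checked against the quoted numbers with a short sketch. Both sets of results are relative to the same old default, and geomeans compose multiplicatively. So, assuming the same program set was used for both runs, the gain from enabling Cortex-A8 scheduling is (1 + after) / (1 + before) - 1 rather than a plain subtraction. The dictionary and function names are illustrative only.

```python
# Geomean results quoted above, in percent (positive = speedup over the
# old cortex-a8 / cortex-a53 default).
UNSCHEDULED = {
    "armv7a": {"Cortex-A9": 0.96, "Cortex-A53": -0.64,
               "Cortex-A57": 1.04, "Cortex-A72": 1.17},
    "armv8a": {"Cortex-A53": -0.09, "Cortex-A57": -0.12, "Cortex-A72": 0.03},
}
SCHEDULED_A8 = {
    "armv7a": {"Cortex-A9": 1.57, "Cortex-A53": 0.47,
               "Cortex-A57": 1.74, "Cortex-A72": 1.72},
    "armv8a": {"Cortex-A53": 0.49, "Cortex-A57": 0.09, "Cortex-A72": 0.20},
}


def relative_gain_pct(before_pct, after_pct):
    """Speedup of the 'after' configuration over the 'before' one, in percent.

    Both inputs are speedups against the same baseline, so their ratio
    (not their difference) is the speedup of one over the other.
    """
    return ((1 + after_pct / 100) / (1 + before_pct / 100) - 1) * 100


def gains():
    """Gain from Cortex-A8 scheduling under -mcpu=generic, per (arch, core)."""
    return {
        (arch, core): relative_gain_pct(UNSCHEDULED[arch][core],
                                        SCHEDULED_A8[arch][core])
        for arch in UNSCHEDULED
        for core in UNSCHEDULED[arch]
    }


if __name__ == "__main__":
    for (arch, core), gain in sorted(gains().items()):
        print(f"{arch:7} {core:11} {gain:+.2f}%")
```

Every entry comes out positive, consistent with the stated conclusion. Because the inputs are rounded to two decimals, the printed gains are approximate.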
>
>
> Taking into account the above measurements, my conclusions are:
> 1. We should make -mcpu=generic the default cpu, not Cortex-A8 or
> Cortex-A53, for -march=armv7a and -march=armv8a.
> 2. We probably want to let the compiler schedule instructions using the
> Cortex-A8 pipeline model for -mcpu=generic, since it gives a bit of speedup
> on all cores tested.
>
> Do people agree with these conclusions?
> Any objections against implementing this?
> Any other potential impact this may have that I forgot to consider above?
>
> Thanks,
>
> Kristof
>