[LLVMdev] ARM Qualification

Joerg Sonnenberger joerg at britannica.bec.de
Wed Oct 12 06:03:49 PDT 2011


On Tue, Oct 11, 2011 at 05:20:43PM -0700, Owen Anderson wrote:
> 
> On Oct 11, 2011, at 4:48 PM, Joerg Sonnenberger wrote:
> > As I see it, there are regularly commits that introduce performance
> > and code size regressions. There doesn't seem to be any formal
> > testing in place. Not for X86, not for ARM. Hunting down regressions
> > like enable-iv-rewrite=false, which added 130 bytes to a piece of
> > code that may be at most 8KB in total, is painful and slow. From my
> > point of view, the only way to ensure that the compiler does a good
> > job is providing a test infrastructure to monitor this. This is not
> > about forcing pre-commit tests, it is about ensuring that the
> > testing is done at all and in a timely manner.
> 
> In a world of multiple developers with conflicting priorities, this
> simply isn't realistic.  I know that those 130 bytes are very important
> to those concerned with the NetBSD bootloader, but the patch that added
> them brought significant performance improvements on important
> benchmarks (see Jack Howarth's posting from 9/6/11, for instance), which
> lots of other developers consider an obviously good tradeoff.

Don't get me wrong, my problem is not with the patch itself. At the
moment LLVM is relatively bad at creating compact code on x86. I'm not
sure what the status on ARM is, but there are use cases where code size
matters a lot, and boot loaders are one of them. So disabling some
optimisations when using -Os or -Oz is fine.
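To make the size question concrete, here is a minimal sketch of how one
might measure the effect of the optimisation level on a single
translation unit. The source file name is a placeholder, and it assumes
clang and the binutils "size" tool are on the PATH:

    #!/usr/bin/env python3
    # Sketch: compare text-segment size across optimisation levels.
    # "bootloader.c" is a hypothetical input, not a real test case.
    import subprocess

    SOURCE = "bootloader.c"

    def text_size(flag):
        obj = "out-%s.o" % flag.lstrip("-")
        subprocess.run(["clang", flag, "-c", SOURCE, "-o", obj],
                       check=True)
        # Berkeley "size" output: a header line, then
        # "text data bss dec hex filename".
        out = subprocess.run(["size", obj], check=True,
                             capture_output=True, text=True).stdout
        return int(out.splitlines()[1].split()[0])

    for flag in ("-O2", "-Os", "-Oz"):
        print(flag, text_size(flag), "bytes of .text")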

The bigger issue is that accepting a size/performance trade-off here,
another one there, and yet another in that corner adds up. It can reach
the point where each trade-off is fine by itself, but the combined
result no longer fits in the CPU's instruction cache and performance
collapses: a few dozen independent ~130-byte regressions amount to
several kilobytes, which can be the difference between a hot loop
fitting in, say, a 32KB instruction cache and thrashing it. More
importantly, at some point that line will be crossed by a completely
harmless-looking change.

> A policy of "never regress anything" is not tenable, because ANY change
> in code generation has the possibility to regress something.  We end up
> in a world where either we never make any forward progress, or where
> developers hoard up trivial improvements they can use to "negate" the
> regressions caused by real development work.  Neither of these is a
> desirable direction. 

This is not what I was asking for. For GCC there are not only build bots
and functional regression tests, but also regular runs of benchmarks
such as SPEC. Consider this a call for the community to identify useful
real-world test cases to measure:

(1) Changes in the performance of compiled code, both with and without
LTO.

(2) Changes in the size of compiled code, both with and without
explicitly optimising for it.

(3) Changes in compilation time.
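To illustrate what I have in mind for (2) and (3), here is a rough
sketch of a tracker that could run against each compiler snapshot and
append one row per input and flag to a CSV file. The corpus, the
snapshot path and the CSV schema are all made up for illustration:

    #!/usr/bin/env python3
    # Sketch of a long-running size/compile-time tracker.
    # CLANG and CORPUS are placeholders, not an existing LLVM setup.
    import csv, os, subprocess, time
    from datetime import date

    CLANG = "/opt/llvm-snapshot/bin/clang"  # hypothetical nightly build
    CORPUS = ["a.c", "b.c"]                 # placeholder inputs

    with open("metrics.csv", "a", newline="") as fh:
        writer = csv.writer(fh)
        for src in CORPUS:
            for flag in ("-O2", "-Os"):
                start = time.monotonic()
                subprocess.run([CLANG, flag, "-c", src, "-o", "tmp.o"],
                               check=True)
                elapsed = time.monotonic() - start
                # Object file size is a crude proxy; parsing "size"
                # output would isolate the .text segment precisely.
                writer.writerow([date.today().isoformat(), src, flag,
                                 os.path.getsize("tmp.o"),
                                 round(elapsed, 3)])

Plotting those rows over time would make it obvious when a commit range
introduced a step change in either metric, without requiring anyone to
block commits on it.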

I know that for many bigger changes at least (1) and (3) are often
checked. This is about doing general testing over a long period. When a
regression on one of the metrics occurs, it can be evaluated; what to
do about it, e.g. whether to disable an optimisation for -Os/-Oz or
move it to a higher optimisation level, is a separate discussion.

> The existing modus operandi on X86 and other targets has been that
> there is a core of functionality (what is represented by the LLVM
> regression tests and test-suite) that all developers implicitly agree
> to avoid regressing on a set of "blessed" configurations.  We are
> deliberately cautious in expanding the range of functionality that
> cannot be regressed, or on widening the set of configurations (beyond
> those easily accessible to all developers) on which those regressions
> must not occur.  This allows us to improve quality over time without
> preventing forward progress.

As I see it, the current regression test suite is aimed at catching
miscompilations; it's not that useful for the other cases above. Of
course, checking for compile-time or run-time regressions is a lot
harder, as it requires a reproducible environment. So my request can't
replace the existing tests, and it isn't meant to.
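For what it's worth, even without a fully controlled machine, recording
some environment metadata next to each measurement would go a long way
towards reproducibility. A minimal sketch, with a made-up field
selection and the same hypothetical snapshot path as above:

    #!/usr/bin/env python3
    # Sketch: record enough context to reproduce a measurement later.
    # The field selection is a guess at what matters, not a standard.
    import json, platform, subprocess

    CLANG = "/opt/llvm-snapshot/bin/clang"  # hypothetical nightly build

    meta = {
        "compiler": subprocess.run([CLANG, "--version"],
                                   capture_output=True,
                                   text=True).stdout.splitlines()[0],
        "host": platform.platform(),
        "cpu": platform.machine(),
    }
    print(json.dumps(meta, indent=2))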

I hope I made myself a bit clearer.

Joerg


