[llvm-dev] NEON FP flags

Fri Apr 1 07:27:08 PDT 2016

On Fri, Apr 01, 2016 at 02:56:53PM +0100, Renato Golin wrote:
> On 29 March 2016 at 11:09, James Greenhalgh <james.greenhalgh at arm.com> wrote:
> > That is to say, GCC will only auto-vectorize floating-point arithmetic
> > if both -mfpu=neon AND -funsafe-math-optimizations are given. -mfpu=neon
> > by itself does not imply that it is OK for GCC to generate non-IEEE
> > compliant code. The default is safe until explicitly told otherwise.
> 
> Right, that was what I originally thought from Hal's bug report, but
> recent emails on the thread confused me.
> 
> I think this is the right behaviour, and I'm glad GCC does it, so we
> can follow the correct approach from the start.

Perfect. I think this is sensible.

> >   Int (NEON), FP (VFP)
> >     -mfloat-abi=hard or -mfloat-abi=softfp
> >    + -mfpu=neon (or greater)
> 
> Excellent! This means I can make only the -fsubnormal flags count, and
> everything else will stay the same.
> 
> This was my first approach, but Hal convinced me that we may want a
> specific flag that is included by fast/unsafe maths flags. See below.
> 
> 
> >   Int (NEON), FP (NEON)
> >     -mfloat-abi=hard or -mfloat-abi=softfp
> >    + -mfpu=neon (or greater)
> >    + -funsafe-math-optimizations (or equivalent)
> 
> Do you have one specifically for subnormals? -funsafe-math is a bit of
> a big hammer and will enable other (potentially unwanted) behaviour
> from the vectorizer.
> 
> However, -ffast-math / unsafe-math should include subnormal support.

No, we only have the big hammer throughout the ARM back-end to
enable/disable support for the RTL IR that the vectorizer looks for
when pattern matching. That means you also get your reduction loops and
friends potentially changing your IEEE-754 expectations. Something more
fine-grained would be feasible, but there'd be a fair bit of work needed to
upgrade the implementation. In GCC you either take the performance hit or
use the big hammer.
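
To illustrate why a vectorized reduction can change IEEE-754 results, here
is a minimal sketch in plain C. It is not GCC's implementation: the function
names and the four-lane split are assumptions chosen for illustration, and
the second function only imitates in scalar code the reassociation that a
four-wide vector reduction would perform.

```c
#include <stddef.h>

/* Strict left-to-right summation, as the scalar loop is written. */
float sum_sequential(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical model of a 4-lane vector reduction: each lane keeps its
 * own partial sum, and the lanes are combined pairwise at the end.
 * This reorders the additions relative to sum_sequential. */
float sum_four_lanes(const float *a, size_t n)
{
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (size_t i = 0; i < n; i++)
        lane[i % 4] += a[i];
    float lo = lane[0] + lane[1];
    float hi = lane[2] + lane[3];
    return lo + hi;
}
```

With the input {1e8f, 1.0f, -1e8f, 1.0f, 0, 0, 0, 0}, and assuming float
arithmetic is evaluated in single precision without fast-math, the
sequential sum gives 1.0f while the four-lane version gives 0.0f, because
the small terms are absorbed at different points.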

> >   Int (ALU), FP (NEON)
> >     Impossible (as far as I know).
> 
> Irrelevant, as far as I care. :)

Having read the bug reports (16275/16274?) I realise I should have
mentioned Neon intrinsics in my original mail. These *are* available
with the appropriate -mfpu/-mfloat-abi/-march flags, whether or not
you have -funsafe-math-optimizations, and always map to the corresponding
instruction. The implementation for this is not neat: essentially we have
two backend RTL patterns, one which is always available for intrinsics and
one which is conditionally available for auto-vectorization.
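
As a rough sketch of the intrinsics route (the function name add4 and the
scalar fallback are assumptions added for illustration, so that the snippet
also builds on targets without Neon), explicitly requesting a Neon add
looks something like this:

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Adds four floats element-wise. When Neon is available, the intrinsics
 * request the vector instruction explicitly, independent of any
 * unsafe-math setting; otherwise a plain scalar loop is used. */
void add4(const float *a, const float *b, float *out)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t va = vld1q_f32(a);      /* load 4 floats from a */
    float32x4_t vb = vld1q_f32(b);      /* load 4 floats from b */
    vst1q_f32(out, vaddq_f32(va, vb));  /* lane-wise add, then store */
#else
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}
```

The point is only that the programmer opts in to Neon arithmetic by writing
the intrinsic, whereas the auto-vectorizer needs the extra flag before it
will make that choice on its own.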

Thanks,
James