[llvm-dev] NEON FP flags

Tue Mar 29 03:09:05 PDT 2016

On Fri, Mar 25, 2016 at 01:23:03PM +0000, Renato Golin via llvm-dev wrote:
> On 25 March 2016 at 04:11, Hal Finkel <hfinkel at anl.gov> wrote:
> > As I understand it, the fundamental property being addressed here is: Are
> > the semantics of scalar FP math the same as vector FP math? TTI seems like
> > a good place to expose that information. If the semantics are indeed
> > different, then the vectorizer would require fast-math flags in order to
> > vectorize FP operations (similarly, gcc's man page says it requires
> > -funsafe-math-optimizations for vectorization unless -mfpu=neon or similar
> > is specified). In this context, this different-semantics query would return
> > true if:
> 
> The semantics is indeed different, VFP is IEEE-754 compliant while
> NEON is not. We don't want to stop the compiler from using VFP for FP
> math, but we want to be cautious when using NEON in the same way.
> 
> 
> >   !(isDarwin OR ARMISA >= v8 OR fpMath == NEON)
> >
> > and then we need to teach people to use -mfpu=neon ;)
> 
> So, there's the catch. In GCC, -mfpu=neon means to use NEON, which is
> not enabled by default, so the compiler assumes that the user is aware
> that NEON FP is not IEEE compliant. I don't think that's a safe
> assumption, but I also don't want to have a slightly different
> behaviour than GCC gratuitously.

Note that my discussion below relates to the AArch32 behaviour (the ARM
port of GCC, not the AArch64 port of GCC).

I can see why the text in the man page might be misleading, but let me quote
the part I think Hal was referring to here (with added emphasis):

    If the selected floating-point hardware includes the NEON extension
    (e.g. -mfpu=neon), note that floating-point operations are **not**
    generated by GCC's auto-vectorization pass **unless**
    -funsafe-math-optimizations is also specified.  This is because
    NEON hardware does not fully implement the IEEE 754 standard for
    floating-point arithmetic (in particular denormal values are treated
    as zero), so the use of NEON instructions may lead to a loss of
    precision.

That is to say, GCC will only auto-vectorize floating-point arithmetic
if both -mfpu=neon AND -funsafe-math-optimizations are given. -mfpu=neon
by itself does not imply that it is OK for GCC to generate non-IEEE
compliant code. The default is safe until explicitly told otherwise.
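
To make the denormal point concrete, here is a minimal C sketch. It does
not use NEON at all: flush_to_zero, halve_ieee and halve_ftz are
hypothetical helpers that model flush-to-zero in software, assuming the
machine running it evaluates float arithmetic per IEEE 754 with
subnormals enabled.

```c
#include <float.h>

/* Hypothetical helper: models flush-to-zero by replacing any subnormal
   value with a zero of the same sign. Normal values pass through. */
float flush_to_zero(float x)
{
    if (x != 0.0f && x > -FLT_MIN && x < FLT_MIN)
        return (x < 0.0f) ? -0.0f : 0.0f;
    return x;
}

/* IEEE 754 behaviour: halving the smallest normal float gives a
   subnormal. The volatile store forces rounding to float precision. */
float halve_ieee(float x)
{
    volatile float r = x * 0.5f;
    return r;
}

/* Flush-to-zero model: subnormal inputs and results become zero. */
float halve_ftz(float x)
{
    return flush_to_zero(halve_ieee(flush_to_zero(x)));
}
```

Under those assumptions halve_ieee(FLT_MIN) is a small nonzero value while
halve_ftz(FLT_MIN) is exactly zero, which is the kind of difference the
man page is warning about.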

> Clang defaults to -mfpu=neon when we choose -mcpu=cortex-a* or
> -march=armv7a, so our current behaviour is on par with GCC. But I
> think that's a dangerous assumption.

If your current behaviour is to generate unsafe math when -mfpu=neon
is passed, then I agree this is dangerous. Again, this is *NOT* GCC's
behaviour.

> Furthermore, the only alternatives we have at the moment is to either
> use NEON for everything or nothing. It would be good to have an option
> to use NEON for integer arithmetic and VFP for FP if the user requires
> IEEE compliance.

In GCC, this is -mfpu=neon.

> > P.S. Looking at gcc's man page, gcc seems to use -mfpu for ARM and -mfpmath
> > for x86. Do we use -mfpmath for both?
> 
> We already support -mfpmath=vfp/neon in Clang, but it's bogus. My
> proposal is to make it count.
> 
> The best way I can think of is to let -mfpmath=vfp *disable* only FP
> NEON and -mfpmath=neon *enable* only FP NEON, both orthogonal from
> integer math.
> 
> Examples:
> 
> Works today:
> -mfpu=soft -> Int (ALU), FP (LIB), no VFP/NEON instructions
> -mfpu=softfp -> Int (ALU), FP (LIB), VFP/NEON instructions allowed
> -mfpu=vfp -> Int (ALU), FP (VFP)
> -mfpu=neon -> Int (NEON), FP (NEON)
> 
> Change proposed:
> -mfpmath=neon -mfpu=vfp -> Int (ALU), FP (NEON)
> -mfpmath=vfp -mfpu=neon -> Int (NEON), FP (VFP)
> 
> This would be similar enough to GCC, and would allow the small number
> of users that care about IEEE-754 compliance to disable FP NEON on
> demand.

In GCC today:

  -mfpu=vfp is the minimum floating-point instruction set supported. The
    choice of ABI (-mfloat-abi) is independent of the choice of
    floating-point hardware. -mfpu=soft and -mfpu=softfp are rejected
    by GCC.

Starting with that:

  -mfloat-abi=soft -> Generate library calls for all floating-point
    operations, do not permit Neon operations.
  -mfloat-abi=softfp -> Pass floating point arguments using the softfloat
    abi (i.e. in core registers). Emit floating point instructions as
    appropriate.
  -mfloat-abi=hard -> Pass floating point arguments in VFP registers.
    Emit floating point instructions as appropriate.

Independent of this, we have -mfpu:

  -mfpu=neon -> Permit generation of Neon instructions (both integer and
    floating point) where allowed by the language specification. Note that
    this does not by itself allow the generation of non-IEEE compliant code.

And on top of that, -funsafe-math-optimizations enables generating Neon
instructions for floating-point operations.
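
As an illustration of how these options combine, here is a pair of loops
one could feed to the compiler. This is a sketch only: the function names
are made up, the cross-compiler name in the comments is an assumption
about your toolchain, and I have not included or checked any assembly
output here.

```c
#include <stddef.h>

/* Integer loop. Going by the man page text, -mfpu=neon alone should be
   enough for the auto-vectorizer to consider Neon here. */
void add_i32(int *restrict dst, const int *restrict a,
             const int *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Floating-point loop. Going by the man page text, this should stay on
   VFP unless -funsafe-math-optimizations is also given. */
void add_f32(float *restrict dst, const float *restrict a,
             const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Possible invocations (compiler name is an assumption):
 *
 *   arm-linux-gnueabihf-gcc -O2 -ftree-vectorize -march=armv7-a \
 *       -mfloat-abi=hard -mfpu=neon -S loops.c
 *
 *   arm-linux-gnueabihf-gcc -O2 -ftree-vectorize -march=armv7-a \
 *       -mfloat-abi=hard -mfpu=neon -funsafe-math-optimizations -S loops.c
 */
```

Comparing the two .s files for add_f32 would be one way to check the
behaviour described above on a given GCC version.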

For your set of use cases:

  Int (ALU), FP (LIB), no VFP/NEON instructions

    -mfloat-abi=soft

  Int (ALU), FP (LIB), VFP/NEON instructions allowed

    Impossible

  Int (ALU), FP (VFP)

    -mfloat-abi=hard or -mfloat-abi=softfp
   + -mfpu=vfp (or other non-neon FPU)

  Int (NEON), FP (VFP)

    -mfloat-abi=hard or -mfloat-abi=softfp
   + -mfpu=neon (or greater)

  Int (NEON), FP (NEON)

    -mfloat-abi=hard or -mfloat-abi=softfp
   + -mfpu=neon (or greater)
   + -funsafe-math-optimizations (or equivalent)

  Int (ALU), FP (NEON)

    Impossible (as far as I know).
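
The list above can also be written down as a small lookup, which may be
handy if someone wants to compare it against Clang's behaviour case by
case. The enum and function names are hypothetical. The sketch picks
-mfloat-abi=hard where the list allows either hard or softfp, and it does
not model the "FP (LIB), VFP/NEON instructions allowed" case, which the
list marks as impossible anyway.

```c
#include <stddef.h>

/* Where integer or floating-point arithmetic is carried out. */
enum unit { UNIT_ALU, UNIT_LIB, UNIT_VFP, UNIT_NEON };

/* Returns the GCC options from the list above for a given
   (integer unit, FP unit) pair, or NULL where the list says
   "Impossible" or the pair is not in the list. */
const char *gcc_flags_for(enum unit int_unit, enum unit fp_unit)
{
    if (int_unit == UNIT_ALU && fp_unit == UNIT_LIB)
        return "-mfloat-abi=soft";
    if (int_unit == UNIT_ALU && fp_unit == UNIT_VFP)
        return "-mfloat-abi=hard -mfpu=vfp";
    if (int_unit == UNIT_NEON && fp_unit == UNIT_VFP)
        return "-mfloat-abi=hard -mfpu=neon";
    if (int_unit == UNIT_NEON && fp_unit == UNIT_NEON)
        return "-mfloat-abi=hard -mfpu=neon -funsafe-math-optimizations";
    return NULL; /* e.g. Int (ALU), FP (NEON) */
}
```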

Hope this helps,
James