[LLVMdev] NEON vector instructions and the fast math IR flags

Fri Jun 7 01:17:15 PDT 2013

> |I just looked again at the +neonfp flag. Compiling with and without
> |the +neonfp flag seems to only affect scalar types in the attached test
> |case. If, e.g., the LLVM vectorizer introduces vector instructions at the
> |LLVM-IR level, floating-point vectors still yield NEON assembly even if
> |compiled with "-mattr=+neon,-neonfp". Is this expected?
>
> I'm virtually certain that's a problem since there are codebases out there
> which use that to effectively specify "integer neon but use VFP for floats".
> If the vectorizer is producing neon floating point from scalar code
> in the presence of that flag then it's a (minor) issue waiting to happen.

That flag doesn't really do what its description says. It specifies that NEON instructions should be used for *scalar* arithmetic: it tries to avoid VFP instructions by promoting scalar ops to vector ops. The aim is to gain performance on cores that penalise switching between the VFP and NEON pipelines.

Also, -neonfp does nothing. It is not a ternary flag (do nothing, force-on, force-off) - it is either active or inactive. +neonfp forces some transformation, -neonfp disables forcing that transformation. -neonfp doesn't imply any transformations itself.
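
To make the on/off point concrete, here is a minimal sketch of how a "+feature"/"-feature" string resolves to a plain boolean. This is not LLVM's actual implementation: applyFeatures and forcesScalarFPOnNEON are hypothetical names, and starting from an empty feature set (i.e. neonfp off by default) is an assumption for illustration.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper (not LLVM's real code): applies a comma-separated
// "-mattr"-style string to a set of enabled features. "+name" switches a
// feature on, "-name" switches it off; there is no third state.
std::set<std::string> applyFeatures(std::set<std::string> enabled,
                                    const std::string &attrs) {
  std::stringstream ss(attrs);
  std::string item;
  while (std::getline(ss, item, ',')) {
    if (item.size() < 2)
      continue; // ignore empty or malformed entries
    const std::string name = item.substr(1);
    if (item[0] == '+')
      enabled.insert(name);
    else if (item[0] == '-')
      enabled.erase(name);
  }
  return enabled;
}

// Hypothetical model of the behaviour described above: promotion of scalar
// FP to NEON is forced only when "neonfp" is on. When it is off, nothing is
// forced in either direction, so vector FP in the IR can still select NEON.
bool forcesScalarFPOnNEON(const std::set<std::string> &enabled) {
  return enabled.count("neon") != 0 && enabled.count("neonfp") != 0;
}
```

Under these assumptions, "+neon,-neonfp" and plain "+neon" produce the same feature set, which is why -neonfp cannot be read as "keep floating point off NEON".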

-----Original Message-----
From: David Tweed
Sent: 07 June 2013 09:01
To: 'Tobias Grosser'; Renato Golin
Cc: LLVMdev at cs.uiuc.edu; Tobias Grosser; James Molloy; Silviu Baranga
Subject: RE: [LLVMdev] NEON vector instructions and the fast math IR flags

>> Darwin uses NEON for floating point, but does *not* (and should not)
>> globally enable fast math flags.  Use of NEON for FP needs to remain
>> achievable without globally setting the fast math flags.  Fast math may
>> reasonably imply NEON, but the opposite direction is not accurate.

| Good point. Fast math is probably too tough a requirement. I need to
| look into the ways in which NEON does not comply with IEEE 754. For now
| the only difference I see is that it may round denormals to zero.
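
For readers unfamiliar with the denormal issue mentioned above, the following is a small host-side sketch of what flushing denormals to zero means for a single-precision value. It only emulates the effect in software; flushToZero is a hypothetical helper, not an ARM or LLVM API, and it makes no claim about which NEON operations behave this way.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper that emulates flush-to-zero on an IEEE 754 single:
// subnormal (denormal) inputs become a zero of the same sign, and every
// other value is returned unchanged.
float flushToZero(float x) {
  if (std::fpclassify(x) == FP_SUBNORMAL)
    return std::copysign(0.0f, x);
  return x;
}

// True when flushing would change the value, i.e. when an IEEE 754
// compliant unit and a flush-to-zero unit would disagree on it.
bool differsUnderFlushToZero(float x) {
  return flushToZero(x) != x;
}
```

For example, std::numeric_limits<float>::denorm_min() is a nonzero value under IEEE 754 but becomes zero after flushToZero, whereas a normal value such as 1.0f is unaffected.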

Yes, I've gone on record before as saying that fast-math enables far too
many different things for it to be "the canonical switch" for just about
any transformation. Rather, it should be what I think it is in gcc, which
is effectively a shortcut for invoking several individual math-option
flags.

[snip]

|I just looked again at the +neonfp flag. Compiling with and without
|the +neonfp flag seems to only affect scalar types in the attached test
|case. If, e.g., the LLVM vectorizer introduces vector instructions at the
|LLVM-IR level, floating-point vectors still yield NEON assembly even if
|compiled with "-mattr=+neon,-neonfp". Is this expected?

I'm virtually certain that's a problem since there are codebases out there
which use that to effectively specify "integer neon but use VFP for floats".
If the vectorizer is producing neon floating point from scalar code
in the presence of that flag then it's a (minor) issue waiting to happen.

Cheers,
Dave
