[LLVMdev] NEON vector instructions and the fast math IR flags

Tobias Grosser grosser at google.com
Thu Jun 6 18:35:01 PDT 2013


Hi,

I was recently looking into the translation of LLVM-IR vector instructions
to ARM NEON assembly, specifically when this is legal to do and when we
need to be careful.

I attached a very simple test case:

define <4 x float> @fooP(<4 x float> %A, <4 x float> %B) {
  %C = fmul <4 x float> %A, %B
  ret <4 x float> %C
}

If fooP is compiled with "llc -march=arm -mattr=+vfp3,+neon", LLVM happily
uses ARM NEON instructions to implement the vector multiply. This is
obviously the fastest code that we can generate, but on the other hand we
lose precision compared to non-NEON code (NEON flushes denormals to zero).
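
To make the precision loss concrete, here is a small sketch. The function
name @halve is made up for illustration and is not part of the attached
test case; the comments assume one input lane holds the smallest normal
float, 2^-126.

; Hypothetical example, not taken from the attachment.
; If a lane of %A holds 2^-126 (about 1.18e-38, the smallest normal float),
; IEEE 754 arithmetic produces the denormal 2^-127 (about 5.88e-39) there.
; A unit that flushes denormals to zero produces 0.0 for that lane instead.
define <4 x float> @halve(<4 x float> %A) {
  %C = fmul <4 x float> %A, <float 5.000000e-01, float 5.000000e-01,
                             float 5.000000e-01, float 5.000000e-01>
  ret <4 x float> %C
}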

As LLVM now has support for IR-level fast-math flags [1], I am wondering
whether it would make sense to create NEON instructions only if the relevant
fast-math flags are set at the IR level.

The reason behind my question is that at the moment the only way to get
IEEE 754 floating point operations on ARM is to fully disable NEON.
However, NEON can be safely used for integer computations as well as for
LLVM-IR instructions with the appropriate fast math flags. The attached
test case contains an example of a floating point operation that requires
IEEE 754 compliance, a floating point operation that does not require IEEE
754 as well as an integer computation. It is a perfect mixed use case,
where we really do not want to globally disable NEON.
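
A mixed module of this kind might look roughly like the sketch below. It is
not the scrubbed attachment: the function names are invented, and using the
"fast" flag as a stand-in for "the appropriate fast math flags" is an
assumption, since the LangRef defines no flag that mentions denormals
explicitly.

; Hypothetical sketch, not the contents of the attachment.

; No fast-math flags: under the proposal this multiply would have to keep
; IEEE 754 behavior, so NEON would not be used for it.
define <4 x float> @strict_fmul(<4 x float> %A, <4 x float> %B) {
  %C = fmul <4 x float> %A, %B
  ret <4 x float> %C
}

; "fast" set (assumed here to be sufficient): NEON could be selected.
define <4 x float> @relaxed_fmul(<4 x float> %A, <4 x float> %B) {
  %C = fmul fast <4 x float> %A, %B
  ret <4 x float> %C
}

; Integer multiply: denormals are not an issue, so NEON is always safe.
define <4 x i32> @int_mul(<4 x i32> %A, <4 x i32> %B) {
  %C = mul <4 x i32> %A, %B
  ret <4 x i32> %C
}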

I understand that some users do not require 754 compliant floating point
behavior (clang on darwin?), which means they would probably not need this
change. However, it should also not hurt them performance-wise as such
users would probably set the relevant global fast-math flags to reduce the
precision requirements, such that NEON instructions would be chosen anyway.

I am very interested in opinions on the general topic as well as how to
actually implement this in the ARM target.

All the best,
Tobias

[1] http://llvm.org/docs/LangRef.html#fast-math-flags
-------------- next part --------------
A non-text attachment was scrubbed...
Name: neon-floating-point-precision.ll
Type: application/octet-stream
Size: 1188 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130606/f5262e8e/attachment.obj>

