[PATCH] D17141: [ARM] Adding IEEE-754 SIMD detection to loop vectorizer
Renato Golin via llvm-commits
llvm-commits at lists.llvm.org
Tue Feb 23 12:49:09 PST 2016
rengolin added inline comments.
Comment at: include/llvm/Analysis/TargetTransformInfo.h:373
@@ +372,3 @@
+ /// and SLP vectorization without -ffast-math option.
+ bool isSIMDIEEE754() const;
> hfinkel wrote:
> > What does one thing have to do with the other (i.e. what does IEEE floating point have to do with allowing fast-math)? The underlying issue is that, without fast-math, the numeric representation, and the operations on numbers in that representation, should be the same. fast-math allows the use of alternate representations and operations (so long as they're not too different), but also allows reassociation. To allow vectorizing reductions, we need reassociation as well (which is a separate matter from the potential change in operational semantics).
> To pile on a bit, it's not just about SIMD. Darwin uses NEON for scalar floating point as well, rather than VFP.
I need to make this abundantly clear: this is not about reductions. Just because fast-math is used to unlock reductions doesn't mean it *only* has to do with reductions.
The problem is that IEEE 754 states that *any* transformation *has* to preserve the semantics of the original code. This applies to CSE, strength reduction, vectorization, inlining, etc. So the fact that we're not dealing with reductions doesn't mean we can ignore IEEE compliance.
The -ffast-math flag acts as a collection of flags related to rounding, signed zeros, infinities, NaNs, etc. There are many ways to disable specific guarantees of the IEEE standard, but fast-math disables all of them. This is an exaggeration, but it's also safer. Narrowing it down to the most localised flag is an optimisation. Disabling it for the general case is a correctness change.
Why SIMD specifically? Because the loop vectorizer only uses SIMD instructions, and this is a change to the loop vectorizer. There will be additional SLP changes, but this particular change only concerns the loop vectorizer. One thing at a time.
It just happens that ARMv7's SIMD (NEON) is not IEEE 754 compliant, so I need to detect it and avoid vectorization. I'll need to do the same in the SLP vectorizer as well, allowing only VFP instructions there. The loop vectorizer does not use VFP instructions, so we should be safe.
Other alternatives were discussed (like vectorizing here but scalarizing later), but that would lead to inaccurate cost predictions and likely bad performance, so it's not worth the effort.
Darwin can continue to use NEON for scalar FP without an issue, this is *JUST* for the loop vectorizer and will make no difference at all in Darwin.
The newly added flags are just informational. The decisions are left for the passes to do.