[llvm-dev] Vectorization with fast-math on irregular ISA sub-sets
Martin J. O'Riordan via llvm-dev
llvm-dev at lists.llvm.org
Thu Feb 11 03:23:39 PST 2016
Our processor also has some issues regarding the handling of denormals - scalar and vector - and we ran into a related problem only a few days ago.
The v3.8 compiler has done a lot of good work on optimisations for floating-point math, but ironically one of them broke our implementation of 'nextafterf'. The desired code fragment (FP32) is:
float xAbs = fabsf(x);
Since our instruction for this does not handle denormals, and the algorithm depends on denormals being handled correctly, the code was written to avoid the issue as follows:
float xAbs = __builtin_astype(__builtin_astype(x, unsigned) & 0x7FFFFFFF, float);
But the v3.8 FP optimiser now recognises this pattern and replaces it with an ISD::FABS node, which broke our workaround :-) It's a great optimisation and I have no problem with its correctness. I am wondering, though, where I should extend the target information interface so that a target can say it does not support denormals, allowing this and possibly other optimisations to be suppressed in a target-dependent way.
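For readers without `__builtin_astype`, the sign-bit masking idiom can be sketched in portable C roughly as below. This is only an illustration, assuming IEEE-754 binary32 floats; the names `bits_of`, `float_of` and `abs_via_mask` are hypothetical and are not taken from our 'nextafterf' source. As described above, an optimiser may still recognise this pattern and turn it back into a floating-point absolute-value operation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's storage as a 32-bit integer (assumes IEEE-754 binary32). */
static uint32_t bits_of(float x) {
    uint32_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}

/* Reinterpret a 32-bit integer as a float. */
static float float_of(uint32_t b) {
    float x;
    memcpy(&x, &b, sizeof x);
    return x;
}

/* Hypothetical name: absolute value computed by clearing the sign bit
   with an integer AND, so no floating-point instruction is requested. */
static float abs_via_mask(float x) {
    return float_of(bits_of(x) & 0x7FFFFFFFu);
}

/* Usage: a negative denormal keeps its mantissa bits; only the sign bit changes. */
static void abs_via_mask_demo(void) {
    assert(abs_via_mask(-1.5f) == 1.5f);
    assert(bits_of(abs_via_mask(float_of(0x80000001u))) == 0x00000001u);
}
```

The demo compares bit patterns rather than float values for the denormal case, so the check does not itself depend on how the host handles denormal arithmetic.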
Overall the new FP32 optimisation patterns appear to have yielded a small but not insignificant performance advantage over v3.7.1, though it is still early days for my complete measurements.
From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Renato Golin via llvm-dev
Sent: 11 February 2016 10:53
To: Hal Finkel <hfinkel at anl.gov>
Cc: LLVM Dev <llvm-dev at lists.llvm.org>; nd <nd at arm.com>
Subject: Re: [llvm-dev] Vectorization with fast-math on irregular ISA sub-sets
I had a read through the ARM ARM about VFP and SIMD FP semantics and my analysis is that NEON's only problem is the flush-to-zero behaviour, which is non-compliant.
NEON deals with NaNs and Infs in the way specified by the standard and should not cause any concern to us. But we don't seem to have a flag specifically for denormals, so I think using the UnsafeMath is the safest option for now.
On 11 February 2016 at 01:15, Hal Finkel <hfinkel at anl.gov> wrote:
> No Signed Zeros - Allow optimizations to treat the sign of a zero argument or result as insignificant.
In both VFP and NEON, zero signs are significant. In NEON, the flush-to-zero's zero will have the same sign as the input denormal.
> No NaNs - Allow optimizations to assume the arguments and result are not NaN. Such optimizations are required to retain defined behavior over NaNs, but the value of the result is undefined.
Both VFP and NEON treat NaNs as the standard requires, i.e. [ NaN op ? ] = NaN.
> No Infs - Allow optimizations to assume the arguments and result are not +/-Inf. Such optimizations are required to retain defined behavior over +/-Inf, but the value of the result is undefined.
Same here. Operations with Inf generate Inf or NaNs on both units.
The flush-to-zero behaviour does have an effect on both NaNs and Infs, since the flush happens before the operation itself. So an operation (for example, a multiplication) on a denormal and an Inf will not generate a NaN in VFP, while in NEON the denormal is flushed to zero first, and the operation then generates a NaN.
James, is that a correct assessment?
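The difference described above can be modelled in plain C. This is a rough sketch, not NEON code: `flush_input` is a hypothetical software model of flush-to-zero applied to inputs, multiplication is chosen here as the example operation, and the host is assumed to perform ordinary IEEE-754 arithmetic with denormals enabled.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical model of flush-to-zero on an input: a denormal becomes a
   zero with the same sign; every other value passes through unchanged. */
static float flush_input(float x) {
    if (fpclassify(x) == FP_SUBNORMAL)
        return signbit(x) ? -0.0f : 0.0f;
    return x;
}

/* Ordinary multiply, standing in for the denormal-preserving unit. */
static float mul_ieee(float a, float b) {
    return a * b;
}

/* Multiply with both inputs flushed first, standing in for a flush-to-zero unit. */
static float mul_flushed(float a, float b) {
    return flush_input(a) * flush_input(b);
}

/* Usage: the same inputs give Inf in one model and NaN in the other. */
static void flush_demo(void) {
    const float tiny = 0x1p-149f;                 /* smallest positive binary32 denormal */
    assert(isinf(mul_ieee(tiny, INFINITY)));      /* denormal * Inf = Inf */
    assert(isnan(mul_flushed(tiny, INFINITY)));   /* 0 * Inf = NaN */
    assert(signbit(flush_input(-tiny)));          /* flushed zero keeps the sign */
}
```

Only the input side of flush-to-zero is modelled here; flushing of denormal results is left out to keep the sketch short.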