[llvm-commits] LLVM patch to support ARM fused multiply add/subtract instructions

Wed Jan 25 07:27:16 PST 2012

On Jan 25, 2012, at 10:07 AM, Hal Finkel wrote:

> On Wed, 2012-01-25 at 17:42 +0400, Anton Korobeynikov wrote:
>> Hi Ana,
>> 
>>> In this update:
>>> - I assumed neon2 does not imply vfpv4, but neon and vfpv4 imply neon2.
>>> - I kept setting .fpu=neon-vfpv4 code attribute because that is what the
>>> assembler understands.
>> Looks ok.
>> 
>>> The additional changes mentioned in the email discussions I think belong to
>>> a separate patch:
>>> - Associate VMLA/VMLS with LessPreciseFPMAD flag, and maybe with fast-math
>>> flag.
>> They should definitely not be. They are not less precise! They are
>> "exactly precise" as two separate ops. It's just FMA which has greater
>> precision than usual thanks to 1 rounding.
>> And it's FMA which needs to be associated with -ffast-math on VFPv2
> 
> Just to be clear, are you advocating associating this with UnsafeFPMath
> or with !NoExcessFPPrecision? I think that it should be the latter, as
> that is what the PPC backend does (and that seems to match the intent of
> the TargetOptions API authors), but unlike -ffast-math
> (-enable-unsafe-fp-math), this will cause the patterns to be enabled by
> default.

Controlling the contraction of a*b + c into fma(a,b,c) is a thorny issue.  Such contractions often give more accurate results, but they can also sabotage certain important calculations.  As an example, consider squaring a complex number:

	double complex z = CMPLX(M_PI, M_PI);
	double complex w = z*z;

Let's call the real and imaginary parts of z x and y, respectively.  Then the real part of w is given by:

	double real_w = x*x - y*y;

If evaluated without contraction, x*x and y*y are both rounded to the same value (x and y are equal here), so the subtraction cancels exactly and produces the correct result, zero.  If contraction is used, then we get something like:

	double real_w = fma(x, x, -y*y);

Since no rounding occurs on the intermediate product x*x, the result is not exactly zero, but is instead the rounding error of the product (roughly, the low 53 bits of the exact product).  This sort of effect can introduce nasty asymmetries into certain calculations.  It's fine for contractions to be enabled by default, but it should be possible to toggle them independently of other numerical controls.  !NoExcessFPPrecision is pretty close to the right idea.  -ffast-math seems wrong.

I should point out that the C standard defines the FP_CONTRACT pragma for exactly this purpose (7.12.2).  Off the top of my head, I'm not sure what other languages have to say on the subject.
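
For reference, a sketch of how the pragma is used at file scope.  The function name real_part_of_square is hypothetical.  Support varies by compiler: Clang honors the pragma, while GCC (as far as I know) ignores it and relies on its -ffp-contract option instead, so treat this as an illustration of the standard's mechanism rather than a guarantee about any particular toolchain:

```c
/* Ask the implementation not to contract floating-point expressions
   from this point to the end of the translation unit. */
#pragma STDC FP_CONTRACT OFF

/* With contraction off, both products are rounded to double before the
   subtraction, so equal inputs cancel exactly. */
double real_part_of_square(double x, double y)
{
    return x * x - y * y;
}
```

The pragma may also appear at the start of a compound statement, in which case it applies only until the end of that block.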

- Steve