[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines

Mon Aug 11 08:14:16 PDT 2014

Hal, Tim,

Thanks for the thorough explanation. That is very clarifying.

Thanks again!
Samuel

2014-08-10 15:30 GMT-04:00 Hal Finkel <hfinkel at anl.gov>:

> ----- Original Message -----
> > From: "Tim Northover" <t.p.northover at gmail.com>
> > To: "Samuel F Antao" <sfantao at us.ibm.com>
> > Cc: "Olivier H Sallenave" <ohsallen at us.ibm.com>, llvmdev at cs.uiuc.edu
> > Sent: Wednesday, August 6, 2014 10:59:43 PM
> > Subject: Re: [LLVMdev] FPOpFusion = Fast and Multiply-and-add combines
> >
> > > "Each of the computational operations that return a numeric result
> > > specified by this standard shall be performed as if it first produced
> > > an intermediate result correct to infinite precision and with
> > > unbounded range, and then rounded that intermediate result, ..."
> > >
> > > which perfectly fits what the muladd instructions in PPC and also in
> > > avx2 are doing: using infinite precision after the multiply.
> >
> > There are two operations in "a + b * c". Using muladd omits the second
> > requirement ("and then rounded that intermediate result") on the
> > first.
> >
> > IEEE describes a completely separate "fusedMultiplyAdd" operation with
> > the "muladd" semantics.
>
> Samuel,
>
> To add to Tim's (correct) response...
>
> C11, for example, addresses this: Section 6.5 paragraph 8 says, "A
> floating expression may be contracted, that is, evaluated as though it
> were a single operation, thereby omitting rounding errors implied by the
> source code and the expression evaluation method. The FP_CONTRACT pragma
> in <math.h> provides a way to disallow contracted expressions." Section
> 7.12.2 says, "The default state (‘‘on’’ or ‘‘off’’) for the pragma is
> implementation-defined."
>
> There are a few implications here, the most important being that C allows
> contraction only within floating-point expressions, but not across
> statement boundaries. This immediately poses a great challenge to
> performing mul+add fusion late in the optimizer (in the SelectionDAG, for
> example), because all notion of source-level statement boundaries has been
> lost. Furthermore, the granularity of the effects of the FP_CONTRACT pragma
> is defined in terms of source-level constructs (in 7.12.2).
>
> Many compilers, including GCC on PowerPC, use a non-standard-compliant
> mode by default. GCC's manual documents:
>
> [from GCC man page]
>  -ffp-contract=style
>      -ffp-contract=off disables floating-point expression contraction.
>      -ffp-contract=fast enables floating-point expression contraction
>      such as forming of fused multiply-add operations if the target has
>      native support for them.
>      -ffp-contract=on enables floating-point expression contraction if
>      allowed by the language standard.  This is currently not
>      implemented and treated equal to -ffp-contract=off.
>
>      The default is -ffp-contract=fast.
> [end from GCC man page]
>
> Clang, however, chooses to provide standard compliance by default. When
> -ffp-contract=fast is provided, we enable aggressive fusion in DAGCombine.
> We also enable this whenever fast-math is enabled. When -ffp-contract=on is
> in effect, we form contractions only where allowed (within expressions).
> This is done by having Clang itself emit the @llvm.fmuladd intrinsic. We
> use -ffp-contract=off by default. The benefit of this is that programs
> compiled with Clang should produce stable answers, as dictated by the
> relevant standard, across different platforms.
>
> On PowerPC, LLVM's test-suite uses -ffp-contract=off so that the output is
> stable against optimizer fusion decisions across multiple compilers.
>
> Finally, although counter-intuitive, extra precision is not always a good
> thing. Many numerical algorithms function correctly only in the presence of
> unbiased rounding that provides symmetric error cancellation across various
> expressions. If some of those expressions are computed with different
> amounts of effective precision, these errors don't cancel as they should,
> and the resulting program can produce inferior answers. Admittedly, I
> believe such situations are relatively rare, but do certainly exist in
> thoughtfully-constructed production code.
>
>  -Hal
>
> >
> > Cheers.
> >
> > Tim.
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>