<div dir="ltr">Hal, Tim,<div><br></div><div>Thanks for the thorough explanation. That is very clarifying. </div><div><br></div><div>Thanks again!</div><div>Samuel</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">
2014-08-10 15:30 GMT-04:00 Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="">----- Original Message -----<br>
> From: "Tim Northover" <<a href="mailto:t.p.northover@gmail.com">t.p.northover@gmail.com</a>><br>
</div><div class="">> To: "Samuel F Antao" <<a href="mailto:sfantao@us.ibm.com">sfantao@us.ibm.com</a>><br>
> Cc: "Olivier H Sallenave" <<a href="mailto:ohsallen@us.ibm.com">ohsallen@us.ibm.com</a>>, <a href="mailto:llvmdev@cs.uiuc.edu">llvmdev@cs.uiuc.edu</a><br>
> Sent: Wednesday, August 6, 2014 10:59:43 PM<br>
> Subject: Re: [LLVMdev] FPOpFusion = Fast and Multiply-and-add combines<br>
><br>
</div><div class="">> > "Each of the computational operations that return a numeric result<br>
> > specified<br>
> > by this standard shall be performed as if it first produced an<br>
> > intermediate<br>
> > result correct to infinite precision and with unbounded range, and<br>
> > then<br>
> > rounded that intermediate result, ..."<br>
> ><br>
> > which perfectly fits what the muladd instructions in PPC and also<br>
> > in avx2<br>
> > are doing: using infinite precision after the multiply.<br>
><br>
> There are two operations in "a + b * c". Using muladd omits the<br>
> second<br>
> requirement ("and then rounded that intermediate result") on the<br>
> first.<br>
><br>
> IEEE describes a completely separate "fusedMultiplyAdd" operation<br>
> with<br>
> the "muladd" semantics.<br>
<br>
</div>Samuel,<br>
<br>
To add to Tim's (correct) response...<br>
<br>
C11, for example, addresses this: Section 6.5 paragraph 8 says, "A floating expression may be contracted, that is, evaluated as though it were a single operation, thereby omitting rounding errors implied by the source code and the expression evaluation method. The FP_CONTRACT pragma in &lt;math.h&gt; provides a way to disallow contracted expressions." Section 7.12.2 says, "The default state ("on" or "off") for the pragma is implementation-defined."<br>
<br>
There are a few implications here, the most important being that C allows contraction only within a floating-point expression, not across statement boundaries. This immediately poses great challenges for performing mul+add fusion late in the optimizer (in the SelectionDAG, for example), because by then all notion of source-level statement boundaries has been lost. Furthermore, the granularity of the effects of the FP_CONTRACT pragma is defined in terms of source-level constructs (in 7.12.2).<br>
<br>
Many compilers, including GCC on PowerPC, use a non-standard-compliant mode by default. GCC's manual documents:<br>
<br>
[from GCC man page]<br>
-ffp-contract=style<br>
-ffp-contract=off disables floating-point expression contraction.<br>
-ffp-contract=fast enables floating-point expression contraction such as forming of fused multiply-add operations if the target has native support for them.<br>
-ffp-contract=on enables floating-point expression contraction if allowed by the language standard. This is currently not implemented and treated equal to -ffp-contract=off.<br>
<br>
The default is -ffp-contract=fast.<br>
[end from GCC man page]<br>
<br>
Clang, however, chooses to provide standard compliance by default. When -ffp-contract=fast is provided, we enable aggressive fusion in DAGCombine. We also enable this whenever fast-math is enabled. When -ffp-contract=on is in effect, we form contractions only where allowed (within expressions). This is done by having Clang itself emit the @llvm.fmuladd intrinsic. We use -ffp-contract=off by default. The benefit of this is that programs compiled with Clang should produce stable answers, as dictated by the relevant standard, across different platforms.<br>
<br>
On PowerPC, LLVM's test-suite uses -ffp-contract=off so that the output is stable against optimizer fusion decisions across multiple compilers.<br>
<br>
Finally, although counter-intuitive, extra precision is not always a good thing. Many numerical algorithms function correctly only in the presence of unbiased rounding that provides symmetric error cancellation across various expressions. If some of those expressions are computed with different amounts of effective precision, these errors don't cancel as they should, and the resulting program can produce inferior answers. Admittedly, I believe such situations are relatively rare, but they certainly do exist in thoughtfully-constructed production code.<br>
<br>
-Hal<br>
<div class="im HOEnZb"><br>
><br>
> Cheers.<br>
><br>
> Tim.<br>
> _______________________________________________<br>
> LLVM Developers mailing list<br>
> <a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>
> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
><br>
<br>
</div><span class="HOEnZb"><font color="#888888">--<br>
Hal Finkel<br>
Assistant Computational Scientist<br>
Leadership Computing Facility<br>
Argonne National Laboratory<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
</div></div></blockquote></div><br></div>