<html><body>

<p><font size="2" face="sans-serif">Hi Tim,</font><br>

<br>

<font size="2" face="sans-serif">Thanks for the thorough explanation. It makes perfect sense.</font><br>

<br>

<font size="2" face="sans-serif">I was not aware that, without fast-math, the compiler is supposed to avoid using more precision than the standard specifies.</font><br>

<br>

<font size="2" face="sans-serif">I came across this issue while looking into the output of different compilers. The XL and Microsoft compilers seem</font><br>

<font size="2" face="sans-serif">to have that turned on by default, but I assume that clang follows what gcc does and has that turned off.</font><br>

<br>

<font size="2" face="sans-serif">Thanks again,</font><br>

<font size="2" face="sans-serif">Samuel</font><br>

<br>

<tt><font size="2">Tim Northover &lt;t.p.northover@gmail.com&gt; wrote on 07/31/2014 09:54:55 AM:<br>

<br>

> From: Tim Northover &lt;t.p.northover@gmail.com&gt;</font></tt><br>

<tt><font size="2">> To: Samuel F Antao/Watson/IBM@IBMUS</font></tt><br>

<tt><font size="2">> Cc: "llvmdev@cs.uiuc.edu" &lt;llvmdev@cs.uiuc.edu&gt;, Olivier H <br>

> Sallenave/Watson/IBM@IBMUS</font></tt><br>

<tt><font size="2">> Date: 07/31/2014 09:55 AM</font></tt><br>

<tt><font size="2">> Subject: Re: [LLVMdev] FPOpFusion = Fast and Multiply-and-add combines</font></tt><br>

<tt><font size="2">> <br>

> Hi Samuel,<br>

> <br>

> On 30 July 2014 22:37, Samuel F Antao &lt;sfantao@us.ibm.com&gt; wrote:<br>

> > In the DAGCombiner, during the combination of mul and add/subtract into<br>

> > multiply-and-add/subtract, this option is expected to be Fast in order to<br>

> > enable the combine. This means, that by default no multiply-and-add opcodes<br>

> > are going to be generated. If I understand it correctly, this is undesirable<br>

> > given that multiply-and-add for targets like PPC (I am not sure about all<br>

> > the other targets) does not pose any rounding problem and it can even be<br>

> > more accurate than performing the two operations separately.<br>

> <br>

> That extra precision is actually what we're being very careful to<br>

> avoid unless specifically told we're allowed. It can be just as<br>

> harmful to carefully written floating-point code as dropping precision<br>

> would be.<br>

> <br>

> > Also, in TargetOptions.h I read:<br>

> ><br>

> > Standard, // Only allow fusion of 'blessed' ops (currently just fmuladd)<br>

> ><br>

> > which made me suspect that the check against Fast in the DAGCombiner is not<br>

> > correct.<br>

> <br>

> I think it's OK. In the IR there are 3 different ways to express mul + add:<br>

> <br>

> 1. fmul + fadd. This must not be fused into a single step without<br>

> intermediate rounding (unless we're in Fast mode).<br>

> 2. call @llvm.fmuladd. This *may* be fused or not, depending on<br>

> profitability (unless we're in Strict mode, in which case it's<br>

> separate).<br>

> 3. call @llvm.fma. This must not be split into two operations (unless<br>

> we're in Fast mode).<br>

> <br>

> That middle one is there because C actually lets you allow or<br>

> disallow contraction within a limited region with "#pragma STDC<br>

> FP_CONTRACT ON". So we need a way to represent the idea that it's not<br>

> usually OK to fuse them (i.e. not Fast mode), but this particular one<br>

> actually is OK.<br>
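<br>
As a rough C sketch of that pragma: the function name below is made up for illustration, and compiler support varies, so a compiler that does not implement the pragma may ignore it, possibly with a warning.<br>

```c
/* Hypothetical helper: inside this block the compiler may, but need not,
   evaluate a * b + c as a single fused multiply-add. The pragma sits at
   the start of the compound statement, so it applies to this block only. */
double mul_add_contractable(double a, double b, double c) {
    #pragma STDC FP_CONTRACT ON
    return a * b + c;
}
```

This is the "may be fused or not" case: depending on the compiler and target, the result can be either the value with intermediate rounding or the fused value, and portable code should accept both.<br>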

> <br>

> Cheers.<br>

> <br>

> Tim.<br>

> <br>

</font></tt></body></html>