<div class="gmail_quote">On Tue, Jun 5, 2012 at 1:18 PM, Chandler Carruth <span dir="ltr"><<a href="mailto:chandlerc@google.com" target="_blank">chandlerc@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="gmail_quote"><div><div class="h5">On Tue, Jun 5, 2012 at 1:15 PM, Stephen Canon <span dir="ltr"><<a href="mailto:scanon@apple.com" target="_blank">scanon@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div style="word-wrap:break-word"><div><div>On Jun 5, 2012, at 1:08 PM, Chandler Carruth <<a href="mailto:chandlerc@google.com" target="_blank">chandlerc@google.com</a>> wrote:</div></div><div><div>

<br><blockquote type="cite"><div class="gmail_quote">Hey Lang,</div><div class="gmail_quote"><br></div><div class="gmail_quote">Sorry to jump in late, but I was catching up on email and finally read through this thread. This is the exchange that caught my interest:</div>


<div class="gmail_quote"><br></div><div class="gmail_quote">On Fri, Jun 1, 2012 at 4:50 AM, Stephen Canon <span dir="ltr"><<a href="mailto:scanon@apple.com" target="_blank">scanon@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div>On May 31, 2012, at 10:40 PM, John McCall <<a href="mailto:rjmccall@apple.com" target="_blank">rjmccall@apple.com</a>> wrote:<br>

<br>

> On May 31, 2012, at 7:22 PM, Lang Hames wrote:<br>

>> Thanks for the suggestion Matthieu. I spoke to Doug and he recommended using attributes rather than a FunctionDecl bit to represent the fp_contract state.<br>

><br>

> Hmm.  I had suggested a bit on FunctionDecl on the assumption that this would often be controlled globally, maybe by using a flag to control the default or by activating a #pragma before including all the headers.  Actually, I could even imagine a target (maybe a GPU target?) opting in to this behavior by default.  If we're going to use an Attr, we need to make sure it doesn't get added unless the current #pragma state is different from the global default;  we really don't want to be allocating an attribute for every function definition in the translation unit.<br>


<br>

</div>We want FP_CONTRACT ON to be the default for all targets.  It's also worth noting that it's critical that we support setting the pragma to OFF, but in practice this will be exceedingly rare (almost certainly less than 1% of sources, and probably far less than that).<br>


</blockquote><div><br></div><div>Based on this comment, I'm really not keen on the current representation, but maybe I've misunderstood it, so I'll ask questions first:</div><div><br></div><div>The 'fmuladd' intrinsic is used to whitelist specific operations for fused multiply+add handling, correct?</div>


</div></blockquote><div><br></div></div><div>Correct.</div><div><br><blockquote type="cite"><div class="gmail_quote">

<div>If so, and if Stephen's stance is correct (I certainly agree with it!) that this should be allowed for the vast majority of code, that means that almost every fmul and fadd in the current IR should be a candidate for fusing?</div>


</div></blockquote><div><br></div></div><div>Only those that originate from a common source-language *expression*.  Your examples should not be fused because the multiply and add are in two separate expressions (which is why we need FE involvement; that information isn't available later).</div>


</div></div></blockquote><div><br></div></div></div><div>Ok, now I'm extra confused. Thanks for replying, hopefully you can help me understand better.</div><div><br></div><div>Why would it not be OK to fuse multiplies and adds that occur in two source-language expressions? I have some vague memory of Fortran having lots of special rules about within-expression semantics versus semantics across expressions, but C++ has no such constraints to my knowledge, nor would it want them.</div>


<div><br></div><div>Having these types of artificial source-representation restrictions on semantics in C++ undermines specific language constructs like overloaded operators and transparent "wrapper" classes.</div>

</div></blockquote><div><br></div><div>Trying to at least do my homework, as I'm not usually working w/ numerics, I've been reading up.</div><div><br></div><div>I've now read the FP_CONTRACT part of the C11 spec, and see where your statement comes from. I find this restriction... mysterious. I would love to understand why it is important to prevent inlining from exposing contraction opportunities if you can give any examples.</div>
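<div><br></div><div>To make the distinction concrete, here is a minimal sketch of the two cases as I understand them. The function names are made up for illustration, std::fma stands in for what a contracting compiler would be allowed to emit for a single expression, and the volatile store is only there to force the product to be rounded before the add. None of this is meant as a statement of what any particular compiler does.</div>

```cpp
#include <cassert>
#include <cmath>

// Illustrative name (not from the thread): multiply and add written as ONE
// source expression. Under FP_CONTRACT ON this is the case that may be
// contracted; std::fma models the contracted result (a single rounding).
double same_expression(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Illustrative name (not from the thread): multiply and add written as two
// separate expressions. The volatile store forces the product to be rounded
// to double before the add, so there are two roundings.
double separate_expressions(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}
```

<div>With a = 1 + 2^-27, b = 1 - 2^-27 and c = -1, the exact product is 1 - 2^-54, which rounds to 1.0 as a double. The two-step version therefore returns 0, while the fused version returns -2^-54. That difference in results is what the within-expression rule is controlling.</div>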

<div><br></div><div><br></div><div>That said, FP_CONTRACT doesn't apply to C++, and it's quite unlikely to become a serious part of the standard given these (among other) limitations. Curiously, in C++11, it may not be needed to get the benefit of fused multiply-add:</div>

<div><br></div><div>[expr] p11 seems to indicate that in C++, we are almost always allowed to use increased precision to represent operations. The only exception we can find in the C++ standard (and thanks to Richard for helping me crawl through this part) is this:</div>

<div><br></div><div>  static_cast<float>(static_cast<double>(x))</div><div><br></div><div>For any expression 'x' of floating point type, the expression may be evaluated with extra precision, but the result of round-trip casting it through a double must not carry that extra precision. ;] It's not entirely clear this contortion was intended[1]. This definition, while awkward and arbitrary, has a nice property of being able to cleanly represent boundaries of increased precision allowance w/o regard for inlining or other optimizations.</div>
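<div><br></div><div>As a rough sketch of how that boundary would be written in source (function names are hypothetical, and whether a given compiler actually treats the cast pair as a barrier is an assumption here, not something I have checked):</div>

```cpp
#include <cassert>
#include <cmath>

// Hypothetical example: the round trip through double marks the point where
// any extra precision in a * b has to be dropped, so the add that follows
// sees a value rounded to float.
float mul_add_with_boundary(float a, float b, float c) {
    float product = static_cast<float>(static_cast<double>(a * b));
    return product + c;
}

// For comparison: the product and sum are carried in double and rounded to
// float once at the end, which models evaluation with increased precision.
float mul_add_wide(float a, float b, float c) {
    return static_cast<float>(static_cast<double>(a) * b + c);
}
```

<div>With a = 1 + 2^-13, b = 1 - 2^-13 and c = -1, the wide version returns -2^-26. If the boundary is honored and floats are otherwise evaluated at float precision, I would expect the first version to return 0, since the product rounds to 1.0f before the add.</div>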

<div><br></div><div>The state of C++11 makes my (somewhat crazy) idea of a flag a less attractive representation, as does the C11 contraction specification, but it still doesn't make me enthused about the default representation becoming an intrinsic, and forcing the FE to pre-fuse all of these rather than marking the range of fuse-able operations and allowing the middle end to perform the fusion. I'm actually beginning to like the start/stop intrinsic pair to represent the sequences of ineligible operations.</div>

<div><br></div><div>-Chandler</div><div><br></div><div>[1] There is a footnote in the latest working draft that indicates 'static_cast<float>(x)' may have been intended to be enough to force the precision, but the current wording isn't strict enough for that to be the case.</div>

</div>