Thanks all for the feedback.<div><br></div><div>My big takeaway from discussing this with Chandler is that I didn't explain the motivation for the existing design well. I'll keep that in mind in future. I like fmuladd as a way to get started on FP_CONTRACT support simply because it's lightweight and captures most of the cases that we care about. The heavy lifting for proper FP_CONTRACT support, such as it is, will be teaching the parser how to properly deal with FP_CONTRACT pragmas applied to subexpressions (this is probably a simple task for people who are familiar with clang, but it is new territory for me). Since fmuladd itself is so trivial, it will be easy to replace with a more comprehensive system for tracking fusing opportunities if/when we decide it's called for.</div>
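<div><br></div><div>For anyone skimming the thread, here is a minimal sketch of why fusing needs explicit permission at all: a fused multiply-add rounds once, while a separate multiply and add round twice, and the two can give different answers. The function names below are made up for illustration, and the volatile temporary is just one way to keep the compiler from contracting the "separate" version on its own; this is not how fmuladd is implemented.</div>

```cpp
#include <cmath>

// Two roundings: the product is rounded to double, then the sum is rounded.
// The volatile store is an illustrative trick to force the intermediate
// rounding and to stop the compiler from fusing the two operations itself.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// One rounding: std::fma computes a * b + c as if to infinite precision
// and rounds only the final result.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

<div>With a = 1 + 2^-30, b = 1 - 2^-30 and c = -1, the exact product is 1 - 2^-60, which rounds to 1.0 as a double. So mul_then_add returns 0, while fused_mul_add returns -2^-60. Neither answer is "wrong"; they are just different, which is why the language has to say when contraction is allowed.</div>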

<div><br></div><div>- Lang.</div><div><br></div><div><div class="gmail_quote">On Tue, Jun 5, 2012 at 9:24 PM, Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Tue, 05 Jun 2012 20:12:00 -0700<br>

<div><div class="h5">John McCall <<a href="mailto:rjmccall@apple.com">rjmccall@apple.com</a>> wrote:<br>

<br>

> On Jun 5, 2012, at 3:35 PM, John McCall wrote:<br>

> > On Jun 5, 2012, at 3:04 PM, Chandler Carruth wrote:<br>

> >> On Tue, Jun 5, 2012 at 2:58 PM, Stephen Canon <<a href="mailto:scanon@apple.com">scanon@apple.com</a>><br>

> >> wrote: On Jun 5, 2012, at 2:45 PM, John McCall<br>

> >> <<a href="mailto:rjmccall@apple.com">rjmccall@apple.com</a>> wrote:<br>

> >><br>

> >> > On Jun 5, 2012, at 2:15 PM, Stephen Canon wrote:<br>

> >> ><br>

> >> >> On Jun 5, 2012, at 1:51 PM, Chandler Carruth<br>

> >> >> <<a href="mailto:chandlerc@google.com">chandlerc@google.com</a>> wrote:<br>

> >> >><br>

> >> >>> That said, FP_CONTRACT doesn't apply to C++, and it's quite<br>

> >> >>> unlikely to become a serious part of the standard given these<br>

> >> >>> (among other) limitations. Curiously, in C++11, it may not be<br>

> >> >>> needed to get the benefit of fused multiply-add:<br>

> >> >><br>

> >> >> Perversely, a strict reading of C++11 seems (to me) to not<br>

> >> >> allow FMA formation in C++ at all:<br>

> >> >><br>

> >> >>      • The values of the floating operands and the results of<br>

> >> >> floating expressions may be represented in greater precision<br>

> >> >> and range than that required by the type; the types are not<br>

> >> >> changed thereby.<br>

> >> >><br>

> >> >> FMA formation does not increase the precision or range of the<br>

> >> >> result (it may or may not have smaller error, but it is not<br>

> >> >> more precise), so this paragraph doesn't actually license FMA<br>

> >> >> formation.  I can't find anywhere else in the standard that<br>

> >> >> could (though I am *far* less familiar with C++11 than C11, so<br>

> >> >> I may not be looking in the right places).<br>

> >> ><br>

> >> > Correct me if I'm wrong, but I thought that an FMA could be<br>

> >> > formalized as representing the result of the multiply with<br>

> >> > greater precision than the operation's type actually provides,<br>

> >> > and then using that as the operand of the addition.  It's<br>

> >> > understood that that can change the result of the addition in<br>

> >> > ways that aren't just "more precise".  Similarly, performing<br>

> >> > 'float' operations using x87 long doubles can change the result<br>

> >> > of the operation, but I'm pretty sure that the committees<br>

> >> > explicitly had hardware limitations like that in mind when they<br>

> >> > added this language.<br>

> >><br>

> >> That's an interesting point.  I'm inclined to agree with this<br>

> >> interpretation (there are some minor details about whether or not<br>

> >> 0*INF + NAN raises the invalid flag, but let's agree to ignore<br>

> >> that).<br>

> >><br>

> >> I'm not familiar enough with the language used in the C++ spec to<br>

> >> know whether this makes C++ numerics equivalent to STDC<br>

> >> FP_CONTRACT on, or equivalent to "allow greedy FMA formation".<br>

> >> Anyone?<br>

> >><br>

> >> If you agree w/ John's interpretation, and don't consider the flag<br>

> >> case you mention, AFAICT, this allows greedy FMA formation, unless<br>

> >> the intermediate values are round-tripped through a cast construct<br>

> >> such as I described.<br>

> ><br>

> > I'm still not sure why you think this restriction *only* happens<br>

> > when round-tripping through casts, rather than through anything<br>

> > which is not an operand or result, e.g. an object.<br>

> ><br>

> > Remember that the builtin operators are privileged in C++ — they<br>

> > are not semantically like calls, even in the cases where they're<br>

> > selected by overload resolution.<br>

> ><br>

> > I agree that my interpretation implies that a type which merely<br>

> > wraps a double nonetheless forces stricter behavior.  I also agree<br>

> > that this sucks.<br>

><br>

> To continue this thought, the most straightforward way to represent<br>

> this in IR would be to (1) add a "contractable" bit to the LLVM<br>

> operation (possibly as metadata) and (2) provide an explicit "value<br>

> barrier" instruction (a unary operator preventing contraction<br>

> "across" it).  We would introduce the barrier in the appropriate<br>

> circumstances, i.e. an explicit cast, a load from a variable, or<br>

> whatever else we conclude requires these semantics.  It would then be<br>

> straightforward to produce FMAs from this, as well as just generally<br>

> avoiding rounding when doing sequences of illegal FP ops.<br>

> -ffast-math would imply never inserting the barriers.<br>

><br>

> The disadvantages I see are:<br>

>   - there might be lots of peepholes and isel patterns that would<br>

> need to be taught to look through a value barrier<br>

>   - the polarity of barriers is wrong, because code that lacks<br>

> barriers is implicitly opting in to things, so e.g. LTO could pick a<br>

> weak_odr function from an old tunit that lacks a barrier which a<br>

> fresh compile would insist on.<br>

<br>

</div></div>I don't like the barrier approach because it implies that the FE must<br>

serialize each C expression as a distinct group of LLVM instructions.<br>

While it may be true that this currently happens in practice, I don't<br>

think we want to force it to be this way.<br>

<br>

Given the unique nature of this restriction, I think that the best way<br>

to do this is to model it directly: add metadata, or some instruction<br>

attribute, to each floating-point instruction indicating its<br>

'contraction domain' (some module-unique integer will work). Only<br>

instructions with the same contraction domain can be contracted.<br>

Instructions without a contraction domain cannot be contracted. I<br>

realize that this is verbose, but realistically, the only way to tell<br>

LLVM what instructions are part of which C-language expression is to<br>

tag each relevant instruction.<br>
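<br>
The rule described above could be sketched roughly as follows. This is only an illustration under stated assumptions: FPInst, its fields, and can_contract are hypothetical names invented here, not existing LLVM types or APIs, and a real implementation would hang the domain off instruction metadata or an attribute.<br>

```cpp
#include <optional>

// Hypothetical stand-in for a floating-point instruction (not an LLVM class).
struct FPInst {
    enum class Op { FMul, FAdd } op;
    // Module-unique contraction domain id; empty means the instruction was
    // never tagged and therefore must not be contracted.
    std::optional<unsigned> contraction_domain;
};

// A multiply feeding an add may be fused only when both instructions carry
// a contraction domain and the two domains are the same.
bool can_contract(const FPInst& mul, const FPInst& add) {
    if (mul.op != FPInst::Op::FMul || add.op != FPInst::Op::FAdd)
        return false;
    if (!mul.contraction_domain || !add.contraction_domain)
        return false;
    return *mul.contraction_domain == *add.contraction_domain;
}
```

Under this sketch the default is conservative: an untagged instruction never fuses, so code that lacks tags does not implicitly opt in to contraction.<br>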

<br>

 -Hal<br>

<br>

><br>

> John.<br>

<span class="HOEnZb"><font color="#888888"><br>

<br>

--<br>

Hal Finkel<br>

Postdoctoral Appointee<br>

Leadership Computing Facility<br>

Argonne National Laboratory<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

_______________________________________________<br>

cfe-commits mailing list<br>

<a href="mailto:cfe-commits@cs.uiuc.edu">cfe-commits@cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits</a><br>

</div></div></blockquote></div><br></div>