>>> > On May 31, 2012, at 7:22 PM, Lang Hames wrote:
>>> >> Thanks for the suggestion Matthieu. I spoke to Doug and he
>>> recommended using attributes rather than a FunctionDecl bit to represent
>>> the fp_contract state.
>>> >
>>> > Hmm.  I had suggested a bit on FunctionDecl on the assumption that
>>> this would often be controlled globally, maybe by using a flag to control
>>> the default or by activating a #pragma before including all the headers.
>>>  Actually, I could even imagine a target (maybe a GPU target?) even
>>> opting-in to this behavior by default.  If we're going to use an Attr, we
>>> need to make sure it doesn't get added unless the current #pragma state is
>>> different from the global default;  we really don't want to be allocating
>>> an attribute for every function definition in the translation unit.
>>> We want FP_CONTRACT ON to be the default for all targets.  It's also
>>> worth noting that it's critical that we support setting the pragma to OFF,
>>> but in practice this will be exceedingly rare (almost certainly less than
>>> 1% of sources, and probably far less than that).
>> Based on this comment, I'm really not keen on the current representation,
>> but maybe I've mis-understood it, so I'll ask questions first:
>> The 'fmuladd' intrinsic is used to whitelist specific operations for
>> fused multiply+add handling, correct?
>> Correct.
>> If so, and if Stephen's stance is correct (I certainly agree with it!)
>> that this should be allowed for the vast majority of code, that means that
>> almost every fmul and fadd in the current IR should be a candidate for
>> fusing?
>> Only those that originate from a common source-language *expression*.
>>  Your examples should not be fused because the multiply and add are in two
>> separate expressions (which is why we need FE involvement; that information
>> isn't available later).
> Ok, now I'm extra confused. Thanks for replying, hopefully you can help me
> understand better.
> Why would it not be OK to fuse multiplies and adds that occur in two
> source-language expressions? I have some vague memory of Fortran having
> lots of special rules about within-expression semantics versus semantics
> across expressions, but C++ has no such constraints to my knowledge, nor
> would it want them.
> Having these types of artificial source-representation restrictions on
> semantics in C++ undermines specific language constructs like overloaded
> operators and transparent "wrapper" classes.

I don't usually work w/ numerics, so I've been reading up to at least do
my homework.

I've now read the FP_CONTRACT part of the C11 spec, and see where your
statement comes from. I find this restriction... mysterious. If you can
give any examples, I would love to understand why it is important to
prevent inlining from exposing contraction opportunities.
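
To make the question concrete, here is a minimal sketch of the wrapper-class
case. The 'Real' type and the function names are hypothetical and only for
illustration. As I read the C11 rule, 'direct' is a contraction candidate
because the multiply and add share one expression, while 'wrapped' performs
the same arithmetic with the multiply and add sitting in two separate
operator bodies, so only inlining would expose the opportunity.

```cpp
#include <cassert>

// Hypothetical transparent wrapper around double; not from any real library.
struct Real {
  double v;
};

// Each operator body is its own source-level expression.
inline Real operator*(Real a, Real b) { return Real{a.v * b.v}; }
inline Real operator+(Real a, Real b) { return Real{a.v + b.v}; }

// Multiply and add share one expression: under the C11 FP_CONTRACT rule
// this is a candidate for contraction into a fused multiply-add.
inline double direct(double a, double b, double c) { return a * b + c; }

// Same arithmetic after inlining, but the multiply lives in operator* and
// the add in operator+, i.e. in two separate source expressions.
inline double wrapped(double a, double b, double c) {
  return (Real{a} * Real{b} + Real{c}).v;
}

// Sanity check with exactly representable values, where fusing cannot
// change the result either way.
inline void check_agreement() {
  assert(direct(2.0, 3.0, 1.0) == 7.0);
  assert(wrapped(2.0, 3.0, 1.0) == 7.0);
}
```

For inputs where the product is not exactly representable, the two functions
could in principle round differently if only one of them is fused.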

That said, FP_CONTRACT doesn't apply to C++, and it's quite unlikely to
become a serious part of the standard given these (among other)
limitations. Curiously, in C++11, it may not be needed to get the benefit
of fused multiply-add:

[expr] p11 seems to indicate that in C++, we are almost always allowed to
use increased precision to represent operations. The only exception we can
find in the C++ standard (and thanks to Richard for helping me crawl
through this part) is this:


For any expression 'x' of floating-point type, the expression may be
evaluated with extra precision, but the result of round-trip casting it
through a double must not retain that extra precision. ;] It's not
entirely clear this contortion was intended[1]. This definition, while
awkward and arbitrary, has the nice property of being able to cleanly
represent boundaries of increased precision allowance w/o regard for
inlining or other optimizations.
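
As a rough illustration of why such a boundary is observable, the sketch
below compares a product that is rounded to double before the add against
std::fma, which rounds only once. The function names are made up for this
example, and the volatile store is just an assumption I use here to force
the intermediate rounding; it is not the mechanism the standard wording
describes.

```cpp
#include <cassert>
#include <cmath>

// The product is rounded to double before the add. The volatile store is an
// assumption of this sketch to force that rounding.
inline double mul_then_add(double a, double b, double c) {
  volatile double p = a * b;
  return p + c;
}

// A single rounding of the exact value a * b + c.
inline double fused(double a, double b, double c) { return std::fma(a, b, c); }

inline void show_difference() {
  const double e = std::ldexp(1.0, -27);  // 2^-27
  const double a = 1.0 + e;
  const double b = 1.0 - e;               // exact product is 1 - 2^-54
  // The product rounds to 1.0 (round-to-nearest-even), so the sum is 0.
  assert(mul_then_add(a, b, -1.0) == 0.0);
  // The fused form keeps the -2^-54 term.
  assert(fused(a, b, -1.0) == -std::ldexp(1.0, -54));
}
```

The inputs are chosen so that the exact product falls halfway between two
doubles, which makes the difference between one rounding and two visible.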

The state of C++11 makes my (somewhat crazy) idea of a flag a less
attractive representation, as does the C11 contraction specification. It
still doesn't make me enthused about the default representation becoming
an intrinsic, though, since that forces the FE to pre-fuse all of these
rather than marking the range of fuse-able operations and allowing the
middle end to perform the fusion. I'm actually beginning to like the
start/stop intrinsic pair to represent the sequences of ineligible
operations.


[1] There is a footnote in the latest working draft that indicates
'static_cast<float>(x)' may have been intended to be enough to force the
precision, but the current wording isn't strict enough for that to be the
