[cfe-commits] [llvm-commits] [PATCH] Add llvm.fmuladd intrinsic.

Tue Jun 5 13:51:13 PDT 2012

On Tue, Jun 5, 2012 at 1:18 PM, Chandler Carruth <chandlerc at google.com> wrote:

> On Tue, Jun 5, 2012 at 1:15 PM, Stephen Canon <scanon at apple.com> wrote:
>
>> On Jun 5, 2012, at 1:08 PM, Chandler Carruth <chandlerc at google.com>
>> wrote:
>>
>> Hey Lang,
>>
>> Sorry to jump in late, but was catching up on email and finally read
>> through this thread. This is the exchange that caught my interest:
>>
>> On Fri, Jun 1, 2012 at 4:50 AM, Stephen Canon <scanon at apple.com> wrote:
>>
>>> On May 31, 2012, at 10:40 PM, John McCall <rjmccall at apple.com> wrote:
>>>
>>> > On May 31, 2012, at 7:22 PM, Lang Hames wrote:
>>> >> Thanks for the suggestion Matthieu. I spoke to Doug and he
>>> recommended using attributes rather than a FunctionDecl bit to represent
>>> the fp_contract state.
>>> >
>>> > Hmm.  I had suggested a bit on FunctionDecl on the assumption that
>>> this would often be controlled globally, maybe by using a flag to control
>>> the default or by activating a #pragma before including all the headers.
>>>  Actually, I could even imagine a target (maybe a GPU target?) even
>>> opting-in to this behavior by default.  If we're going to use an Attr, we
>>> need to make sure it doesn't get added unless the current #pragma state is
>>> different from the global default;  we really don't want to be allocating
>>> an attribute for every function definition in the translation unit.
>>>
>>> We want FP_CONTRACT ON to be the default for all targets.  It's also
>>> worth noting that it's critical that we support setting the pragma to OFF,
>>> but in practice this will be exceedingly rare (almost certainly less than
>>> 1% of sources, and probably far less than that).
>>>
>>
>> Based on this comment, I'm really not keen on the current representation,
>> but maybe I've mis-understood it, so I'll ask questions first:
>>
>> The 'fmuladd' intrinsic is used to whitelist specific operations for
>> fused multiply+add handling, correct?
>>
>>
>> Correct.
>>
>> If so, and if Stephen's stance is correct (I certainly agree with it!)
>> that this should be allowed for the vast majority of code, that means that
>> almost every fmul and fadd in the current IR should be a candidate for
>> fusing?
>>
>>
>> Only those that originate from a common source-language *expression*.
>>  Your examples should not be fused because the multiply and add are in two
>> separate expressions (which is why we need FE involvement; that information
>> isn't available later).
>>
>
> Ok, now I'm extra confused. Thanks for replying, hopefully you can help me
> understand better.
>
> Why would it not be OK to fuse multiplies and adds that occur in two
> source-language expressions? I have some vague memory of Fortran having
> lots of special rules about within-expression semantics versus semantics
> across expressions, but C++ has no such constraints to my knowledge, nor
> would it want them.
>
> Having these types of artificial source-representation restrictions on
> semantics in C++ undermines specific language constructs like overloaded
> operators and transparent "wrapper" classes.
>

Trying to at least do my homework, as I'm not usually working w/ numerics,
I've been reading up.

I've now read the FP_CONTRACT part of the C11 spec, and see where your
statement comes from. I find this restriction... mysterious. I would love
to understand why it is important to prevent inlining from exposing
contraction opportunities if you can give any examples.
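For anyone else who doesn't live in numerics, here is a minimal sketch of what contraction actually changes. It is not taken from the patch. The helper names (separately_rounded, fused, contraction_inputs) are made up for illustration, and the volatile store is only there to stop the compiler from contracting the first version on its own.

```cpp
#include <cmath>

// Hypothetical helper: multiply, round the product to double, then add.
// The volatile store forces the intermediate rounding, so the compiler
// cannot fuse the two operations behind our back.
double separately_rounded(double a, double b, double c) {
    volatile double product = a * b;  // first rounding happens here
    return product + c;               // second rounding
}

// Hypothetical helper: the contracted form, one rounding at the very end.
double fused(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Inputs picked so that the exact product is 1 - 2^-60, which rounds to
// 1.0 in double precision.
struct ContractionInputs { double a, b, c; };

ContractionInputs contraction_inputs() {
    const double eps = std::ldexp(1.0, -30);  // 2^-30
    return {1.0 + eps, 1.0 - eps, -1.0};
}
```

With these inputs separately_rounded returns exactly 0.0, while fused returns -2^-60, so whether the multiply and add get contracted is observable in the result. That is the behavior the pragma is scoping.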

That said, FP_CONTRACT doesn't apply to C++, and it's quite unlikely to
become a serious part of the standard given these (among other)
limitations. Curiously, in C++11, it may not be needed to get the benefit
of fused multiply-add:

[expr] p11 seems to indicate that in C++, we are almost always allowed to
use increased precision to represent operations. The only exception we can
find in the C++ standard (and thanks to Richard for helping me crawl
through this part) is this:

  static_cast<float>(static_cast<double>(x))

For any expression 'x' of floating point type, the expression may be
evaluated with extra precision, but the result of round-trip casting it
through a double must not retain that extra precision. ;] It's not
entirely clear this contortion was
intended[1]. This definition, while awkward and arbitrary, has a nice
property of being able to cleanly represent boundaries of increased
precision allowance w/o regard for inlining or other optimizations.
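As a rough sketch of how that boundary could be written in source, assuming the reading of [expr] p11 above is right: the names precision_boundary, eligible, and ineligible are hypothetical, and this only shows where the boundary would sit, not that any particular compiler honors it.

```cpp
// Hypothetical helper: the round-trip cast from the text. The value of a
// float is unchanged by it; the point is only that, under the reading
// above, the result may no longer carry extra precision.
inline float precision_boundary(float x) {
    return static_cast<float>(static_cast<double>(x));
}

// The product feeds the add directly, so under that reading the whole
// expression may be evaluated with extra precision (e.g. as one fused op).
float eligible(float a, float b, float c) {
    return a * b + c;
}

// The product passes through the boundary first, so it would have to be
// rounded to float before the add sees it.
float ineligible(float a, float b, float c) {
    return precision_boundary(a * b) + c;
}
```

Because the boundary is part of the expression itself, it survives inlining: wherever ineligible ends up after optimization, the cast pair still marks where the extra precision has to stop.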

The state of C++11 makes my (somewhat crazy) idea of a flag a less
attractive representation, as does the C11 contraction specification, but
it still doesn't make me enthused about the default representation becoming
an intrinsic, and forcing the FE to pre-fuse all of these rather than
marking the range of fuse-able operations and allowing the middle end to
perform the fusion. I'm actually beginning to like the start/stop intrinsic
pair to represent the sequences of ineligible operations.

-Chandler

[1] There is a footnote in the latest working draft that indicates
'static_cast<float>(x)' may have been intended to be enough to force the
precision, but the current wording isn't strict enough for that to be the
case.