[cfe-commits] [llvm-commits] [PATCH] Add llvm.fmuladd intrinsic.

John McCall rjmccall at apple.com
Tue Jun 5 22:50:00 PDT 2012

On Jun 5, 2012, at 9:24 PM, Hal Finkel wrote:
> On Tue, 05 Jun 2012 20:12:00 -0700 John McCall <rjmccall at apple.com> wrote:
>> On Jun 5, 2012, at 3:35 PM, John McCall wrote:
>>> On Jun 5, 2012, at 3:04 PM, Chandler Carruth wrote:
>>>> On Tue, Jun 5, 2012 at 2:58 PM, Stephen Canon <scanon at apple.com>
>>>> wrote: On Jun 5, 2012, at 2:45 PM, John McCall
>>>> <rjmccall at apple.com> wrote:
>>>>> On Jun 5, 2012, at 2:15 PM, Stephen Canon wrote:
>>>>>> On Jun 5, 2012, at 1:51 PM, Chandler Carruth
>>>>>> <chandlerc at google.com> wrote:
>>>>>>> That said, FP_CONTRACT doesn't apply to C++, and it's quite
>>>>>>> unlikely to become a serious part of the standard given these
>>>>>>> (among other) limitations. Curiously, in C++11, it may not be
>>>>>>> needed to get the benefit of fused multiply-add:
>>>>>> Perversely, a strict reading of C++11 seems (to me) to not
>>>>>> allow FMA formation in C++ at all:
>>>>>>     • The values of the floating operands and the results of
>>>>>> floating expressions may be represented in greater precision
>>>>>> and range than that required by the type; the types are not
>>>>>> changed thereby.
>>>>>> FMA formation does not increase the precision or range of the
>>>>>> result (it may or may not have smaller error, but it is not
>>>>>> more precise), so this paragraph doesn't actually license FMA
>>>>>> formation.  I can't find anywhere else in the standard that
>>>>>> could (though I am *far* less familiar with C++11 than C11, so
>>>>>> I may not be looking in the right places).
>>>>> Correct me if I'm wrong, but I thought that an FMA could be
>>>>> formalized as representing the result of the multiply with
>>>>> greater precision than the operation's type actually provides,
>>>>> and then using that as the operand of the addition.  It's
>>>>> understood that this can change the result of the addition in
>>>>> ways that aren't just "more precise".  Similarly, performing
>>>>> 'float' operations using x87 long doubles can change the result
>>>>> of the operation, but I'm pretty sure that the committees
>>>>> explicitly had hardware limitations like that in mind when they
>>>>> added this language.
>>>> That's an interesting point.  I'm inclined to agree with this
>>>> interpretation (there are some minor details about whether or not
>>>> 0*INF + NAN raises the invalid flag, but let's agree to ignore
>>>> that).
>>>> I'm not familiar enough with the language used in the C++ spec to
>>>> know whether this makes C++ numerics equivalent to STDC
>>>> FP_CONTRACT on, or equivalent to "allow greedy FMA formation".
>>>> Anyone?
>>>> If you agree w/ John's interpretation, and don't consider the flag
>>>> case you mention, AFAICT, this allows greedy FMA formation, unless
>>>> the intermediate values are round-tripped through a cast construct
>>>> such as I described.
>>> I'm still not sure why you think this restriction *only* happens
>>> when round-tripping through casts, rather than through anything
>>> which is not an operand or result, e.g. an object.
>>> Remember that the builtin operators are privileged in C++ — they
>>> are not semantically like calls, even in the cases where they're
>>> selected by overload resolution.
>>> I agree that my interpretation implies that a type which merely
>>> wraps a double nonetheless forces stricter behavior.  I also agree
>>> that this sucks.
>> To continue this thought, the most straightforward way to represent
>> this in IR would be to (1) add a "contractable" bit to the LLVM
>> operation (possibly as metadata) and (2) provide an explicit "value
>> barrier" instruction (a unary operator preventing contraction
>> "across" it).  We would introduce the barrier in the appropriate
>> circumstances, i.e. an explicit cast, a load from a variable, or
>> whatever else we conclude requires these semantics.  It would then be
>> straightforward to produce FMAs from this, as well as just generally
>> avoiding rounding when doing sequences of illegal FP ops.
>> -ffast-math would imply never inserting the barriers.
>> The disadvantages I see are:
>>  - there might be lots of peepholes and isel patterns that would
>> need to be taught to look through a value barrier
>>  - the polarity of barriers is wrong, because code that lacks
>> barriers is implicitly opting in to things, so e.g. LTO could pick a
>> weak_odr function from an old tunit that lacks a barrier which a
>> fresh compile would insist on.
> I don't like the barrier approach because it implies that the FE must
> serialize each C expression as a distinct group of LLVM instructions.
> While it may be true that this currently happens in practice, I don't
> think we want to force it to be this way.

I think you misunderstand.  By a "barrier", I mean an instruction like this:
  %1 = call float @llvm.fp_contract_barrier.float(float %0) readnone nounwind
which states that %1 must be a representable float value and therefore
blocks FP contraction "across" the intrinsic, in the sense that something
using %1 can't be fused with the operation producing %0.  I do not mean
something like a memory barrier that divides things based on whether
the instruction comes before or after the barrier;  that's clearly not workable.

