[cfe-commits] [llvm-commits] [PATCH] Add llvm.fmuladd intrinsic.

Hal Finkel hfinkel at anl.gov
Wed Jun 6 05:21:47 PDT 2012


On Tue, 05 Jun 2012 22:50:00 -0700
John McCall <rjmccall at apple.com> wrote:

> On Jun 5, 2012, at 9:24 PM, Hal Finkel wrote:
> > On Tue, 05 Jun 2012 20:12:00 -0700 John McCall <rjmccall at apple.com>
> > wrote:
> >> On Jun 5, 2012, at 3:35 PM, John McCall wrote:
> >>> On Jun 5, 2012, at 3:04 PM, Chandler Carruth wrote:
> >>>> On Tue, Jun 5, 2012 at 2:58 PM, Stephen Canon <scanon at apple.com>
> >>>> wrote: On Jun 5, 2012, at 2:45 PM, John McCall
> >>>> <rjmccall at apple.com> wrote:
> >>>> 
> >>>>> On Jun 5, 2012, at 2:15 PM, Stephen Canon wrote:
> >>>>> 
> >>>>>> On Jun 5, 2012, at 1:51 PM, Chandler Carruth
> >>>>>> <chandlerc at google.com> wrote:
> >>>>>> 
> >>>>>>> That said, FP_CONTRACT doesn't apply to C++, and it's quite
> >>>>>>> unlikely to become a serious part of the standard given these
> >>>>>>> (among other) limitations. Curiously, in C++11, it may not be
> >>>>>>> needed to get the benefit of fused multiply-add:
> >>>>>> 
> >>>>>> Perversely, a strict reading of C++11 seems (to me) to not
> >>>>>> allow FMA formation in C++ at all:
> >>>>>> 
> >>>>>>     • The values of the floating operands and the results of
> >>>>>> floating expressions may be represented in greater precision
> >>>>>> and range than that required by the type; the types are not
> >>>>>> changed thereby.
> >>>>>> 
> >>>>>> FMA formation does not increase the precision or range of the
> >>>>>> result (it may or may not have smaller error, but it is not
> >>>>>> more precise), so this paragraph doesn't actually license FMA
> >>>>>> formation.  I can't find anywhere else in the standard that
> >>>>>> could (though I am *far* less familiar with C++11 than C11, so
> >>>>>> I may not be looking in the right places).
> >>>>> 
> >>>>> Correct me if I'm wrong, but I thought that an FMA could be
> >>>>> formalized as representing the result of the multiply with
> >>>>> greater precision than the operation's type actually provides,
> >>>>> and then using that as the operand of the addition.  It's
> >>>>> understood that that can change the result of the addition in
> >>>>> ways that aren't just "more precise".  Similarly, performing
> >>>>> 'float' operations using x87 long doubles can change the result
> >>>>> of the operation, but I'm pretty sure that the committees
> >>>>> explicitly had hardware limitations like that in mind when they
> >>>>> added this language.
> >>>> 
> >>>> That's an interesting point.  I'm inclined to agree with this
> >>>> interpretation (there are some minor details about whether or not
> >>>> 0*INF + NAN raises the invalid flag, but let's agree to ignore
> >>>> that).
> >>>> 
> >>>> I'm not familiar enough with the language used in the C++ spec to
> >>>> know whether this makes C++ numerics equivalent to STDC
> >>>> FP_CONTRACT on, or equivalent to "allow greedy FMA formation".
> >>>> Anyone?
> >>>> 
> >>>> If you agree w/ John's interpretation, and don't consider the
> >>>> flag case you mention, AFAICT, this allows greedy FMA formation,
> >>>> unless the intermediate values are round-tripped through a cast
> >>>> construct such as I described.
> >>> 
> >>> I'm still not sure why you think this restriction *only* happens
> >>> when round-tripping through casts, rather than through anything
> >>> which is not an operand or result, e.g. an object.
> >>> 
> >>> Remember that the builtin operators are privileged in C++ — they
> >>> are not semantically like calls, even in the cases where they're
> >>> selected by overload resolution.
> >>> 
> >>> I agree that my interpretation implies that a type which merely
> >>> wraps a double nonetheless forces stricter behavior.  I also agree
> >>> that this sucks.
> >> 
> >> To continue this thought, the most straightforward way to represent
> >> this in IR would be to (1) add a "contractable" bit to the LLVM
> >> operation (possibly as metadata) and (2) provide an explicit "value
> >> barrier" instruction (a unary operator preventing contraction
> >> "across" it).  We would introduce the barrier in the appropriate
> >> circumstances, i.e. an explicit cast, a load from a variable, or
> >> whatever else we conclude requires these semantics.  It would then
> >> be straightforward to produce FMAs from this, as well as just
> >> generally avoiding rounding when doing sequences of illegal FP
> >> ops. -ffast-math would imply never inserting the barriers.
> >> 
> >> The disadvantages I see are:
> >>  - there might be lots of peepholes and isel patterns that would
> >> need to be taught to look through a value barrier
> >>  - the polarity of barriers is wrong: code that lacks barriers
> >> implicitly opts in to contraction, so e.g. LTO could pick a
> >> weak_odr function from an old tunit that lacks a barrier that a
> >> fresh compile would insist on.
> > 
> > I don't like the barrier approach because it implies that the FE
> > must serialize each C expression as a distinct group of LLVM
> > instructions. While it may be true that this currently happens in
> > practice, I don't think we want to force it to be this way.
> 
> I think you misunderstand.  

Indeed I did misunderstand. Thank you for clarifying, and I agree, your
proposal makes sense.

 -Hal

> By a "barrier", I mean an instruction like this:
> 
>   %1 = call float @llvm.fp_contract_barrier.float(float %0) readnone nounwind
> 
> which states that %1 must be a representable float value and
> therefore blocks FP contraction "across" the intrinsic, in the sense
> that something using %1 can't be fused with the operation producing
> %0.  I do not mean something like a memory barrier that divides
> things based on whether the instruction comes before or after the
> barrier; that's clearly not workable.
> 
> John.
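
To sketch the proposal a bit more fully (the intrinsic name is the one
John gives above; the surrounding function and the lowering of the cast
are my own hypothetical illustration), the frontend output for
`(float)(a * b) + c` might look like:

```llvm
define float @barriered(float %a, float %b, float %c) nounwind readnone {
entry:
  %mul = fmul float %a, %b
  ; the explicit cast requires a representable float, so the frontend
  ; inserts the barrier; %add below may not be fused with %mul
  %p = call float @llvm.fp_contract_barrier.float(float %mul) readnone nounwind
  %add = fadd float %p, %c
  ret float %add
}

declare float @llvm.fp_contract_barrier.float(float) readnone nounwind
```

whereas a plain `a * b + c` would emit the same sequence without the
barrier call, leaving the backend free to select an FMA.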


-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory



