[llvm-commits] [llvm] r157737 - in /llvm/trunk: lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp lib/Target/X86/X86CodeEmitter.cpp lib/Target/X86/X86InstrFMA.td lib/Target/X86/X86InstrInfo.cpp lib/Target/X86/X86InstrInfo.h lib/Target/X86/X86Subta

Fri Jun 1 14:24:12 PDT 2012

On Fri, 01 Jun 2012 16:56:45 -0400
Stephen Canon <scanon at apple.com> wrote:

> On Jun 1, 2012, at 4:07 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> > I was under the impression from Lang's messages that these are
> > somewhat decoupled issues. The motivation for Lang's design is that
> > C only allows contractions of operations that are within the same
> > C-language statement. As only the frontend is aware of C statement
> > boundaries, it will be the responsibility of the frontend to form
> > the contractions where it can.
> 
> Right.  The point of llvm.fmuladd is to correctly model FP_CONTRACT
> on/off.
> 
> > On the other hand, some kind of "fast math" mode should allow FMA
> > formation from operations that come from different statements; in
> > other words, in general. In the PowerPC backend, for example, we
> > have FMA patterns that are guarded by
> > TM.Options.NoExcessFPPrecision. I am not responsible for this, but
> > I think this is the right thing to do.
> 
> Yes, we definitely want to have a third mode of operation in which
> FMA formation is freely licensed wherever the target model says it is
> beneficial to performance.  This third mode doesn't need to have any
> source language information, so it can be totally independent of the
> frontend and should be much simpler to implement.
> 
> I agree with your assessment that ExcessFPPrecision is pretty close
> to what is needed for such a mode -- both FMA formation and excess
> precision are transformations which make some intermediate results
> more accurate, are backwards-stable, can result in asymmetries, and
> occasionally produce slightly less accurate results when those
> asymmetries conspire against the calculation.
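
To make the asymmetry point concrete, here is a minimal Python sketch.
It is only a model under stated assumptions: the helper name
fused_multiply_add is hypothetical, and the fused operation is emulated
with exact rational arithmetic (one rounding at the end) rather than a
hardware FMA instruction or anything from LLVM. It evaluates
x*x - x*x, where the second product has already been rounded.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Hypothetical model of a fused a*b + c.

    The product and sum are computed exactly as rationals, and the
    result is rounded to a double only once, at the end.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


x = 1.0 + 2.0 ** -30

# Rounded product: the exact value 1 + 2^-29 + 2^-60 does not fit in a
# double, so the 2^-60 term is lost.
product = x * x

# Unfused: the second x*x rounds to the same double, so the difference
# is exactly zero.
unfused = x * x - product

# Fused: the exact x*x is used, so the rounding error of `product`
# survives as the result.
fused = fused_multiply_add(x, x, -product)

print("unfused:", unfused)
print("fused:  ", fused)
```

The fused result is the more accurate value of x*x minus the rounded
product, yet an expression that looks symmetric no longer cancels to
zero. That is the kind of asymmetry that can occasionally work against
a calculation.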

Steve,

Do you have a recommendation on what this mode should be called (if
we're not just going to use ExcessFPPrecision directly), and what its
semantics should be?

Thanks again,
Hal

> 
> - Steve

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory