[PATCH] The LoopVectorizer and libm sqrt

Thu Sep 12 12:52:04 PDT 2013

----- Original Message -----
> On 09/12/2013 09:34 PM, Hal Finkel wrote:
> > ----- Original Message -----
> >> ----- Original Message -----
> >>> I did not write this code, but I assume this was done on purpose
> >>> because our llvm.sqrt intrinsic has slightly different
> >>> semantics:
> >>>
> >>> The 'llvm.sqrt' intrinsics return the sqrt of the specified
> >>> operand, returning the same value as the libm 'sqrt' functions
> >>> would. Unlike sqrt in libm, however, llvm.sqrt has undefined
> >>> behavior for negative numbers other than -0.0 (which allows for
> >>> better optimization, because there is no need to worry about
> >>> errno being set). llvm.sqrt(-0.0) is defined to return -0.0 like
> >>> IEEE sqrt.
> >>
> >> Hrmm... okay; I'll send a revised patch where we explicitly
> >> check for fast-math mode.
> >
> > Or perhaps not; the TargetOptions are not available at the IR-level
> > right now, and so this seems to leave us with two options:
> >
> >   1. Feed something through TTI
> >
> >   2. Have Clang generate the intrinsic directly in fast-math mode
> >
> > I'm leaning toward (1), because I'd like to give the target the
> > ability to declare the availability of a vectorized sqrt that is
> > suitable as a libm sqrt replacement.
> >
> > What do you think?
> 
> It seems only (2) would allow mixing fast-math and non-fast-math
> modes, which in fact may be very helpful in the case of LTO and/or
> math libraries that provide fast as well as precise versions of a
> function.

I don't think that's right. For one thing, the 'fast math' flags are now stored in function-level attributes (specifically for this reason).

 -Hal

> 
> Cheers,
> Tobias
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory