[PATCH] The LoopVectorizer and libm sqrt

Thu Sep 12 13:03:41 PDT 2013

----- Original Message -----
> On 09/12/2013 09:52 PM, Hal Finkel wrote:
> > ----- Original Message -----
> >> On 09/12/2013 09:34 PM, Hal Finkel wrote:
> >>> ----- Original Message -----
> >>>> ----- Original Message -----
> >>>>> I did not write this code but I assume this was done on purpose
> >>>>> because our llvm.sqrt intrinsic has slightly different
> >>>>> semantics:
> >>>>>
> >>>>> The 'llvm.sqrt' intrinsics return the sqrt of the specified
> >>>>> operand,
> >>>>> returning the same value as the libm 'sqrt' functions would.
> >>>>> Unlike
> >>>>> sqrt in libm, however, llvm.sqrt has undefined behavior for
> >>>>> negative
> >>>>> numbers other than -0.0 (which allows for better optimization,
> >>>>> because there is no need to worry about errno being set).
> >>>>> llvm.sqrt(-0.0) is defined to return -0.0 like IEEE sqrt.
> >>>>
> >>>> Hrmm... okay; I'll send a revised patch where we explicitly
> >>>> check
> >>>> for
> >>>> fast-math mode.
> >>>
> >>> Or perhaps not; the TargetOptions are not available at the
> >>> IR-level
> >>> right now, and so this seems to leave us with two options:
> >>>
> >>>    1. Feed something through TTI
> >>>
> >>>    2. Have Clang generate the intrinsic directly in fast-math
> >>>    mode
> >>>
> >>> I'm leaning toward (1), because I'd like to give the target the
> >>> ability to declare the availability of a vectorized sqrt that is
> >>> suitable as a libm sqrt replacement.
> >>>
> >>> What do you think?
> >>
> >> It seems only 2) would allow mixing fast-math and non-fast-math
> >> modes,
> >> which in fact may be very helpful in case of LTO and/or math
> >> libraries
> >> that provide fast as well as precise versions of a function.
> >
> > I don't think that's right. For one thing, the 'fast math' flags
> > are now stored in function-level attributes (specifically for this
> > reason).
> 
> Are you saying the function-level attributes will influence the TTI?
> Will it be changed on a per-function basis?

Yes, I believe that is how the system is supposed to work.

> 
> Also, function level fast math flags seem to cause similar problems
> as
> soon as we start inlining, no?

That's a good point. I thought that the inliner prohibited inlining a fast-math function into a non-fast-math function, but looking at the code, I don't see any such prohibition.
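
For reference, the errno behavior that separates libm sqrt from llvm.sqrt (quoted at the top of this message) can be observed from user code, which is why mixing the two modes across an inlining boundary matters. The following is only a minimal sketch: SqrtResult and libm_sqrt_with_errno are hypothetical names made up for illustration, not anything in LLVM, and whether errno is set at all depends on the C library (check math_errhandling & MATH_ERRNO) and on compiler flags such as -fno-math-errno.

```cpp
#include <cerrno>
#include <cmath>

// Hypothetical helper type for illustration: the value returned by one
// libm sqrt call, plus the errno value that call left behind.
struct SqrtResult {
    double value;
    int err;  // 0 if errno was left untouched
};

// Calls libm sqrt and records errno afterwards.
// The volatile copy keeps the compiler from folding the call at
// compile time, so the library's runtime behavior is what gets observed.
SqrtResult libm_sqrt_with_errno(double x) {
    volatile double input = x;
    errno = 0;
    double r = std::sqrt(input);
    return {r, errno};
}
```

On a C library that reports math errors through errno, libm_sqrt_with_errno(-1.0) should yield a NaN with err == EDOM, while libm_sqrt_with_errno(-0.0) yields -0.0 with err == 0. A transformation that replaces the libm call with something that has no errno side effect would change the first case, which is the situation that needs a fast-math (or equivalent) guarantee.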

 -Hal

> 
> Tobi
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory