[PATCH] The LoopVectorizer and libm sqrt

Thu Sep 12 13:24:32 PDT 2013

----- Original Message -----
> On 09/12/2013 10:16 PM, Hal Finkel wrote:
> > ----- Original Message -----
> >> ----- Original Message -----
> >>>
> >>> On Sep 12, 2013, at 2:52 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> >>>
> >>>> ----- Original Message -----
> >>>>> On 09/12/2013 09:34 PM, Hal Finkel wrote:
> >>>>>> ----- Original Message -----
> >>>>>>> ----- Original Message -----
> >>>>>>>> I did not write this code but I assume this was done on
> >>>>>>>> purpose
> >>>>>>>> because our llvm.sqrt intrinsics has a slightly different
> >>>>>>>> semantics:
> >>>>>>>>
> >>>>>>>> The ‘llvm.sqrt‘ intrinsics return the sqrt of the specified
> >>>>>>>> operand,
> >>>>>>>> returning the same value as the libm ‘sqrt‘ functions would.
> >>>>>>>> Unlike
> >>>>>>>> sqrt in libm, however, llvm.sqrt has undefined behavior for
> >>>>>>>> negative
> >>>>>>>> numbers other than -0.0 (which allows for better
> >>>>>>>> optimization,
> >>>>>>>> because there is no need to worry about errno being set).
> >>>>>>>> llvm.sqrt(-0.0) is defined to return -0.0 like IEEE sort.
> >>>>>>>
> >>>>>>> Hrmm... okay; I'll send a revised patch where we explicitly
> >>>>>>> check
> >>>>>>> for
> >>>>>>> fast-math mode.
> >>>>>>
> >>>>>> Or perhaps not; the TargetOptions are not available at the
> >>>>>> IR-level
> >>>>>> right now, and so this seems to leave us with two options:
> >>>>>>
> >>>>>>   1. Feed something through TTI
> >>>>>>
> >>>>>>   2. Have Clang generate the intrinsic directly in fast-math
> >>>>>>   mode
> >>>>>>
> >>>>>> I'm leaning toward (1), because I'd like to give the target
> >>>>>> the
> >>>>>> ability to declare the availability of a vectorized sqrt that
> >>>>>> is
> >>>>>> suitable as a libm sqrt replacement.
> >>>>>>
> >>>>>> What do you think?
> >>>>>
> >>>>> It seems only 2) would allow to mix fast-math and non-fast-math
> >>>>> modes,
> >>>>> which in fact my be very helpful in case of LTO and/or math
> >>>>> libraries
> >>>>> that provide fast as well as precise versions of a function.
> >>>>
> >>>> I don't think that's right. For one thing, the 'fast math' flags
> >>>> are now stored in function-level attributes (specifically for
> >>>> this
> >>>> reason).
> >>>
> >>> Except that the inliner ignores them afaik :).
> >>>
> >>> I believe that floating point function level attributes are in a
> >>> somewhat broken state, they work because I think LTO will remove
> >>> them if they mismatch (at least I hope so :)
> >>
> >> Okay :) -- Maybe the frontend approach is better then (at least
> >> for
> >> now).
> >
> > It seems as though there is some history here:
> >
> > svn blame lib/CodeGen/CGBuiltin.cpp
> >   64689    ddunbar     // Library functions with special handling.
> >   64689    ddunbar   case Builtin::BIsqrt:
> >   64689    ddunbar   case Builtin::BIsqrtf:
> >   64689    ddunbar   case Builtin::BIsqrtl: {
> > 100613   rjmccall     // TODO: there is currently no set of
> > optimizer flags
> > 100613   rjmccall     // sufficient for us to rewrite sqrt to
> > @llvm.sqrt.
> > 100613   rjmccall     // -fmath-errno=0 is not good enough; we need
> > finiteness.
> > 100613   rjmccall     // We could probably precondition the call
> > with an ult
> > 100613   rjmccall     // against 0, but is that worth the
> > complexity?
> > 100613   rjmccall     break;
> >
> > Any reason why we can't use the current fast-math setting here?
> 
> I am not sure the fast-math flag in LangOpt is the right way to go
> (see
> the attached patch that I was planning to submit). However
> CGM.getCodeGenOpts().UnsafeFPMath should be the flag that allow this.
> I
> am surprised it was not considered. Maybe I misunderstood its
> meaning.

I think I see: fast-math was not implemented in Clang until r147434. Well after John took out the sqrt-intrinsic mapping. I'll submit a patch that adds it back (conditioned on UnsafeFPMath).

 -Hal

> 
> Tobias
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory