[PATCH] The LoopVectorizer and libm sqrt

Thu Sep 12 13:16:31 PDT 2013

----- Original Message -----
> ----- Original Message -----
> > 
> > On Sep 12, 2013, at 2:52 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> > 
> > > ----- Original Message -----
> > >> On 09/12/2013 09:34 PM, Hal Finkel wrote:
> > >>> ----- Original Message -----
> > >>>> ----- Original Message -----
> > >>>>> I did not write this code but I assume this was done on
> > >>>>> purpose
> > >>>>> because our llvm.sqrt intrinsics has a slightly different
> > >>>>> semantics:
> > >>>>> 
> > >>>>> The ‘llvm.sqrt‘ intrinsics return the sqrt of the specified
> > >>>>> operand,
> > >>>>> returning the same value as the libm ‘sqrt‘ functions would.
> > >>>>> Unlike
> > >>>>> sqrt in libm, however, llvm.sqrt has undefined behavior for
> > >>>>> negative
> > >>>>> numbers other than -0.0 (which allows for better
> > >>>>> optimization,
> > >>>>> because there is no need to worry about errno being set).
> > >>>>> llvm.sqrt(-0.0) is defined to return -0.0 like IEEE sort.
> > >>>> 
> > >>>> Hrmm... okay; I'll send a revised patch where we explicitly
> > >>>> check
> > >>>> for
> > >>>> fast-math mode.
> > >>> 
> > >>> Or perhaps not; the TargetOptions are not available at the
> > >>> IR-level
> > >>> right now, and so this seems to leave us with two options:
> > >>> 
> > >>>  1. Feed something through TTI
> > >>> 
> > >>>  2. Have Clang generate the intrinsic directly in fast-math
> > >>>  mode
> > >>> 
> > >>> I'm leaning toward (1), because I'd like to give the target the
> > >>> ability to declare the availability of a vectorized sqrt that
> > >>> is
> > >>> suitable as a libm sqrt replacement.
> > >>> 
> > >>> What do you think?
> > >> 
> > >> It seems only 2) would allow to mix fast-math and non-fast-math
> > >> modes,
> > >> which in fact my be very helpful in case of LTO and/or math
> > >> libraries
> > >> that provide fast as well as precise versions of a function.
> > > 
> > > I don't think that's right. For one thing, the 'fast math' flags
> > > are now stored in function-level attributes (specifically for
> > > this
> > > reason).
> > 
> > Except that the inliner ignores them afaik :).
> > 
> > I believe that floating point function level attributes are in a
> > somewhat broken state, they work because I think LTO will remove
> > them if they mismatch (at least I hope so :)
> 
> Okay :) -- Maybe the frontend approach is better then (at least for
> now).

It seems as though there is some history here:

svn blame lib/CodeGen/CGBuiltin.cpp
 64689    ddunbar     // Library functions with special handling.
 64689    ddunbar   case Builtin::BIsqrt:
 64689    ddunbar   case Builtin::BIsqrtf:
 64689    ddunbar   case Builtin::BIsqrtl: {
100613   rjmccall     // TODO: there is currently no set of optimizer flags
100613   rjmccall     // sufficient for us to rewrite sqrt to @llvm.sqrt.
100613   rjmccall     // -fmath-errno=0 is not good enough; we need finiteness.
100613   rjmccall     // We could probably precondition the call with an ult
100613   rjmccall     // against 0, but is that worth the complexity?
100613   rjmccall     break;

Any reason why we can't use the current fast-math setting here?

 -Hal

> 
>  -Hal
> 
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory