[LLVMdev] [RFC] How to fix sqrt vs llvm.sqrt optimization asymmetry

Mon Nov 11 21:30:53 PST 2013

----- Original Message -----
> Hi Hal, all.
> 
> I'm not sure why llvm.sqrt is 'special'.  Maybe because there is an
> SSE packed sqrt instruction (SQRTPS) but not, e.g., a packed sin
> instruction, AFAIK.
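To make the SQRTPS point concrete, here is a minimal C sketch. It assumes an x86 target with SSE; the `sqrt4` helper and its scalar fallback are illustrative only and are not part of LLVM. `_mm_sqrt_ps` from `<xmmintrin.h>` maps to the packed SQRTPS instruction, and a negative lane simply produces NaN without touching errno.

```c
#include <math.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: compute four single-precision square roots at once.
   With SSE available, _mm_sqrt_ps maps to one packed SQRTPS instruction,
   which returns NaN in any lane whose input is negative and never writes
   errno. Without SSE, fall back to a scalar loop so the sketch still builds. */
static void sqrt4(const float in[4], float out[4]) {
#if defined(__SSE__)
    __m128 v = _mm_loadu_ps(in);        /* unaligned load of four floats */
    _mm_storeu_ps(out, _mm_sqrt_ps(v)); /* SQRTPS on all four lanes */
#else
    for (int i = 0; i < 4; ++i)
        out[i] = sqrtf(in[i]);          /* scalar fallback, one call per lane */
#endif
}
```

Called with {4, 9, 2.25, -1}, the first three lanes hold 2, 3 and 1.5, and the last lane holds NaN.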

This seems relevant: http://lists.cs.uiuc.edu/pipermail/llvmdev/2007-August/010248.html

Chris, et al., does the decision on how to treat sqrt predate our current way of handling errno?

 -Hal

> 
> As mentioned in a recent mail to this list, I would like llvm.sqrt to
> be defined to return NaN for an argument x < 0.  I believe this would
> bring it more into line with the other intrinsics, and with the libm
> result, which is NaN for x < 0:
> http://pubs.opengroup.org/onlinepubs/007904975/functions/sqrt.html
> 
> Cheers,
>      Nick
> 
> On 10/11/2013 3:36 p.m., Hal Finkel wrote:
> > Hello everyone,
> >
> > The particular motivation for this e-mail is my desire for feedback
> > on how to fix PR17758; but there is a core design issue here, so
> > I'd like a wide audience.
> >
> > The underlying issue is that, because the semantics of llvm.sqrt
> > are purposefully defined to be different from libm sqrt (unlike
> > all of the other llvm.<libm function> intrinsics) (*), and because
> > autovectorization relies on the vector forms of these intrinsics
> > when vectorizing function calls to libm math functions, we cannot
> > vectorize a libm sqrt() call into a vector llvm.sqrt call.
> > However, in fast-math mode, we'd like to vectorize calls to sqrt,
> > and so I modified Clang to emit calls to llvm.sqrt in fast-math
> > mode for sqrt (and sqrt[fl]). This makes it similar to the libm
> > pow and fma calls, which Clang always transforms into the llvm.pow
> > and llvm.fma intrinsics.
> >
> > Here's the problem: There is an InstCombine optimization for sqrt
> > (inside visitFPTrunc), and a bunch of optimizations inside
> > SimplifyLibCalls that apply only to the sqrt libm call, and not to
> > the intrinsics. The result, among other things, is PR17758, where
> > fast-math mode actually produces slower code for non-vectorized
> > sqrt calls.
> >
> > Some questions:
> >
> >   - Is the asymmetry between optimizations performed on libm calls
> >   and their corresponding llvm.<libm function> intrinsics
> >   intentional, or just due to a lack of motivation?
> >
> >   - Even if unintentional, is this asymmetry in any way desirable
> >   (for sqrt in particular, or in general)?
> >
> >   - I can refactor all existing optimizations to be agnostic to
> >   whether they see a libm call or the corresponding intrinsic, but
> >   is that the desired solution? If so, any advice on a particularly
> >   nice way to do this would certainly be appreciated.
> >
> > For example, an alternative solution for PR17758 in particular
> > would be to revert the Clang change, introduce a new intrinsic for
> > sqrt that does match the libm semantics, and have vectorization
> > use that when available.
> >
> > Another alternative is to revert the Clang change and make
> > autovectorization of libm sqrt -> llvm.sqrt dependent on the
> > NoNaNsFPMath TargetOptions flag (this requires directly exposing
> > parts of TargetOptions to the IR level).
> >
> > I believe that both of these alternatives also require fixing the
> > inliner to deal properly with fast-math attributes during LTO
> > (unless I can just ignore this for now). This was the objection
> > raised to the TargetOptions solution when I first brought it up.
> >
> > (*) According to the language reference, the specific difference
> > is, "Unlike sqrt in libm, however, llvm.sqrt has undefined
> > behavior for negative numbers other than -0.0 (which allows for
> > better optimization, because there is no need to worry about errno
> > being set). llvm.sqrt(-0.0) is defined to return -0.0 like IEEE
> > sqrt."
> >
> > Thanks again,
> > Hal
> >
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory