[LLVMdev] [RFC] How to fix sqrt vs llvm.sqrt optimization asymmetry
Nicholas Chapman
admin at indigorenderer.com
Mon Nov 11 13:45:18 PST 2013
Hi Hal, all.
I'm not sure why llvm.sqrt is 'special'. Maybe it is because there is an SSE
packed sqrt instruction (SQRTPS) but, AFAIK, no packed sin instruction, for
example.
As mentioned in a recent mail to this list, I would like llvm.sqrt to be
defined to return NaN for arguments x < 0. I believe this would bring it
more into line with the other intrinsics, and with the libm result, which
is NaN for x < 0:
http://pubs.opengroup.org/onlinepubs/007904975/functions/sqrt.html
Cheers,
Nick
On 10/11/2013 3:36 p.m., Hal Finkel wrote:
> Hello everyone,
>
> The particular motivation for this e-mail is my desire for feedback on how to fix PR17758; but there is a core design issue here, so I'd like a wide audience.
>
> The underlying issue is this: the semantics of llvm.sqrt are purposefully defined to differ from those of libm sqrt (unlike all of the other llvm.<libm function> intrinsics) (*), and autovectorization relies on the vector forms of these intrinsics when vectorizing calls to libm math functions. As a result, we cannot vectorize a libm sqrt() call into a vector llvm.sqrt call. In fast-math mode, however, we'd like to vectorize calls to sqrt, so I modified Clang to emit calls to llvm.sqrt in fast-math mode for sqrt (and sqrt[fl]). This makes it similar to the libm pow and fma calls, which Clang always transforms into the llvm.pow and llvm.fma intrinsics.
>
> Here's the problem: There is an InstCombine optimization for sqrt (inside visitFPTrunc), and a bunch of optimizations inside SimplifyLibCalls that apply only to the sqrt libm call, and not to the intrinsics. The result, among other things, is PR17758, where fast-math mode actually produces slower code for non-vectorized sqrt calls.
>
> Some questions:
>
> - Is the asymmetry between optimizations performed on libm calls and their corresponding llvm.<libm function> intrinsics intentional, or just due to a lack of motivation?
>
> - Even if unintentional, is this asymmetry in any way desirable (for sqrt in particular, or in general)?
>
> - I can refactor all existing optimizations to be libm-call vs. intrinsics agnostic, but is that the desired solution? If so, any advice on a particularly-nice way to do this would certainly be appreciated.
>
> For example, an alternative solution for PR17758 in particular would be to revert the Clang change, introduce a new intrinsic for sqrt that does match the libm semantics, and have vectorization use that when available.
>
> Another alternative is to revert the Clang change and make autovectorization of libm sqrt -> llvm.sqrt dependent on the NoNaNsFPMath TargetOptions flag (this requires directly exposing parts of TargetOptions to the IR level).
>
> I believe that both of these alternatives also require fixing the inliner to deal properly with fast-math attributes during LTO (unless I can just ignore this for now). This was the objection raised to the TargetOptions solution when I first brought it up.
>
> (*) According to the language reference, the specific difference is, "Unlike sqrt in libm, however, llvm.sqrt has undefined behavior for negative numbers other than -0.0 (which allows for better optimization, because there is no need to worry about errno being set). llvm.sqrt(-0.0) is defined to return -0.0 like IEEE sqrt."
>
> Thanks again,
> Hal
>