[PATCH] The LoopVectorizer and libm sqrt

Thu Sep 12 12:45:00 PDT 2013

On 09/12/2013 09:34 PM, Hal Finkel wrote:
> ----- Original Message -----
>> ----- Original Message -----
>>> I did not write this code, but I assume this was done on purpose
>>> because our llvm.sqrt intrinsic has slightly different
>>> semantics:
>>>
>>> The 'llvm.sqrt' intrinsics return the sqrt of the specified
>>> operand, returning the same value as the libm 'sqrt' functions
>>> would. Unlike sqrt in libm, however, llvm.sqrt has undefined
>>> behavior for negative numbers other than -0.0 (which allows for
>>> better optimization, because there is no need to worry about
>>> errno being set). llvm.sqrt(-0.0) is defined to return -0.0 like
>>> IEEE sqrt.
>>
>> Hrmm... okay; I'll send a revised patch where we explicitly check for
>> fast-math mode.
>
> Or perhaps not; the TargetOptions are not available at the IR level right now, which seems to leave us with two options:
>
>   1. Feed something through TTI
>
>   2. Have Clang generate the intrinsic directly in fast-math mode
>
> I'm leaning toward (1), because I'd like to give the target the ability to declare the availability of a vectorized sqrt that is suitable as a libm sqrt replacement.
>
> What do you think?

It seems only (2) would allow mixing fast-math and non-fast-math code,
which may in fact be very helpful with LTO and/or with math libraries
that provide both fast and precise versions of a function.
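
For reference, here is a small sketch of the libm side of the semantics
quoted above: what a plain sqrt call does for negative inputs and for
-0.0. The names SqrtProbe, probe_libm_sqrt, libm_uses_errno and
check_sqrt_cases are made up for this mail and are not part of LLVM or
of the patch. Whether errno is actually set depends on the C library
and on the compiler flags, so that check is guarded by math_errhandling.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>

// Result of one libm sqrt call: the returned value and whether the
// call left errno set to EDOM. (Hypothetical helper type.)
struct SqrtProbe {
  double value;
  bool set_edom;
};

// Calls libm's sqrt on x. The volatile copy keeps the compiler from
// folding the call at compile time.
SqrtProbe probe_libm_sqrt(double x) {
  volatile double input = x;
  errno = 0;
  double result = std::sqrt(input);
  return {result, errno == EDOM};
}

// True if this C library reports math errors through errno.
bool libm_uses_errno() {
  return (math_errhandling & MATH_ERRNO) != 0;
}

// Checks the cases discussed in the quoted LangRef text.
void check_sqrt_cases() {
  // Ordinary input: exact result, errno untouched.
  SqrtProbe four = probe_libm_sqrt(4.0);
  assert(four.value == 2.0 && !four.set_edom);

  // -0.0 is not a domain error; IEEE sqrt returns -0.0.
  SqrtProbe neg_zero = probe_libm_sqrt(-0.0);
  assert(neg_zero.value == 0.0 && std::signbit(neg_zero.value));
  assert(!neg_zero.set_edom);

  // Negative input: libm returns NaN and, where errno reporting is
  // enabled, sets errno to EDOM. This is the side effect the
  // intrinsic does not have to preserve.
  SqrtProbe neg = probe_libm_sqrt(-1.0);
  assert(std::isnan(neg.value));
  if (libm_uses_errno())
    assert(neg.set_edom);
}
```

Calling check_sqrt_cases() from a test driver exercises all three
cases. The last one is the reason a libm sqrt call and llvm.sqrt are
not interchangeable unless the errno behavior is known to be unneeded.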

Cheers,
Tobias