[PATCH] The LoopVectorizer and libm sqrt

Thu Sep 12 13:20:09 PDT 2013

On 09/12/2013 10:16 PM, Hal Finkel wrote:
> ----- Original Message -----
>> ----- Original Message -----
>>>
>>> On Sep 12, 2013, at 2:52 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>
>>>> ----- Original Message -----
>>>>> On 09/12/2013 09:34 PM, Hal Finkel wrote:
>>>>>> ----- Original Message -----
>>>>>>> ----- Original Message -----
>>>>>>>> I did not write this code but I assume this was done on
>>>>>>>> purpose
>>>>>>>> because our llvm.sqrt intrinsics has a slightly different
>>>>>>>> semantics:
>>>>>>>>
>>>>>>>> The ‘llvm.sqrt‘ intrinsics return the sqrt of the specified
>>>>>>>> operand,
>>>>>>>> returning the same value as the libm ‘sqrt‘ functions would.
>>>>>>>> Unlike
>>>>>>>> sqrt in libm, however, llvm.sqrt has undefined behavior for
>>>>>>>> negative
>>>>>>>> numbers other than -0.0 (which allows for better
>>>>>>>> optimization,
>>>>>>>> because there is no need to worry about errno being set).
>>>>>>>> llvm.sqrt(-0.0) is defined to return -0.0 like IEEE sort.
>>>>>>>
>>>>>>> Hrmm... okay; I'll send a revised patch where we explicitly
>>>>>>> check
>>>>>>> for
>>>>>>> fast-math mode.
>>>>>>
>>>>>> Or perhaps not; the TargetOptions are not available at the
>>>>>> IR-level
>>>>>> right now, and so this seems to leave us with two options:
>>>>>>
>>>>>>   1. Feed something through TTI
>>>>>>
>>>>>>   2. Have Clang generate the intrinsic directly in fast-math
>>>>>>   mode
>>>>>>
>>>>>> I'm leaning toward (1), because I'd like to give the target the
>>>>>> ability to declare the availability of a vectorized sqrt that
>>>>>> is
>>>>>> suitable as a libm sqrt replacement.
>>>>>>
>>>>>> What do you think?
>>>>>
>>>>> It seems only 2) would allow to mix fast-math and non-fast-math
>>>>> modes,
>>>>> which in fact my be very helpful in case of LTO and/or math
>>>>> libraries
>>>>> that provide fast as well as precise versions of a function.
>>>>
>>>> I don't think that's right. For one thing, the 'fast math' flags
>>>> are now stored in function-level attributes (specifically for
>>>> this
>>>> reason).
>>>
>>> Except that the inliner ignores them afaik :).
>>>
>>> I believe that floating point function level attributes are in a
>>> somewhat broken state, they work because I think LTO will remove
>>> them if they mismatch (at least I hope so :)
>>
>> Okay :) -- Maybe the frontend approach is better then (at least for
>> now).
>
> It seems as though there is some history here:
>
> svn blame lib/CodeGen/CGBuiltin.cpp
>   64689    ddunbar     // Library functions with special handling.
>   64689    ddunbar   case Builtin::BIsqrt:
>   64689    ddunbar   case Builtin::BIsqrtf:
>   64689    ddunbar   case Builtin::BIsqrtl: {
> 100613   rjmccall     // TODO: there is currently no set of optimizer flags
> 100613   rjmccall     // sufficient for us to rewrite sqrt to @llvm.sqrt.
> 100613   rjmccall     // -fmath-errno=0 is not good enough; we need finiteness.
> 100613   rjmccall     // We could probably precondition the call with an ult
> 100613   rjmccall     // against 0, but is that worth the complexity?
> 100613   rjmccall     break;
>
> Any reason why we can't use the current fast-math setting here?

I am not sure the fast-math flag in LangOpt is the right way to go (see 
the attached patch that I was planning to submit). However 
CGM.getCodeGenOpts().UnsafeFPMath should be the flag that allow this. I 
am surprised it was not considered. Maybe I misunderstood its meaning.

Tobias

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Use-CodeGenOptions-to-derive-LLVM-IR-fast-math-flags.patch
Type: text/x-patch
Size: 4134 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130912/dc35a9d2/attachment.bin>