[PATCH] The LoopVectorizer and libm sqrt
Tobias Grosser
tobias at grosser.es
Thu Sep 12 13:36:57 PDT 2013
On 09/12/2013 10:24 PM, Hal Finkel wrote:
> ----- Original Message -----
>> On 09/12/2013 10:16 PM, Hal Finkel wrote:
>>> ----- Original Message -----
>>>> ----- Original Message -----
>>>>>
>>>>> On Sep 12, 2013, at 2:52 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>>
>>>>>> ----- Original Message -----
>>>>>>> On 09/12/2013 09:34 PM, Hal Finkel wrote:
>>>>>>>> ----- Original Message -----
>>>>>>>>> ----- Original Message -----
>>>>>>>>>> I did not write this code but I assume this was done on
>>>>>>>>>> purpose
>>>>>>>>>> because our llvm.sqrt intrinsics has a slightly different
>>>>>>>>>> semantics:
>>>>>>>>>>
>>>>>>>>>> The ‘llvm.sqrt‘ intrinsics return the sqrt of the specified
>>>>>>>>>> operand,
>>>>>>>>>> returning the same value as the libm ‘sqrt‘ functions would.
>>>>>>>>>> Unlike
>>>>>>>>>> sqrt in libm, however, llvm.sqrt has undefined behavior for
>>>>>>>>>> negative
>>>>>>>>>> numbers other than -0.0 (which allows for better
>>>>>>>>>> optimization,
>>>>>>>>>> because there is no need to worry about errno being set).
>>>>>>>>>> llvm.sqrt(-0.0) is defined to return -0.0 like IEEE sort.
>>>>>>>>>
>>>>>>>>> Hrmm... okay; I'll send a revised patch where we explicitly
>>>>>>>>> check
>>>>>>>>> for
>>>>>>>>> fast-math mode.
>>>>>>>>
>>>>>>>> Or perhaps not; the TargetOptions are not available at the
>>>>>>>> IR-level
>>>>>>>> right now, and so this seems to leave us with two options:
>>>>>>>>
>>>>>>>> 1. Feed something through TTI
>>>>>>>>
>>>>>>>> 2. Have Clang generate the intrinsic directly in fast-math
>>>>>>>> mode
>>>>>>>>
>>>>>>>> I'm leaning toward (1), because I'd like to give the target
>>>>>>>> the
>>>>>>>> ability to declare the availability of a vectorized sqrt that
>>>>>>>> is
>>>>>>>> suitable as a libm sqrt replacement.
>>>>>>>>
>>>>>>>> What do you think?
>>>>>>>
>>>>>>> It seems only 2) would allow to mix fast-math and non-fast-math
>>>>>>> modes,
>>>>>>> which in fact my be very helpful in case of LTO and/or math
>>>>>>> libraries
>>>>>>> that provide fast as well as precise versions of a function.
>>>>>>
>>>>>> I don't think that's right. For one thing, the 'fast math' flags
>>>>>> are now stored in function-level attributes (specifically for
>>>>>> this
>>>>>> reason).
>>>>>
>>>>> Except that the inliner ignores them afaik :).
>>>>>
>>>>> I believe that floating point function level attributes are in a
>>>>> somewhat broken state, they work because I think LTO will remove
>>>>> them if they mismatch (at least I hope so :)
>>>>
>>>> Okay :) -- Maybe the frontend approach is better then (at least
>>>> for
>>>> now).
>>>
>>> It seems as though there is some history here:
>>>
>>> svn blame lib/CodeGen/CGBuiltin.cpp
>>> 64689 ddunbar // Library functions with special handling.
>>> 64689 ddunbar case Builtin::BIsqrt:
>>> 64689 ddunbar case Builtin::BIsqrtf:
>>> 64689 ddunbar case Builtin::BIsqrtl: {
>>> 100613 rjmccall // TODO: there is currently no set of
>>> optimizer flags
>>> 100613 rjmccall // sufficient for us to rewrite sqrt to
>>> @llvm.sqrt.
>>> 100613 rjmccall // -fmath-errno=0 is not good enough; we need
>>> finiteness.
>>> 100613 rjmccall // We could probably precondition the call
>>> with an ult
>>> 100613 rjmccall // against 0, but is that worth the
>>> complexity?
>>> 100613 rjmccall break;
>>>
>>> Any reason why we can't use the current fast-math setting here?
>>
>> I am not sure the fast-math flag in LangOpt is the right way to go
>> (see
>> the attached patch that I was planning to submit). However
>> CGM.getCodeGenOpts().UnsafeFPMath should be the flag that allow this.
>> I
>> am surprised it was not considered. Maybe I misunderstood its
>> meaning.
>
> I think I see: fast-math was not implemented in Clang until r147434. Well after John took out the sqrt-intrinsic mapping. I'll submit a patch that adds it back (conditioned on UnsafeFPMath).
From my point of view that seems to be the right way to go.
Tobias
More information about the llvm-commits
mailing list